[
Date Prev][
Date Next][
Thread Prev][
Thread Next][
Date Index][
Thread Index]
[
List Home]
Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters
|
Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
> > We should try to work harder with the git-core folks to get character
> > set encoding for file names worked out. We might be able to use a
> > configuration setting in the repository to tell us what the proper
> > encoding should be, and if not set, assume UTF-8.
>
> I agree that this should be the ultimate goal, though the default should
> better be "system encoding" for compatibility with current git
> repositories and instead have newer git versions always set encoding to
> UTF-8. Thus, for our jgit clone I've introduced a system property to
> configure Constants.PATH_ENCODING set to system encoding. It's used by
> PathFilter and this resolves my original problem.
That's probably a good point, using the system encoding on a
repository may produce the file names in a more compatible way
with git-core. But we probably don't want the encoding to be a
single encoding constant in this JVM, we probably need to support
a per-repository configuration of the encoding for path names so
that we can eventually move to a non-platform specific encoding.
> I have tried to switch more usages from Constants.CHARACTER_ENCODING to
> Constants.PATH_ENCODING, but ended up in confusion due to my lack of
> understanding: primarily because I couldn't tell anymore whether encoded
> strings were file names or not.
Heh. Yea. There are a number of file name encoding sites. I think
everything in the treewalk package, as well as the GitIndex, Tree and
DirCache* classes. Also the Patch class and its FileHeader friend.
> Does it make sense to explicitly
> distinguish encoding usages in that way? We could try to contribute here
> (and hopefully cause less review effort to jgit developers than the
> changes itself are worth ;-)
Yes, it does. Because we eventually need to support encodings
other than the current UTF-8 we assume for file names, especially
if a repository is using the local filesystem encoding and that
isn't UTF-8.
--
Shawn.