Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters

> But we probably don't want the encoding to be a
> single encoding constant in this JVM, we probably need to support
> a per-repository configuration of the encoding for path names so
> that we can eventually move to a non-platform specific encoding.

OK, we will take care of that and try to come up with some patches
during the next week -- just FYI.
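
To make the idea concrete, here is a rough sketch of what a per-repository
lookup of the path encoding might look like. The config key
"i18n.pathEncoding", the class name and the method names are all made up for
illustration and are not existing git or jgit settings; a plain Map stands in
for the repository configuration. When the key is unset, the sketch falls back
to the JVM default charset as a stand-in for the "system encoding".

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Map;

// Sketch only: resolves the path-name charset per repository instead of
// relying on one JVM-wide constant.
final class PathEncodingSketch {
    // Hypothetical config key, not an actual git or jgit setting.
    static final String CONFIG_KEY = "i18n.pathEncoding";

    // Returns the configured charset, or the JVM default charset
    // (assumed here to represent the "system encoding") when unset.
    static Charset pathEncoding(Map<String, String> repoConfig) {
        String name = repoConfig.get(CONFIG_KEY);
        return name == null ? Charset.defaultCharset() : Charset.forName(name);
    }

    // Encodes a path name into the bytes that would be stored for it.
    static byte[] encodePath(String path, Map<String, String> repoConfig) {
        return path.getBytes(pathEncoding(repoConfig));
    }

    // Decodes stored bytes back into a path name.
    static String decodePath(byte[] raw, Map<String, String> repoConfig) {
        return new String(raw, pathEncoding(repoConfig));
    }

    public static void main(String[] args) {
        Map<String, String> latin1Repo = new HashMap<>();
        latin1Repo.put(CONFIG_KEY, "ISO-8859-1");
        Map<String, String> utf8Repo = new HashMap<>();
        utf8Repo.put(CONFIG_KEY, "UTF-8");

        String path = "\u00e4.txt"; // a-umlaut followed by ".txt"
        System.out.println(encodePath(path, latin1Repo).length); // 5
        System.out.println(encodePath(path, utf8Repo).length);   // 6
    }
}
```

Two repositories opened in the same JVM can then use different encodings,
which a single constant cannot express.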

--
Best regards,
Marc Strapetz
=============
syntevo GmbH
http://www.syntevo.com
http://blog.syntevo.com



Shawn O. Pearce wrote:
> Marc Strapetz <marc.strapetz@xxxxxxxxxxx> wrote:
>>> We should try to work harder with the git-core folks to get character
>>> set encoding for file names worked out.  We might be able to use a
>>> configuration setting in the repository to tell us what the proper
>>> encoding should be, and if not set, assume UTF-8.
>> I agree that this should be the ultimate goal, though the default should
>> rather be "system encoding" for compatibility with current git
>> repositories, with newer git versions always setting the encoding to
>> UTF-8. Thus, for our jgit clone I've introduced a system property that
>> sets Constants.PATH_ENCODING to the system encoding. It's used by
>> PathFilter and this resolves my original problem.
> 
> That's probably a good point, using the system encoding on a
> repository may produce the file names in a more compatible way
> with git-core.  But we probably don't want the encoding to be a
> single encoding constant in this JVM, we probably need to support
> a per-repository configuration of the encoding for path names so
> that we can eventually move to a non-platform specific encoding.
> 
>> I have tried to switch more usages from Constants.CHARACTER_ENCODING to
>> Constants.PATH_ENCODING, but ended up in confusion due to my lack of
>> understanding: primarily because I couldn't tell anymore whether encoded
>> strings were file names or not.
> 
> Heh.  Yea.  There are a number of file name encoding sites.  I think
> everything in the treewalk package, as well as the GitIndex, Tree and
> DirCache* classes.  Also the Patch class and its FileHeader friend.
> 
>> Does it make sense to explicitly
>> distinguish encoding usages in that way? We could try to contribute here
>> (and hopefully cause less review effort to jgit developers than the
>> changes themselves are worth ;-)
> 
> Yes, it does.  Because we eventually need to support encodings
> other than the current UTF-8 we assume for file names, especially
> if a repository is using the local filesystem encoding and that
> isn't UTF-8.
> 
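
As a small illustration of why the assumed encoding matters for file names,
the following sketch compares the bytes of a path name under UTF-8 and
ISO-8859-1 and shows what happens when ISO-8859-1 bytes are decoded as UTF-8.
The class and method names are hypothetical, and ISO-8859-1 is only an assumed
example of a non-UTF-8 filesystem encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch only: shows that non-ASCII path names yield different bytes
// depending on the encoding, while pure ASCII names do not.
final class PathBytesDemo {
    // True if the path encodes to identical bytes in UTF-8 and ISO-8859-1.
    static boolean sameBytes(String path) {
        byte[] utf8 = path.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = path.getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, latin1);
    }

    // Encodes with ISO-8859-1, then decodes assuming UTF-8, as a reader
    // that wrongly assumes UTF-8 would do.
    static String misdecode(String path) {
        byte[] latin1 = path.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String ascii = "readme.txt";
        String nonAscii = "\u00fcber.txt"; // u-umlaut followed by "ber.txt"

        System.out.println(sameBytes(ascii));                     // true
        System.out.println(sameBytes(nonAscii));                  // false
        System.out.println(misdecode(nonAscii).equals(nonAscii)); // false
    }
}
```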

