Re: [egit-dev] Re: jgit problems for file paths with non-ASCII characters

Robin Rosenberg <robin.rosenberg@xxxxxxxxxx> wrote:
> On Wednesday 25 November 2009 14:47:25, Marc Strapetz wrote:
> > I have noticed that jgit converts file paths to UTF-8 when querying the
> > repository.
...
> > Is this a bug or a misconfiguration of my repository? I'm using jgit
> > (commit e16af839e8a0cc01c52d3648d2d28e4cb915f80f) on Windows.
> 
> A bug. 
> 
> The problem here is that we need to allow multiple encodings since there
> is no reliable encoding specified anywhere.

This is a design fault of both Linux and git.  git gets a byte
sequence from readdir and stores that as-is into the repository.
We have no way of knowing what that encoding is.  So now everyone
touching a Git repository is screwed.
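A small Java sketch can make the problem concrete. It is not jgit code. It shows that one name (`Größe.txt`, an arbitrary example) produces different byte sequences depending on the encoding used by whoever created the file. Because git stores whichever bytes readdir returned, a reader cannot tell which encoding produced them.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PathBytes {
    // "Größe.txt" written with Unicode escapes so the source file's own encoding does not matter.
    static final String NAME = "Gr\u00f6\u00dfe.txt";

    // The bytes a filesystem would hand back for `name` if the creator's locale used charset `cs`.
    static byte[] bytesIn(String name, Charset cs) {
        return name.getBytes(cs);
    }

    public static void main(String[] args) {
        byte[] utf8 = bytesIn(NAME, StandardCharsets.UTF_8);
        byte[] latin1 = bytesIn(NAME, StandardCharsets.ISO_8859_1);
        System.out.println(utf8.length + " bytes as UTF-8: " + Arrays.toString(utf8));
        System.out.println(latin1.length + " bytes as ISO-8859-1: " + Arrays.toString(latin1));

        // Reading the Latin-1 bytes back as UTF-8 does not recover the name:
        // the malformed sequences are replaced with U+FFFD.
        String misread = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(misread.equals(NAME)); // false
    }
}
```

The UTF-8 form is 11 bytes and the ISO-8859-1 form is 9. Either form could be sitting in a tree object.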

> The approach I advocate is
> the one we use for handling encoding in general, i.e. if it looks like UTF-8,
> treat it as UTF-8, otherwise fall back. This is expensive, however,

We should try to work harder with the git-core folks to get character
set encoding for file names worked out.  We might be able to use a
configuration setting in the repository to tell us what the proper
encoding should be, and if not set, assume UTF-8.
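The two ideas above can be combined in a minimal Java sketch, assuming a hypothetical per-repository charset setting (passed in here as `configured`, null when unset) and a caller-chosen `fallback` charset. Neither parameter corresponds to an existing jgit or git-core option, and `decodePath` is a hypothetical helper. When no charset is configured, the sketch applies the "if it looks like UTF-8" heuristic with a strict decoder and falls back only when the bytes are not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class PathDecoder {
    /**
     * Decodes a raw path from a tree entry (hypothetical helper, not jgit API).
     *
     * @param raw        path bytes exactly as git stored them
     * @param configured charset from a hypothetical repository setting, or null if unset
     * @param fallback   charset used when no setting exists and the bytes are not valid UTF-8
     */
    static String decodePath(byte[] raw, Charset configured, Charset fallback) {
        if (configured != null) {
            // The repository says what the encoding is; trust it.
            return new String(raw, configured);
        }
        // Strict decoder: report malformed input instead of substituting U+FFFD.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so decode again with the fallback charset.
            return new String(raw, fallback);
        }
    }

    public static void main(String[] args) {
        String name = "Gr\u00f6\u00dfe.txt";
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        // Invalid as UTF-8, so the ISO-8859-1 fallback is used.
        System.out.println(decodePath(latin1, null, StandardCharsets.ISO_8859_1).equals(name)); // true
    }
}
```

The heuristic can still guess wrong. The ISO-8859-1 name `Ã¶` is byte-for-byte identical to UTF-8 `ö` (0xC3 0xB6), so it would be decoded as `ö`. In the worst case every path is also decoded twice, which is part of the cost Robin mentions.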

> and then we have
> all the other issues with case-insensitive names and the funny property that
> Unicode has of allowing characters to be encoded using multiple sequences
> of code points, as employed by Apple.

But as you said, this still doesn't make the Apple normal form
any easier.  Though if we know we are on such a strange filesystem
we might be able to assume the paths in the repository are equally
damaged.  Or not.
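The Apple issue here is Unicode normalization. HFS+ on Mac OS X stores file names in a decomposed form (a variant of NFD), so `ä` comes back from readdir as `a` followed by a combining diaeresis. The same name created elsewhere is usually the single precomposed code point. The Java sketch below is an illustration only, not jgit behavior. It shows that the two spellings are different strings (and different UTF-8 bytes) yet compare equal once `java.text.Normalizer` converts both to NFC. It does not address case-insensitive matching.

```java
import java.text.Normalizer;

public class PathNormalization {
    // Precomposed: U+00E4 LATIN SMALL LETTER A WITH DIAERESIS.
    static final String NFC_NAME = "\u00e4.txt";
    // Decomposed: 'a' followed by U+0308 COMBINING DIAERESIS, the shape HFS+ returns.
    static final String NFD_NAME = "a\u0308.txt";

    // Compares two names after normalizing both to NFC.
    static boolean sameAfterNfc(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFC)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFC));
    }

    public static void main(String[] args) {
        System.out.println(NFC_NAME.equals(NFD_NAME));        // false
        System.out.println(sameAfterNfc(NFC_NAME, NFD_NAME)); // true
    }
}
```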

-- 
Shawn.

