Bug 581153 - Windows S3 puts backslash in pack paths
Summary: Windows S3 puts backslash in pack paths
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 6.3   Edit
Hardware: PC Windows All
: P3 major (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-11-30 10:50 EST by Jeremy Braun CLA
Modified: 2022-12-05 22:16 EST (History)
2 users (show)

See Also:


Attachments
screenshot of AWS console illustrating bad paths (38.84 KB, image/png)
2022-11-30 10:50 EST, Jeremy Braun CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jeremy Braun CLA 2022-11-30 10:50:31 EST
Created attachment 288863 [details]
screenshot of AWS console illustrating bad paths

I haven't tried this w/ the latest release of jgit yet, it's possible it's been addressed.

On Windows, the directory separator character is backslash. That is leaking into the bucket URLs for *.pack and *.idx files. S3 treats backslash as a normal character, so instead of these files going into the objects/pack/*.(pack|idx) paths, they end up in the objects folder with filenames that begin with `pack\`. Subsequent attempts to fetch/clone from the repo then can't find the pack files and the objects they contain, and the fetch/clone fails.

Generally I'm able to reproduce this by:
* Have a repository stored in S3
* Make some local commits.
* Run `git gc` locally to create new *.packs (this may not be required)
* Run `jgit push`
* Examine the S3 storage, note new pack files with the broken paths
* Try to fetch/clone from a separate enlistment, you'll get failed object fetches because the pack files weren't found/indexed.
Comment 1 Jeremy Braun CLA 2022-11-30 10:55:33 EST
I just tried again with `jgit version 6.3.0.202209071007-r`, and confirmed it's still reproducing the problem.

And I confirmed the steps that produce it are:
* Clone an S3 repository
* Make a new local commit
* `git gc`
* `jgit push`
* Examine the S3 bucket's `object` folder, and note pack/idx files with `pack\` prefixes
Comment 2 Thomas Wolf CLA 2022-11-30 18:31:37 EST
Yes. TransportAmazonS3.DatabaseS3 assumes /-delimited paths, but is called in WalkPushConnection on several occasions with some java.io.File.getPath(), which uses \ if on Windows.
Comment 3 Jeremy Braun CLA 2022-12-03 11:38:15 EST
Would you recommend the paths get sanitized at `TransportAmazonS3.DatabaseS3` or at `WalkPushConnection`? Would a `path = path.replace('\\', '/')` in DatabaseS3 suffice?
Comment 4 Thomas Wolf CLA 2022-12-04 16:16:14 EST
(In reply to Jeremy Braun from comment #3)
> Would you recommend the paths get sanitized at
> `TransportAmazonS3.DatabaseS3` or at `WalkPushConnection`? Would a `path =
> path.replace('\\', '/')` in DatabaseS3 suffice?

Good question. DatabaseS3 is a WalkRemoteObjectDatabase, which uses String, not Path or some such in the its interface. It assumes /. This could be made a bit more explicit in the javadoc. There are two more implementations, one for Http and one for Sftp, which also work with /.

So I would think this should be fixed in WalkPushConnection. WalkPushConnection already uses / in other places.

I would suggest doing the translation only if necessary:

  if (File.separatorChar != '/') {
    path = path.replace(File.separatorChar, '/');
  }

This is what File.toURI() does, too.
Comment 5 Jeremy Braun CLA 2022-12-05 21:57:50 EST
I think I've got a preliminary path.

I ran into another problem while looking at this, I wasn't getting any pack files when trying to list the pack directory, because the `name` parameter to the listParser in AmazonS3.java was zero-length, and the element tag was in the qName parameter. I've "fixed" this too, but don't really know why that was occurring, and whether the fix (look at the first non-zero length argument) is really the right thing.

I'm off to figure out how to post a pull request or patch for review.