Bug 510329 - slow fetch
Summary: slow fetch
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: 4.7
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-01-11 18:22 EST by David Turner
Modified: 2017-01-12 14:58 EST (History)

Description David Turner CLA 2017-01-11 18:22:11 EST
I'm noticing a jgit fetch taking several minutes to complete.  Regular git fetch of the same refspec takes under a second.  It's 100% reproducible.  The really weird thing is that the conversation with the server (at least according to GIT_PACKET_TRACE and log4j.logger.org.eclipse.jgit.transport) is 100% identical.  After that, jgit downloads a giant pack of hundreds of thousands of objects.

I'm fetching over http.

The ref being fetched is refs/notes/something.  We fetch it not on top of our old version but into a temp location (so, +refs/notes/something:refs/notes-incoming/something), because we might have new local notes.  We "merge" notes by using a NoteMapMerger and then *writing the newly-generated tree to a new unparented commit*.  We then force-push this (using regular git, since we use --force-with-lease).  This allows multiple machines to add notes without having to deal with merges and without generating many extra commit objects.  Anyway, the commit we're fetching is often not the same as any local commit, nor does it share any ancestry with one (since it has no parents at all).  But regular git handles this just fine.

Unfortunately, I'm not able to get a good repro of this outside of our proprietary internal repo.  But I'm happy to run tests or try patches if needed for troubleshooting.
Comment 1 Christian Halstrick CLA 2017-01-12 03:16:15 EST
You are saying that JGit is downloading a giant pack. Is native git also downloading that big pack, or is it downloading a smaller one? What gets included in the pack is determined on the server side, based on what the client says it already has. If the conversation with the server is exactly the same for jgit and native git, then the server should send the same pack file to both.

Maybe you could attach the traces.
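
To make the point about server-side pack selection concrete, here is a deliberately simplified model, not the actual upload-pack implementation: the server sends everything reachable from the client's "wants" minus everything reachable from the "haves" it recognizes. The object ids and the GRAPH layout below are made up for illustration. It also shows why an unparented commit is cheap to fetch only if the client's haves are taken into account.

```python
# Toy object graph: each object id maps to the ids it references directly.
# All ids are hypothetical; a real repository uses SHA-1 object names.
GRAPH = {
    "old_commit": ["old_tree"],
    "old_tree": ["blob_a", "blob_b"],
    "new_commit": ["new_tree"],  # unparented: no link back to old_commit
    "new_tree": ["blob_a", "blob_b", "blob_c"],
    "blob_a": [],
    "blob_b": [],
    "blob_c": [],
}


def reachable(graph, roots):
    """Return every object id reachable from the given roots."""
    seen = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph.get(oid, []))
    return seen


def objects_to_send(graph, wants, haves):
    """Simplified pack selection: wanted objects the client lacks."""
    return reachable(graph, wants) - reachable(graph, haves)


# Client announces the old notes commit: only the new objects are sent.
small = objects_to_send(GRAPH, ["new_commit"], ["old_commit"])
# Client announces nothing the server recognizes: everything is sent.
large = objects_to_send(GRAPH, ["new_commit"], [])
```

In this model the two clients would receive different packs only if they announced different haves, which is why the client-to-server half of the conversation is the interesting part to compare.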
Comment 2 David Turner 2017-01-12 14:58:43 EST
Here's the redacted version of the trace from jgit:

2017-01-11 22:32:39 DEBUG PacketLineIn:165 - git< # service=git-upload-pack
2017-01-11 22:32:39 DEBUG PacketLineIn:144 - git< 0000
2017-01-11 22:32:39 DEBUG PacketLineIn:165 - git< [sha redacted] HEADmulti_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master agent=git/[version redacted]
2017-01-11 22:32:39 DEBUG PacketLineIn:165 - git< [sha redacted] refs/heads/[refname redacted]
[about 400 of the same]
2017-01-11 22:32:39 DEBUG PacketLineIn:144 - git< 0000

and from git 2.10.something (we have cherry-picked some unrelated patches from upstream; mainly those I wrote myself):
packet:          git< # service=git-upload-pack
packet:          git< 0000
packet:          git< [sha redacted] HEAD\0multi_ack thin-pack side-band side-band-64k ofs-delta shallow no-progress include-tag multi_ack_detailed allow-tip-sha1-in-want allow-reachable-sha1-in-want no-done symref=HEAD:refs/heads/master agent=git/[version redacted]
packet:          git< [sha redacted] refs/heads/[refname redacted]
[same 400 as the above]
packet:          git< 0000
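
For anyone reading these traces without the protocol in mind: each line is one pkt-line of the ref advertisement. A pkt-line starts with a four-digit hex length that counts its own four bytes, "0000" is a flush packet, and the first ref line carries the capability list after a NUL byte (shown as \0 in git's trace). The sketch below illustrates that framing; the helper names are made up, and it ignores the special packets of protocol v2.

```python
FLUSH = b"0000"


def encode_pkt_line(payload: bytes) -> bytes:
    """Prefix the payload with its length; the prefix counts itself."""
    return format(len(payload) + 4, "04x").encode("ascii") + payload


def decode_pkt_lines(data: bytes):
    """Split a byte stream into payloads; None marks a flush packet."""
    packets = []
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 4].decode("ascii"), 16)
        if length == 0:
            packets.append(None)
            pos += 4
            continue
        if length < 4:
            raise ValueError("unsupported pkt-line length: %d" % length)
        packets.append(data[pos + 4:pos + length])
        pos += length
    return packets


def split_capabilities(first_ref_line: bytes):
    """Split the first advertised ref into (sha, refname, capabilities)."""
    ref_part, _, caps = first_ref_line.rstrip(b"\n").partition(b"\0")
    sha, _, name = ref_part.partition(b" ")
    return sha, name, caps.split(b" ") if caps else []


# A made-up advertisement in the same shape as the traces above.
fake_sha = b"1" * 40
stream = (
    encode_pkt_line(b"# service=git-upload-pack\n")
    + FLUSH
    + encode_pkt_line(fake_sha + b" HEAD\0multi_ack thin-pack\n")
    + FLUSH
)
packets = decode_pkt_lines(stream)
sha, name, caps = split_capabilities(packets[2])
```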

I used sed to strip off the initial portion (up to "git<") of each line in the two traces and then diffed them; the two were the same.
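
The comparison can be reproduced roughly as follows. This is a sketch of the same idea in Python rather than the exact sed command that was used; the function names are made up. It also assumes that git prints the NUL before the capability list as a literal \0 while the JGit log contains the raw byte, so it normalizes that before diffing.

```python
import difflib
import re

# Everything up to and including the first "git<" marker on a line.
_PREFIX = re.compile(r"^.*?git< ?")


def normalize(trace_text):
    """Keep only server-to-client lines, stripped of their log prefix."""
    lines = []
    for line in trace_text.splitlines():
        if "git<" not in line:
            continue  # skip anything that is not a "git<" packet line
        payload = _PREFIX.sub("", line, count=1)
        # Assumption: git escapes NUL as the two characters "\0".
        lines.append(payload.replace("\\0", "\x00"))
    return lines


def diff_traces(trace_a, trace_b):
    """Return unified-diff lines; an empty list means no difference."""
    return list(
        difflib.unified_diff(normalize(trace_a), normalize(trace_b), lineterm="")
    )


jgit_trace = (
    "2017-01-11 22:32:39 DEBUG PacketLineIn:165 - git< # service=git-upload-pack\n"
    "2017-01-11 22:32:39 DEBUG PacketLineIn:144 - git< 0000\n"
)
git_trace = (
    "packet:          git< # service=git-upload-pack\n"
    "packet:          git< 0000\n"
)
differences = diff_traces(jgit_trace, git_trace)
```

Note that this only compares what the server sent ("git<"); lines the client sent are filtered out and would need a separate comparison.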

Native git is not downloading the giant pack.

I agree that this is really weird and that the server should be sending the same thing in both cases.  I would be happy to turn on more logging, but I'm not sure what to turn on.  Also I can't share raw logs because my company is quite sensitive about that sort of thing; I can do redacted logs but they will need to be in a format that can be understood by humans to ensure that they are correctly redacted.

BTW, my jgit test is using org.eclipse.jgit.pgm/target/jgit fetch so it's not something weird in my code -- it's just raw jgit stuff.