Bug 344143 - git clone performance much worse in JGit than from msysgit
Summary: git clone performance much worse in JGit than from msysgit
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: unspecified
Hardware: PC All
Importance: P3 normal with 1 vote
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL: http://dev.eclipse.org/mhonarc/lists/...
Whiteboard:
Keywords: investigate, performance
Depends on:
Blocks:
 
Reported: 2011-04-28 11:21 EDT by Susan McCourt CLA
Modified: 2013-06-19 02:05 EDT
CC List: 8 users

See Also:


Attachments
performance test (4.44 KB, text/plain)
2011-06-15 10:03 EDT, Tomasz Zarna CLA
The results of running the test over night (6.20 KB, text/plain)
2011-06-16 04:51 EDT, Tomasz Zarna CLA
updated performance test (5.57 KB, patch)
2011-06-24 06:01 EDT, Tomasz Zarna CLA
mylyn/context/zip (9.51 KB, application/octet-stream)
2011-06-24 06:01 EDT, Tomasz Zarna CLA

Description Susan McCourt CLA 2011-04-28 11:21:58 EDT
John asked me to compare command line clone with Orion clone in bug 344011.

I'm running msysgit
git-gui version 0.13.GITGUI
git version 1.7.3.1.msysgit.0
mingw32 version 1.7.3.1-preview20101002

Today I cloned the orion client repo (http://git.eclipse.org/c/e4/org.eclipse.orion.client.git) from msysgit and then from Orion.  I'm on a pretty slow DSL connection so I'm used to things taking longer for me, but given the size of the repo, the clone speed seemed especially slow.

msysgit: clone took just under 6 min, then about 30 sec of loose object compression. 
orion: clone took 30 min.

I tried the msysgit clone twice, before and after the Orion test, same result.

(I suspect eGit would take as long as Orion, but perhaps that test could be run by someone with a faster connection....)
Comment 1 Tomasz Zarna CLA 2011-05-04 08:05:55 EDT
The Orion instance was on localhost, right?
Comment 2 Susan McCourt CLA 2011-05-04 11:57:52 EDT
(In reply to comment #1)
> The Orion instance was on localhost, right?

yes, this was localhost server cloning the eclipse.org repo.
Comment 3 Tomasz Zarna CLA 2011-05-20 05:50:30 EDT
(In reply to comment #0)
> (I suspect eGit would take as long as Orion, but perhaps that test could be run by someone with a faster connection....)

Works fine for me. I've just tried cloning git://github.com/eclipse/orion.client.git in Orion and it took a couple of secs:

POST http://localhost:8080/git/clone/ 202 Accepted	88ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	14ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	14ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	19ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	20ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	22ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	22ms	
GET http://localhost:8080/task/id/0FsTD8WCABAVApbcgSSeXQ 200 OK	13ms	
GET http://localhost:8080/git/clone//workspace/P 200 OK	13ms

Output for cloning over plain HTTP was pretty much the same. It took a little bit longer but that's expected since it's known to be a bit inefficient.
Comment 4 Tomasz Zarna CLA 2011-06-07 05:45:13 EDT
I think I saw this yesterday when trying to clone orion.server over HTTP. It took way longer than I expected. During RC2, I will prepare a performance test to check if it's a real issue.
Comment 5 Susan McCourt CLA 2011-06-07 11:12:54 EDT
I cloned the orion client repository on orion.eclipse.org (taking my slow connection out of the equation) and noticed it took quite some time, but I was doing other work so unfortunately I can't quantify it.  I've seen it happen very fast at other times.
Comment 6 Tomasz Zarna CLA 2011-06-14 06:59:46 EDT
Piotrek, please take a look at this and let us know what you think about it.
Comment 7 Tomasz Zarna CLA 2011-06-15 10:03:49 EDT
Created attachment 198024 [details]
performance test

A test case I will be using to compare JGit with a command line tool (cgit).
Comment 8 Tomasz Zarna CLA 2011-06-15 11:36:24 EDT
So far, for 10 iterations, all cases give similar results: more or less 30 secs of wall-clock time.

There's a noticeable difference in CPU/kernel times, but I assume it's related to the fact that I'm spawning a new process to run "git clone" on the command line. I don't consider it an important factor.
Comment 9 Tomasz Zarna CLA 2011-06-16 04:51:44 EDT
Created attachment 198073 [details]
The results of running the test over night

Summary, elapsed process [s]:

JGit over git protocol : 48.86 (over 2x longer!)
CGit over git protocol : 21.33
JGit over http protocol: 31.35
CGit over http protocol: 20.52

I planned to add additional tests that would measure how long cloning takes using our REST API, but I think the above is enough to prove that JGit is the culprit.
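
To put the summary above in relative terms, the following sketch computes the JGit-to-CGit slowdown factor for each protocol from the four averages reported in this comment. The class and method names are hypothetical and are not part of the attached test.

```java
import java.util.Locale;

// Hypothetical helper, not part of the attached performance test.
public class CloneSlowdown {

    // Ratio of JGit elapsed time to CGit elapsed time (both in seconds).
    static double slowdown(double jgitSeconds, double cgitSeconds) {
        return jgitSeconds / cgitSeconds;
    }

    public static void main(String[] args) {
        // Averages taken from the summary in this comment.
        System.out.printf(Locale.ROOT, "git protocol:  %.2fx%n", slowdown(48.86, 21.33));
        System.out.printf(Locale.ROOT, "http protocol: %.2fx%n", slowdown(31.35, 20.52));
    }
}
```

With these inputs the git protocol comes out at roughly 2.3x and HTTP at roughly 1.5x, so the gap is noticeably larger over the git protocol.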
Comment 10 Tomasz Zarna CLA 2011-06-24 06:01:25 EDT
Created attachment 198511 [details]
updated performance test
Comment 11 Tomasz Zarna CLA 2011-06-24 06:01:27 EDT
Created attachment 198512 [details]
mylyn/context/zip
Comment 12 Tomasz Zarna CLA 2011-10-03 05:08:47 EDT
So far there has been no response on jgit-dev. I'm afraid there is little we can do about it, at least for 0.3. Postponing until 0.4.
Comment 13 Tomasz Zarna CLA 2011-11-28 06:33:45 EST
This is actually a performance issue in JGit so moving the bug to their inbox.
Comment 14 Markus Duft CLA 2012-12-18 07:53:14 EST
same problem here with a rather big repo:

Native git 1.7.10        :   6.477u  1.770s 0:08.37 98.4%   0+0k 0+0io 18pf+0w
JGit 2.1.0.201209190230-r: 283.093u 18.666s 4:22.60 114.9%  0+0k 0+0io 1pf+0w
Comment 15 Chris Aniszczyk CLA 2012-12-18 10:08:23 EST
Shawn, any thoughts?
Comment 16 Shawn Pearce CLA 2012-12-18 12:23:42 EST
How was JGit's WindowCacheConfig set up in the JVM? This has a big impact on performance for JGit unfortunately, and EGit had to expose UI to allow users to configure it when working on bigger repositories.

In particular parameters like packedGitLimit, deltaBaseCacheLimit, and packedGitOpenFiles restrict how much memory JGit can use.

When memory is "low" according to these limits, JGit pages to disk rather than using more memory. CGit has fewer (or no) such memory limits as it can allocate as much memory as it wants from the OS. JGit doesn't have this luxury when embedded inside of a JVM that has a maximum heap size.

JGit by default also uses SoftReferences in a lot of the caches, so when JVM memory is tight the GC will clear these on us and we have to rebuild the data we lost. In big servers we actually use hard references instead, but this requires a lot of RAM in the JVM (e.g. heap 2x the size of packedGitLimit). EGit doesn't have this sort of luxury when the rest of the platform is already occupying most of the heap. Unfortunately the decision to use soft vs. hard references is a hard-coded feature inside of JGit, so we can't easily toggle this.

I'll try looking at this org.eclipse.orion.client.git repository after I get into the office later today. I suspect it is "just" a misconfigured JGit; small repositories tend to fit into the defaults, big ones don't. :-(
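
The cost of soft references described above can be illustrated with a minimal standalone sketch. It is not JGit's cache implementation: the class SoftCacheSketch and all of its methods are hypothetical, and simulateGcClear only stands in for the garbage collector clearing a reference when the heap is tight.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: a tiny cache that holds values through SoftReferences.
// This is NOT JGit's WindowCache; all names here are hypothetical.
public class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;
    private int loads = 0; // how many times a value had to be (re)built

    public SoftCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref != null) ? ref.get() : null;
        if (value == null) {
            // Either never loaded, or the reference was cleared under memory pressure.
            value = loader.apply(key);
            loads++;
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    // Simulates what the GC may do when heap is tight: the referent is dropped.
    public void simulateGcClear(K key) {
        SoftReference<V> ref = entries.get(key);
        if (ref != null) {
            ref.clear();
        }
    }

    public int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        SoftCacheSketch<String, byte[]> cache = new SoftCacheSketch<>(k -> new byte[1024]);
        cache.get("pack-window");             // first access: value is loaded
        cache.get("pack-window");             // normally served from the cache
        cache.simulateGcClear("pack-window"); // pretend the GC reclaimed it
        cache.get("pack-window");             // value has to be rebuilt
        System.out.println("loads = " + cache.loadCount());
    }
}
```

Every cleared entry forces another call to the loader, which in a real cache would mean re-reading and re-inflating data from disk. A cache built on hard references avoids the reloads but keeps the memory pinned, which is the trade-off mentioned above.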
Comment 17 John Arthorne CLA 2012-12-18 13:55:10 EST
The performance test that was used is attached. It is a simple standalone java program that illustrates the speed difference from CGit. Maybe there is something that can be configured differently there.

The Orion case I can believe needing tuning. It is a single JVM instance serving hundreds of users with thousands of clones, typically running for a month or more between shutdowns. Currently I don't think we do any kind of JGit configuration at all. Are these properties that go into the git config, JVM properties, or something else? This is the Orion git repository where our code is:

http://git.eclipse.org/gitroot/orion/org.eclipse.orion.server.git

See the project org.eclipse.orion.server.git. I'm not sure what you are looking for, but perhaps InitJob and CloneJob are starting points to see where we invoke the JGit API to init and clone repositories.
Comment 18 Matthias Sohn CLA 2012-12-21 04:46:13 EST
(In reply to comment #17)
> The performance test that was used is attached. It is a simple standalone
> java program that illustrates the speed difference from CGit. Maybe there is
> something that can be configured differently there.
> 
> The Orion case I can believe needing tuning. It is a single JVM instance
> serving hundreds of users with thousands of clones, typically running for a
> month or more between shutdowns. Currently I don't think we do any kind of
> JGit configuration at all. Are these properties that go into the git config,
> JVM properties, or something else? This is the Orion git repository where
> our code is:
> 
> http://git.eclipse.org/gitroot/orion/org.eclipse.orion.server.git
> 
> See the project org.eclipse.orion.server.git. I'm not sure what you are
> looking for, but perhaps InitJob and CloneJob are starting points to see
> where we invoke the JGit API to init and clone repositories.

You can configure the cache parameters like that:

	final WindowCacheConfig c = new WindowCacheConfig();
	c.setPackedGitLimit(128 * WindowCacheConfig.KB);
	...
	WindowCache.reconfigure(c);
Comment 19 Matthias Sohn CLA 2013-06-19 02:05:08 EDT
(In reply to comment #18)
> (In reply to comment #17)
> > The performance test that was used is attached. It is a simple standalone
> > java program that illustrates the speed difference from CGit. Maybe there is
> > something that can be configured differently there.
> > 
> > The Orion case I can believe needing tuning. It is a single JVM instance
> > serving hundreds of users with thousands of clones, typically running for a
> > month or more between shutdowns. Currently I don't think we do any kind of
> > JGit configuration at all. Are these properties that go into the git config,
> > JVM properties, or something else? This is the Orion git repository where
> > our code is:
> > 
> > http://git.eclipse.org/gitroot/orion/org.eclipse.orion.server.git
> > 
> > See the project org.eclipse.orion.server.git. I'm not sure what you are
> > looking for, but perhaps InitJob and CloneJob are starting points to see
> > where we invoke the JGit API to init and clone repositories.
> 
> You can configure the cache parameters like that:
> 
> 	final WindowCacheConfig c = new WindowCacheConfig();
> 	c.setPackedGitLimit(128 * WindowCacheConfig.KB);
> 	...
> 	WindowCache.reconfigure(c);

As of JGit 3.0, which will be released next week with Kepler, reconfiguring the cache has to be done in a slightly different way:

WindowCacheConfig c = new WindowCacheConfig();
... set new configuration params ...
c.install();

see http://wiki.eclipse.org/JGit/New_and_Noteworthy/3.0#WindowCache_reconfiguration