Bug 397185 - performance issue on DirCache().write()
Summary: performance issue on DirCache().write()
Status: RESOLVED FIXED
Alias: None
Product: JGit
Classification: Technology
Component: JGit (show other bugs)
Version: 2.1   Edit
Hardware: All All
: P3 critical with 3 votes (vote)
Target Milestone: 2.3   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2012-12-27 08:03 EST by Quentin Burley CLA
Modified: 2013-02-01 07:22 EST (History)
6 users (show)

See Also:


Attachments
profiler session showing the quatratic number of calls to TreeWalk.isPathPrefix() (13.75 KB, image/png)
2012-12-27 08:03 EST, Quentin Burley CLA
no flags Details
testcase maven project (11.46 KB, application/x-zip-compressed)
2012-12-27 08:13 EST, Quentin Burley CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Quentin Burley CLA 2012-12-27 08:03:24 EST
Created attachment 225063 [details]
profiler session showing the quatratic number of calls to TreeWalk.isPathPrefix()

This issue was introduced by commit c9e507c:
https://github.com/eclipse/jgit/commit/c9e507c0b9fe9a91ae65dfa8a3bcaa1f739a5686

and related to the file: 
org.eclipse.jgit/src/org/eclipse/jgit/dircache/DirCache.java

I have a use case for which this commit c9e507c actually decreases the performance by 500x when invoking the method dirCacheBuilder.getDirCache().write().

In this scenario:
1) create a new working repository
2) insert 10 random blobs of average size 1MB using objectInserter.insert();
3) add 100,000 entries into the dirCache, in a for loop, each time creating a new DirCacheEntry at a random path, setting objectId to and objectId from (2) and then dirCacheBuilder.add(dirCacheEntry0); 
4) dirCacheBuilder.finish();
5) dirCacheBuilder.getDirCache().write();
6) dirCacheBuilder.getDirCache().commit();

the performance issue is noted on step 5) dirCacheBuilder.getDirCache().write();

with this test case on versions:
1.3.0.201202151440-r, 2.0.0.201206130900-r: step 5 executes in approx 150ms

with this test case on versions:
2.1.0.201209190230-r, 2.2.0.201212191850-r, 2.3.0-SNAPSHOT (master 706f8eb 20/12/2012): step 5 executes in approx 66,000 ms

I appreciate that commit c9e507c introduces an improvement for a different, more typical use case, however this same commit has negative performance effects (for example in the test case illustrated 500x performance decrease).

I can send the code of the test case which I have used to interested parties.
--
as can been seen by the following profiler session (attached), the performance issue introduced by this commit seems to be due to the quadratic nature of the call which DirCache.updateSmudgedEntries() makes to TreeWalk.isPathPrefix().

The number of invocations of TreeWalk.isPathPrefix() is approximately 0.5*N*N, (where N is the number of DirEntry's).
Comment 1 Quentin Burley CLA 2012-12-27 08:09:08 EST
affected versions are: 2.1.0.201209190230-r, 2.2.0.201212191850-r, 2.3.0-SNAPSHOT

issue does not affect versions: 1.3.0.201202151440-r, 2.0.0.201206130900-r.
Comment 2 Quentin Burley CLA 2012-12-27 08:13:47 EST
Created attachment 225064 [details]
testcase maven project
Comment 3 Robin Rosenberg CLA 2013-01-04 18:55:45 EST
Perhaps this one solves the problem:

https://git.eclipse.org/r/#/c/9411/
Comment 4 Quentin Burley CLA 2013-01-04 20:05:58 EST
(In reply to comment #3)
> Perhaps this one solves the problem:
> 
> https://git.eclipse.org/r/#/c/9411/

Hi Robin, thanks, I tried out your patch on my test case, and list results compared to other versions.

Test scenario:
"step 5) dirCacheBuilder.getDirCache().write(); with 100,000 entries."

* with this test case on versions:
1.3.0.201202151440-r, 2.0.0.201206130900-r: step 5 executes in approx 150 ms

* with this test case on versions:
2.1.0.201209190230-r, 2.2.0.201212191850-r, 2.3.0-SNAPSHOT (master 706f8eb 20/12/2012): step 5 executes in approx 66,000 ms

* with this test case on version:
2.3.0-SNAPSHOT (patched with https://git.eclipse.org/r/#/c/9411/): step 5 executes in approx 400 ms

 --

Therefore in this case the performance of the patch (https://git.eclipse.org/r/#/c/9411/) is much better than 2.1.0.201209190230-r, 2.2.0.201212191850-r, 2.3.0-SNAPSHOT (150x improvement), but not as good as 1.3.0.201202151440-r, 2.0.0.201206130900-r (2.5x slowdown).

Performance of patch is anyway acceptable on this particular test case (commit on 100,000 entries).
Comment 5 Robin Rosenberg CLA 2013-01-08 18:28:04 EST
Merged
Comment 6 Dani Megert CLA 2013-01-09 04:35:12 EST
The fix was not good. It caused an IAE on every of my repositories [1]. This indicates that the comparator is not correct.


[1]
!MESSAGE An internal error occurred during: "Re-indexing repository eclipse.jdt.debug".
!STACK 0
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeLo(TimSort.java:747)
	at java.util.TimSort.mergeAt(TimSort.java:483)
	at java.util.TimSort.mergeCollapse(TimSort.java:408)
	at java.util.TimSort.sort(TimSort.java:214)
	at java.util.TimSort.sort(TimSort.java:173)
	at java.util.Arrays.sort(Arrays.java:659)
	at org.eclipse.jgit.treewalk.filter.PathFilterGroup$Group.<init>(PathFilterGroup.java:222)
	at org.eclipse.jgit.treewalk.filter.PathFilterGroup$Group.<init>(PathFilterGroup.java:179)
	at org.eclipse.jgit.treewalk.filter.PathFilterGroup.create(PathFilterGroup.java:143)
	at org.eclipse.jgit.treewalk.filter.PathFilterGroup.createFromStrings(PathFilterGroup.java:91)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.calcIndexDiffDataIncremental(IndexDiffCacheEntry.java:400)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry.access$9(IndexDiffCacheEntry.java:381)
	at org.eclipse.egit.core.internal.indexdiff.IndexDiffCacheEntry$5.run(IndexDiffCacheEntry.java:341)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
Comment 7 Dani Megert CLA 2013-01-09 04:35:40 EST
I'm using org.eclipse.egit.core_2.3.0.201301082326.
Comment 8 Robin Rosenberg CLA 2013-01-09 11:05:18 EST
Can you post the list of files, or a reference to a place where I can repeat the error.


Short-term workaround(?) Set the system property java.util.Arrays.useLegacyMergeSort=true
Comment 9 Dani Megert CLA 2013-01-09 11:27:20 EST
(In reply to comment #8)
> Can you post the list of files, or a reference to a place where I can repeat
> the error.

1. start with fresh workspace with Eclipse that has latest EGit
2. clone the jgit repo (ssh://git.eclipse.org/gitroot/jgit/jgit.git)
3. import all projects
==> boom!
Comment 10 Robin Rosenberg CLA 2013-01-09 18:59:01 EST
Reverted the change. Nothing near that code can work.
Comment 11 Dani Megert CLA 2013-01-10 04:12:55 EST
(In reply to comment #10)
> Reverted the change. Nothing near that code can work.

Verified in org.eclipse.jgit_2.3.0.201301091911 that the problem is gone.
Comment 12 Robin Rosenberg CLA 2013-01-12 16:45:22 EST
New attempt at https://git.eclipse.org/r/#/c/9607/
Comment 13 Robin Rosenberg CLA 2013-01-28 19:00:41 EST
Merged https://git.eclipse.org/r/#/c/9607/
Comment 14 Dani Megert CLA 2013-02-01 07:22:06 EST
I did not verify the performance improvement, but verified that it doesn't cause IAEs.