Bug 560821 - JGit pushes are 4 times slower when bitmap indexes are present
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: 5.6
Hardware: Macintosh Mac OS X
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox

Reported: 2020-03-05 10:03 EST by Christian Halstrick
Modified: 2020-03-10 20:08 EDT
Description Christian Halstrick 2020-03-05 10:03:37 EST
Given two local bare repositories containing the current Gerrit source, pushing one new commit from repo1 to repo2 is much slower when the sender (repo1) has bitmap indexes than when it has none (in my case about 4 times slower).

Instrumenting the JGit code in PackWriter to report where the time is spent (see https://git.eclipse.org/r/c/141842/) shows the following:

The problem is the performance of PackWriter#findObjectsToPack(). That method delegates to PackWriter#findObjectsToPackUsingBitmaps(), which in my case is much slower than the default code path that does not use bitmaps.
It looks like findObjectsToPackUsingBitmaps() first calculates all have objects, then all want objects, and then computes the difference. In huge repos calculating all have objects is expensive (6722 ms in the run below). The non-bitmap code in findObjectsToPack() creates a single walk that uses both the have and the want objects and has to visit only very few objects, which takes only 1428 ms in the same scenario.
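
To make the difference between the two strategies concrete, here is a minimal sketch on a toy commit graph. It is not JGit code: the class HaveWantSketch and all of its methods are hypothetical names chosen for illustration, commits are plain integers whose parents always have smaller ids, and precomputed bitmaps are not modelled at all. It only mirrors the shape described above (full "have" set, full "want" set, then the difference, versus a single walk that stops once only "have" commits are pending).

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical toy model, not JGit code.
final class HaveWantSketch {
    /** Outcome of one strategy: commits to send and number of commits visited. */
    static final class Result {
        final BitSet need;
        final int visited;
        Result(BitSet need, int visited) { this.need = need; this.visited = visited; }
    }

    /** Toy history: commit i has the single parent i - 1; commit 0 is the root. */
    static int[][] linearHistory(int n) {
        int[][] parents = new int[n][];
        for (int i = 0; i < n; i++) parents[i] = (i == 0) ? new int[0] : new int[] { i - 1 };
        return parents;
    }

    /** Marks everything reachable from tips in seen; returns how many commits were visited. */
    static int closure(int[][] parents, int[] tips, BitSet seen) {
        ArrayDeque<Integer> todo = new ArrayDeque<>();
        for (int t : tips) if (!seen.get(t)) { seen.set(t); todo.push(t); }
        int visited = 0;
        while (!todo.isEmpty()) {
            int c = todo.pop();
            visited++;
            for (int p : parents[c]) if (!seen.get(p)) { seen.set(p); todo.push(p); }
        }
        return visited;
    }

    /** Strategy 1: all haves, then all wants, then the difference. */
    static Result needViaSets(int[][] parents, int[] wants, int[] haves) {
        BitSet have = new BitSet();
        BitSet need = new BitSet();
        int visited = closure(parents, haves, have) + closure(parents, wants, need);
        need.andNot(have);
        return new Result(need, visited);
    }

    /** Strategy 2: one newest-first walk that stops when only "have" commits are queued. */
    static Result needViaWalk(int[][] parents, int[] wants, int[] haves) {
        BitSet uninteresting = new BitSet();
        BitSet queued = new BitSet();
        BitSet need = new BitSet();
        PriorityQueue<Integer> queue = new PriorityQueue<>(Comparator.reverseOrder());
        for (int h : haves) uninteresting.set(h);
        int pending = 0; // interesting commits currently in the queue
        for (int[] tips : new int[][] { haves, wants }) {
            for (int t : tips) {
                if (queued.get(t)) continue;
                queued.set(t);
                queue.add(t);
                if (!uninteresting.get(t)) pending++;
            }
        }
        int visited = 0;
        while (pending > 0) {
            int c = queue.poll();
            visited++;
            boolean skip = uninteresting.get(c);
            if (!skip) { need.set(c); pending--; }
            for (int p : parents[c]) {
                if (skip && !uninteresting.get(p)) {
                    // Parents of a "have" commit are already on the receiving side.
                    uninteresting.set(p);
                    if (queued.get(p)) pending--;
                }
                if (!queued.get(p)) {
                    queued.set(p);
                    queue.add(p);
                    if (!uninteresting.get(p)) pending++;
                }
            }
        }
        return new Result(need, visited);
    }

    public static void main(String[] args) {
        int[][] history = linearHistory(1000);
        Result sets = needViaSets(history, new int[] { 999 }, new int[] { 998 });
        Result walk = needViaWalk(history, new int[] { 999 }, new int[] { 998 });
        System.out.println("sets: need=" + sets.need + " visited=" + sets.visited);
        System.out.println("walk: need=" + walk.need + " visited=" + walk.visited);
    }
}
```

On a linear toy history of 1000 commits with have = 998 and want = 999, both strategies report the same single commit to send, but the set-based version visits 1999 commits while the combined walk visits 1. Whether this is what actually dominates inside findObjectsToPackUsingBitmaps() is an assumption based only on the timings in this report.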

In the end both algorithms (bitmap and non-bitmap aware code) find the same result: only one commit with one new blob has to be sent.

Is anybody aware that findObjectsToPackUsingBitmaps() is so much slower than the non-bitmap-aware code when working on packfiles of 2 GB in size?

This is an older problem that was already raised in https://www.eclipse.org/lists/jgit-dev/msg03811.html. Since it is still reproducible with the latest JGit, I have now created this bug.


Here is the output of a script (see https://gist.github.com/chalstrick/864fecf5cc45c056e90225418d6b9c89) that shows the problem:

> jgit --version
jgit version 5.7.0-SNAPSHOT
> rm -fr gerrit.src.git gerrit.dst.git gerrit.client
> git clone --bare --mirror https://gerrit.googlesource.com/gerrit gerrit.dst.git
...
> cp -r gerrit.dst.git gerrit.dst.git.backup
> git clone --bare --mirror gerrit.dst.git gerrit.src.git
...
> git clone gerrit.src.git gerrit.client
...
> ( cd gerrit.client; date >>README.md; git add README.md; git commit -m "modify README.md"; git push origin; )
[master 1c95c125d90] modify README.md
 1 file changed, 1 insertion(+)
...
   e34d91de7fa..1c95c125d90  master -> master
> ( cd gerrit.src.git; time jgit push origin HEAD:refs/heads/master; )
Counting objects:       3
TracePushPerf: findObjectsToPush(): code not using bitmaps runtime: 1428
TracePushPerf: findObjectsToPack() runtime: 1442
Finding sources:        100% (3/3)
Getting sizes:          100% (2/2)
Compressing objects:    100% (4434/4434)
Writing objects:        100% (3/3)
remote: Updating references: 100% (1/1)
To /Users/d032780/tmp/z/gerrit.dst.git
   e34d91d..1c95c12  HEAD -> master

real    0m3.536s
user    0m7.391s
sys     0m0.745s
> rm -fr gerrit.dst.git
> cp -r gerrit.dst.git.backup gerrit.dst.git
> ( cd gerrit.src.git; git repack -a -d -b; time jgit push origin HEAD:refs/heads/master; )
Enumerating objects: 1119805, done.
Counting objects: 100% (1119805/1119805), done.
Delta compression using up to 8 threads
Compressing objects: 100% (467812/467812), done.
Writing objects: 100% (1119805/1119805), done.
Selecting bitmap commits: 258045, done.
Building bitmaps: 100% (356/356), done.
Total 1119805 (delta 508417), reused 1119594 (delta 508207)
Counting objects:       518145
TracePushPerf: findObjectsToPackUsingBitmaps() ms to find find haves: 6722
TracePushPerf: findObjectsToPackUsingBitmaps()  ms to find find want: 33
TracePushPerf: findObjectsToPackUsingBitmaps()  ms to find find need: 1
TracePushPerf: findObjectsToPackUsingBitmaps()  ms to add needed: 2
TracePushPerf: findObjectsToPackUsingBitmaps() runtime: 6758
Counting objects:       538172
TracePushPerf: findObjectsToPack() runtime: 7111
Finding sources:        100% (3/3)
Getting sizes:          100% (2/2)
Compressing objects:    100% (4434/4434)
Writing objects:        100% (3/3)
remote: Updating references: 100% (1/1)
To /Users/d032780/tmp/z/gerrit.dst.git
   e34d91d..1c95c12  HEAD -> master

real    0m8.503s
user    0m11.605s
sys     0m1.015s
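
To compare the two runs above without reading the numbers by eye, the TracePushPerf lines can be parsed and divided. The following is a small sketch that assumes the trace format printed by the instrumentation patch in this report; the class TraceRatioSketch and its method are hypothetical names, and the two log excerpts are copied from the output above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, assuming the trace line format shown in this report.
final class TraceRatioSketch {
    // Matches e.g. "TracePushPerf: findObjectsToPack() runtime: 1442"
    // but not the findObjectsToPackUsingBitmaps() lines.
    private static final Pattern TOTAL =
            Pattern.compile("TracePushPerf: findObjectsToPack\\(\\) runtime: (\\d+)");

    /** Returns the findObjectsToPack() runtime in ms found in the log, or -1 if absent. */
    static long findObjectsToPackMillis(String log) {
        Matcher m = TOTAL.matcher(log);
        return m.find() ? Long.parseLong(m.group(1)) : -1;
    }

    public static void main(String[] args) {
        String withoutBitmaps =
                "TracePushPerf: findObjectsToPush(): code not using bitmaps runtime: 1428\n"
                + "TracePushPerf: findObjectsToPack() runtime: 1442\n";
        String withBitmaps =
                "TracePushPerf: findObjectsToPackUsingBitmaps() runtime: 6758\n"
                + "TracePushPerf: findObjectsToPack() runtime: 7111\n";
        long plain = findObjectsToPackMillis(withoutBitmaps);
        long bitmap = findObjectsToPackMillis(withBitmaps);
        System.out.printf("findObjectsToPack(): %d ms vs %d ms, ratio %.2f%n",
                bitmap, plain, (double) bitmap / plain);
    }
}
```

This compares only the findObjectsToPack() phase; the wall-clock times reported by time (8.503 s versus 3.536 s) also include JVM startup and the remaining push steps.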
Comment 1 Christian Halstrick 2020-03-05 10:56:52 EST
I got a hint to look at https://git.eclipse.org/r/c/157525/. But even with that change series applied, pushing between two local bare Gerrit repos is 4 times slower when bitmaps exist than when they do not.
Comment 2 Ivan Frade 2020-03-10 20:08:02 EDT
Thanks for reporting it here and the steps to reproduce it! We are trying to make better use of bitmaps wherever we can.

I would first give it a try without the --mirror flag. --mirror brings in many more refs, and maybe the bitmap code is not smart about that (we had that problem in the reachability checks).