Bug 579837 - JGit is very slow in marking the remote refs as advertised
Status: NEW
Alias: None
Product: JGit
Classification: Technology
Component: JGit
Version: 5.13
Hardware: PC Mac OS X
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
Reported: 2022-05-04 18:59 EDT by Luca Milanesio CLA
Modified: 2022-05-04 19:27 EDT (History)
Description Luca Milanesio CLA 2022-05-04 18:59:56 EDT
When cloning a remote repository that advertises a large number of refs (e.g. on the order of millions), the JGit client spends a lot of time marking the received refs as locally advertised.

See the full stack-trace below:
	at java.lang.Throwable.fillInStackTrace(Native Method)
	at java.lang.Throwable.fillInStackTrace(Throwable.java:784)
	- locked <0x00000007a69a1728> (a java.io.FileNotFoundException)
	at java.lang.Throwable.<init>(Throwable.java:266)
	at java.lang.Exception.<init>(Exception.java:66)
	at java.io.IOException.<init>(IOException.java:58)
	at java.io.FileNotFoundException.<init>(FileNotFoundException.java:77)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.eclipse.jgit.internal.storage.file.LooseObjects.getObjectLoader(LooseObjects.java:186)
	at org.eclipse.jgit.internal.storage.file.LooseObjects.open(LooseObjects.java:149)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseObject(ObjectDirectory.java:396)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openLooseFromSelfOrAlternate(ObjectDirectory.java:373)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObjectWithoutRestoring(ObjectDirectory.java:349)
	at org.eclipse.jgit.internal.storage.file.ObjectDirectory.openObject(ObjectDirectory.java:330)
	at org.eclipse.jgit.internal.storage.file.WindowCursor.open(WindowCursor.java:132)
	at org.eclipse.jgit.lib.ObjectReader.open(ObjectReader.java:212)
	at org.eclipse.jgit.revwalk.RevWalk.parseAny(RevWalk.java:1075)
	at org.eclipse.jgit.transport.BasePackFetchConnection.markAdvertised(BasePackFetchConnection.java:987)
	at org.eclipse.jgit.transport.BasePackFetchConnection.markRefsAdvertised(BasePackFetchConnection.java:979)
	at org.eclipse.jgit.transport.BasePackFetchConnection.doFetch(BasePackFetchConnection.java:363)
	at org.eclipse.jgit.transport.TransportHttp$SmartHttpFetchConnection.doFetch(TransportHttp.java:1550)
	at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:302)
	at org.eclipse.jgit.transport.BasePackFetchConnection.fetch(BasePackFetchConnection.java:293)
	at org.eclipse.jgit.transport.FetchProcess.fetchObjects(FetchProcess.java:274)
	at org.eclipse.jgit.transport.FetchProcess.executeImp(FetchProcess.java:171)
	at org.eclipse.jgit.transport.FetchProcess.execute(FetchProcess.java:94)
	at org.eclipse.jgit.transport.Transport.fetch(Transport.java:1309)
	at org.eclipse.jgit.api.FetchCommand.call(FetchCommand.java:213)
	at org.eclipse.jgit.api.CloneCommand.fetch(CloneCommand.java:311)
	at org.eclipse.jgit.api.CloneCommand.call(CloneCommand.java:182)
Comment 1 Luca Milanesio CLA 2022-05-04 19:06:03 EDT
It looks like, even though I have set core.trustfolderstat=true, JGit doesn't even check whether the file exists: it just tries to open it and catches the resulting exception.

Throwing and catching an exception millions of times is going to be *very expensive* for the JVM, and it may even take minutes to complete.

When cloning a remote repository that advertises millions of refs, it is clear that most of them won't be found locally, and therefore there is no point in throwing and catching an exception millions of times.

I need to test *IF* using File.exists() improves things, skipping that check when we cannot trust the folder stats.
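
To make the comparison concrete, here is a minimal standalone sketch of the two lookup strategies. It is not JGit code: the class LooseObjectProbe, its method names, the file names and the sample size are all hypothetical, and it only assumes that the open-then-catch pattern seen in the stack trace can be approximated with a plain FileInputStream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; this class does not exist in JGit.
final class LooseObjectProbe {

    // Approximates the pattern in the stack trace: open the file first and
    // treat FileNotFoundException as "object not present".
    static boolean existsByOpening(File f) throws IOException {
        try (InputStream in = new FileInputStream(f)) {
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    // The alternative under discussion: ask the filesystem first, so that a
    // missing file never causes an exception to be constructed.
    static boolean existsByStat(File f) {
        return f.isFile();
    }

    public static void main(String[] args) throws IOException {
        int n = 100_000; // arbitrary sample size, well below the 2M refs reported
        File dir = new File(System.getProperty("java.io.tmpdir"));
        int found = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (existsByOpening(new File(dir, "missing-object-" + i))) {
                found++;
            }
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) {
            if (existsByStat(new File(dir, "missing-object-" + i))) {
                found++;
            }
        }
        long t2 = System.nanoTime();

        System.out.printf("open+catch: %d ms, stat: %d ms (found=%d)%n",
                (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, found);
    }
}
```

This is a rough timing loop rather than a proper benchmark (no warm-up, no JMH), so the numbers it prints only indicate a trend. Whether a stat-based check is safe inside JGit depends on whether the folder stats can be trusted, which is the open question here.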
Comment 2 Luca Milanesio CLA 2022-05-04 19:07:02 EDT
@Matthias, do you see a reason why we could not trust the local folder stats and File.exists() when looking for a loose object on the filesystem?
Comment 3 Luca Milanesio CLA 2022-05-04 19:27:24 EDT
The slowness isn't *unbearable* (overall it takes 5 mins to clone a repository with 2M refs vs. 20s with git), but it compromises the overall timings of E2E test runs with the gatling-git project.