
Re: [jgit-dev] insertUnpackedObject() perf regression: j.nio.file.Files.exists() 15x slower than j.io.File.exists()

If you have a gist or a JMH harness for performing tests, I'm happy to report back timings for Windows and OSX.

Alex

Sent from my iPhat 6

On 27 Aug 2015, at 18:38, Roberto Tyley <roberto.tyley@xxxxxxxxx> wrote:

While updating the BFG Repo-Cleaner to JGit 4.0, I was surprised to see a consistent performance regression on clean-large-repo benchmarks (/dev/shm on Ubuntu 15.04, OpenJDK 1.8.0_45). Total run time had increased by ~10%. After some fairly chunky automated git-bisecting, I traced the regression to this change:

"Merge bundle org.eclipse.jgit.java7 into org.eclipse.jgit" - https://git.eclipse.org/r/43768
Change-Id: Ib5da61b0886ddbdea65298f1e8c6d65c9879ced1

...and then *specifically* to the exists() method in FS_POSIX, now overriding the default implementation in FS:


The default implementation of FS.exists() uses java.io.File.exists(), while the new implementation in FS_POSIX uses java.nio.file.Files.exists(). Simply removing the override in FS_POSIX restored performance.
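To make the two strategies concrete, here is a minimal sketch of the shape of the change. The classes `DefaultFs` and `PosixFs` are hypothetical, heavily simplified stand-ins for JGit's FS and FS_POSIX (they are not the real JGit classes); they only mirror the two exists() calls described above, including the LinkOption.NOFOLLOW_LINKS flag mentioned later in this mail.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.LinkOption;

/** Hypothetical, simplified stand-ins for FS / FS_POSIX; NOT the real JGit classes. */
public class ExistsStrategies {

    /** Default strategy (like FS): java.io.File.exists(), which follows symbolic links. */
    static class DefaultFs {
        public boolean exists(File path) {
            return path.exists();
        }
    }

    /** Override strategy (like FS_POSIX): NIO Files.exists() without following links. */
    static class PosixFs extends DefaultFs {
        @Override
        public boolean exists(File path) {
            return Files.exists(path.toPath(), LinkOption.NOFOLLOW_LINKS);
        }
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : ".");
        System.out.println("java.io:  " + new DefaultFs().exists(f));
        System.out.println("java.nio: " + new PosixFs().exists(f));
    }
}
```

In this sketch, "removing the override" simply means deleting `PosixFs.exists()`, so calls fall back to the inherited java.io.File.exists() path.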


Profiling the BFG benchmark made it clear that, in my environment at least, j.nio.file.Files.exists() is substantially slower than j.io.File.exists(), to the point where the exists() call doubles the average cost of a call to ObjectDirectory.insertUnpackedObject(), which the BFG uses a lot because it's rewriting history. Average times are:

j.io.File.exists() - 4 microseconds
j.nio.file.Files.exists() - 60 microseconds
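For anyone who wants to reproduce a rough comparison on another OS, a naive timing loop like the one below may be a starting point. It is only a sketch under stated assumptions: it is NOT a JMH harness (so JIT, OS caching and dead-code effects are only crudely controlled), the class and method names are hypothetical, and the fixture size (half present, half absent files in a temp directory) and round counts are arbitrary choices rather than what the BFG benchmark does.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.function.Predicate;

/** Naive (non-JMH) timing sketch comparing the two exists() calls. */
public class ExistsTiming {

    // Written after each run so the JIT cannot discard the exists() calls as dead code.
    private static volatile long sink;

    static final Predicate<File> IO = File::exists;
    static final Predicate<File> NIO = f -> Files.exists(f.toPath(), LinkOption.NOFOLLOW_LINKS);

    /** Creates n real files plus n paths that do not exist, interleaved. */
    static File[] fixture(Path dir, int n) throws IOException {
        File[] files = new File[2 * n];
        for (int i = 0; i < n; i++) {
            files[2 * i] = Files.createFile(dir.resolve("present-" + i)).toFile();
            files[2 * i + 1] = dir.resolve("absent-" + i).toFile();
        }
        return files;
    }

    /** Average nanoseconds per exists() call over all files and rounds. */
    static double nanosPerCall(File[] files, int rounds, Predicate<File> exists) {
        long hits = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (File f : files) {
                if (exists.test(f)) hits++;
            }
        }
        long elapsed = System.nanoTime() - start;
        sink = hits;
        return (double) elapsed / ((long) rounds * files.length);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("exists-bench");
        File[] files = fixture(dir, 500);
        // Warm-up passes so both code paths are JIT-compiled before measuring.
        nanosPerCall(files, 20, IO);
        nanosPerCall(files, 20, NIO);
        System.out.printf("java.io.File.exists():    %.2f us/call%n", nanosPerCall(files, 200, IO) / 1000.0);
        System.out.printf("java.nio.Files.exists():  %.2f us/call%n", nanosPerCall(files, 200, NIO) / 1000.0);
        // Clean up the temporary fixture.
        for (File f : files) f.delete();
        dir.toFile().delete();
    }
}
```

Absolute numbers from a loop like this will not match the profiler figures above; only the ratio between the two lines on the same machine is likely to be meaningful.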

If you look at the implementation of j.nio.file.Files.exists(), it's not hard to believe it's slower: j.io.File.exists() drops almost immediately to a native method, while the NIO method has multiple if statements and layers of indirection (profiling says the real cost comes quite far down, in calls to 'readAttributes' methods).
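As a rough illustration of where readAttributes comes in, the method below approximates what Files.exists(path, LinkOption.NOFOLLOW_LINKS) does, based on my reading of the OpenJDK 8 source (other JDK versions may differ). `existsNoFollow` is a hypothetical name; it is not a JDK or JGit method. One plausible extra cost, not confirmed by the profile above, is that a missing file is reported by throwing an IOException, so the "absent" case also pays for constructing an exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;

public class NioExistsApprox {

    /**
     * Approximation of Files.exists(path, NOFOLLOW_LINKS) (OpenJDK 8):
     * read basic attributes without following links; any IOException means "false".
     */
    static boolean existsNoFollow(Path path) {
        try {
            // Goes through the FileSystemProvider and builds a full
            // BasicFileAttributes object just to answer a yes/no question.
            Files.readAttributes(path, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
            return true;
        } catch (IOException e) {
            // A missing file surfaces as an exception (typically NoSuchFileException).
            return false;
        }
    }

    public static void main(String[] args) {
        Path p = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(existsNoFollow(p) + " / " + Files.exists(p, LinkOption.NOFOLLOW_LINKS));
    }
}
```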

I had a look at the change that originally introduced use of j.nio.file.Files.exists() (for org.eclipse.jgit.java7 only at that point):

"Extend the FS class for Java7" - https://git.eclipse.org/r/9378
Change-Id: I834b06d0447f84379612b8c9190fa77093617595

The commit message on this change mentions that "there are claims that Files.exists is faster the File.exists" (i.e. that NIO is faster, contrary to what I've seen). I think that claim might have come from here:


I can't explain the conflicting results for the performance of NIO exists(), though obviously there are many different OSs and Java versions out there, which may behave differently. I've only tested on Ubuntu so far, but I can grab a Mac. Getting a Windows box is more difficult for me, so let me know if you'd like to volunteer for benchmarking duty.

Aside from performance, I'm not sure if there was any other strong motivation for using j.nio.file.Files.exists(). It's called with LinkOption.NOFOLLOW_LINKS, which means that when checking a symbolic link, it will return true so long as the symbolic link is there, regardless of whether the link points to an existing file. This differs from j.io.File.exists(), which does follow the link, and returns false if the underlying file is not present. Is the NIO behavior useful?
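The behavioural difference can be seen with a dangling symbolic link. This is a small sketch with a hypothetical class name; it assumes a POSIX-like file system (creating symbolic links on Windows usually needs extra privileges). On such a system I would expect java.io.File.exists() to report false and Files.exists(..., NOFOLLOW_LINKS) to report true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

public class DanglingLinkDemo {

    /** Returns {File.exists(), Files.exists(NOFOLLOW_LINKS)} for a freshly made dangling link. */
    static boolean[] checkDanglingLink(Path dir) throws IOException {
        Path target = dir.resolve("missing-target");   // deliberately never created
        Path link = dir.resolve("dangling-link");
        Files.createSymbolicLink(link, target);
        try {
            boolean io = link.toFile().exists();                          // follows the link
            boolean nio = Files.exists(link, LinkOption.NOFOLLOW_LINKS);  // inspects the link itself
            return new boolean[] {io, nio};
        } finally {
            Files.delete(link);  // removes the link, not the (absent) target
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("symlink-demo");
        boolean[] r = checkDanglingLink(dir);
        System.out.println("java.io.File.exists():             " + r[0]);
        System.out.println("Files.exists(..., NOFOLLOW_LINKS): " + r[1]);
        Files.delete(dir);
    }
}
```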

Obviously, my preference would be to remove use of the NIO call. I'm inclined to think it could be removed everywhere, from both FS_POSIX & FS_Win32, but at the very least from ObjectDirectory.insertUnpackedObject().


Roberto

PS I have to confess that the difference in total run time is just 45s versus 41s, working on a 1.1GB repository; no BFG users will really care. But having spent so long trying to create a fast tool, a 10% regression is pretty hard for me to swallow!


<OldFileExistsIsFast.png>
<FilesIsSlow.png>
_______________________________________________
jgit-dev mailing list
jgit-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/jgit-dev
