Bug 573409 - [performance] Avoid File.getCanonicalPath in LocalFile
Summary: [performance] Avoid File.getCanonicalPath in LocalFile
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 4.21   Edit
Hardware: All Windows All
: P3 enhancement (vote)
Target Milestone: 4.21 M1   Edit
Assignee: Jörg Kubitz CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 574015
  Show dependency tree
 
Reported: 2021-05-06 11:11 EDT by Jörg Kubitz CLA
Modified: 2021-06-11 10:42 EDT (History)
4 users (show)

See Also:


Attachments
jmh micro benchmakr isSame vs getCanonicalPath File573409.java (3.90 KB, application/octet-stream)
2021-06-07 10:59 EDT, Jörg Kubitz CLA
no flags Details
jmh micro benchmakr isSame vs getCanonicalPath two files: File573409_2.java (3.32 KB, application/octet-stream)
2021-06-07 11:07 EDT, Jörg Kubitz CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jörg Kubitz CLA 2021-05-06 11:11:31 EDT
This is a clone of jdt Bug 571614 for platform.resources.

In LocalFile getCanonicalPath() is used to test whether files are physically the same. Same can be done with Files.isSameFile() - which is much faster.
Comment 1 Eclipse Genie CLA 2021-05-06 11:13:29 EDT
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/180307
Comment 2 Eclipse Genie CLA 2021-05-06 11:13:32 EDT
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/180308
Comment 3 Jörg Kubitz CLA 2021-05-06 12:59:43 EDT
Performance comparison on Windows:
java.io.File.getCanonicalPath  ~ 170 us*2 (needed for both files!)
java.nio.file.Files.isSameFile ~  54 us (opens both files in worst case)
java.io.File.exists            ~  18 us (checks just the target) 

Could be probably further improved by totally skipping the isSameFile check if we could detect that the destination does not exist during the subsequent copy operation.
Comment 5 Eclipse Genie CLA 2021-06-07 06:43:01 EDT
New Gerrit change created: https://git.eclipse.org/r/c/platform/eclipse.platform.resources/+/181509
Comment 6 Jörg Kubitz CLA 2021-06-07 10:59:15 EDT
Created attachment 286537 [details]
jmh micro benchmakr isSame vs getCanonicalPath File573409.java

Windows, java 11-16:

Benchmark                             Mode  Cnt    Score    Error  Units
File573409.exists                       ss  100   56,920 ±  6,016  us/op
File573409.getCanonicalPath_Cached      ss  100    7,325 ±  1,128  us/op
File573409.getCanonicalPath_Uncached    ss  100  251,373 ± 18,891  us/op
File573409.isSame                       ss  100    3,818 ±  0,781  us/op
Comment 7 Jörg Kubitz CLA 2021-06-07 11:07:28 EDT
Created attachment 286538 [details]
jmh micro benchmakr isSame vs getCanonicalPath two files: File573409_2.java

windows, two different files compared:

Benchmark                               Mode  Cnt    Score    Error  Units
File573409_2.getCanonicalPath_Cached      ss  100    8,884 ±  2,731  us/op
File573409_2.exists                       ss  100   43,743 ±  3,083  us/op
File573409_2.isSame                       ss  100   88,839 ±  6,565  us/op
File573409_2.getCanonicalPath_Uncached    ss  100  300,320 ± 63,524  us/op

We basically see:
 isSame ~ 2*exists
 getCanonicalPath_Uncached ~ n*exists
Comment 8 Andrey Loskutov CLA 2021-06-07 11:08:30 EDT
(In reply to Jörg Kubitz from comment #6)
> Created attachment 286537 [details]
> jmh micro benchmakr isSame vs getCanonicalPath File573409.java
> 
> Windows, java 11-16:
> 
> Benchmark                             Mode  Cnt    Score    Error  Units
> File573409.exists                       ss  100   56,920 ±  6,016  us/op
> File573409.getCanonicalPath_Cached      ss  100    7,325 ±  1,128  us/op
> File573409.getCanonicalPath_Uncached    ss  100  251,373 ± 18,891  us/op
> File573409.isSame                       ss  100    3,818 ±  0,781  us/op

So we replace 2x cached getCanonicalPath() with exists + isSame? The sum is smaller for the old code.
Comment 9 Jörg Kubitz CLA 2021-06-07 11:30:46 EDT
(In reply to Andrey Loskutov from comment #8)

> So we replace 2x cached getCanonicalPath() with exists + isSame? The sum is
> smaller for the old code.

"isSame" wont be used unless you overwrite a file with another. But typically we delete all files in the target folder first ("clean").

So we replaced 2* "cached" getCanonicalPath with 1*exists. And well the cache was normally empty unless you copy the same file over and over. Only the prefix cache was normally filled as you normally copy many files of the same directory. We had 2*lookup for the files name and replaced it with 1.
Comment 11 Michael Haubenwallner CLA 2021-06-11 04:38:14 EDT
Benchmark on Gentoo Linux, on NVMe SSD using ext4 filesystem:

With JDK 15.0.2, OpenJDK 64-Bit Server VM, 15.0.2+7-27:

Benchmark                               Mode  Cnt   Score   Error  Units
File573409_2.exists                       ss  100   6.502 ± 1.275  us/op
File573409_2.getCanonicalPath_Cached      ss  100   8.357 ± 2.369  us/op
File573409_2.getCanonicalPath_Uncached    ss  100  15.517 ± 2.297  us/op
File573409_2.isSame                       ss  100  18.571 ± 2.600  us/op

With JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1:

Benchmark                               Mode  Cnt   Score   Error  Units
File573409_2.exists                       ss  100   8.029 ± 1.387  us/op
File573409_2.getCanonicalPath_Cached      ss  100   7.401 ± 1.507  us/op
File573409_2.getCanonicalPath_Uncached    ss  100  18.860 ± 2.867  us/op
File573409_2.isSame                       ss  100  19.486 ± 3.207  us/op
Comment 12 Jörg Kubitz CLA 2021-06-11 04:58:59 EDT
(In reply to Michael Haubenwallner from comment #11)
> Benchmark on Gentoo Linux, on NVMe SSD using ext4 filesystem:
> 
> With JDK 15.0.2, OpenJDK 64-Bit Server VM, 15.0.2+7-27:
> 
> Benchmark                               Mode  Cnt   Score   Error  Units
> File573409_2.exists                       ss  100   6.502 ± 1.275  us/op
> File573409_2.getCanonicalPath_Cached      ss  100   8.357 ± 2.369  us/op
> File573409_2.getCanonicalPath_Uncached    ss  100  15.517 ± 2.297  us/op
> File573409_2.isSame                       ss  100  18.571 ± 2.600  us/op
> 
> With JDK 11.0.9.1, OpenJDK 64-Bit Server VM, 11.0.9.1+1:
> 
> Benchmark                               Mode  Cnt   Score   Error  Units
> File573409_2.exists                       ss  100   8.029 ± 1.387  us/op
> File573409_2.getCanonicalPath_Cached      ss  100   7.401 ± 1.507  us/op
> File573409_2.getCanonicalPath_Uncached    ss  100  18.860 ± 2.867  us/op
> File573409_2.isSame                       ss  100  19.486 ± 3.207  us/op

Thanks @Michael.
The numbers show that getCanonicalPath() in the same speed as exists() and isSame is ~2*getCanonicalPath.
So in the normal case where we replaced 2* getCanonicalPath with 1*exist. We also have a speedup on linux. Independent of java version.

We also see performance loss with the uncached variant on linux. Even though it is not as drastic as under windows.