Bug 571614 - [performance] Avoid File.getCanonicalPath
Summary: [performance] Avoid File.getCanonicalPath
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.20   Edit
Hardware: All Windows 10
: P3 normal (vote)
Target Milestone: 4.21 M3   Edit
Assignee: Jörg Kubitz CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 568157
  Show dependency tree
 
Reported: 2021-03-02 04:59 EST by Jörg Kubitz CLA
Modified: 2021-08-19 06:38 EDT (History)
4 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jörg Kubitz CLA 2021-03-02 04:59:07 EST
Compile times degrades when i use JDK 15 instead of 11.
As far a i sampled with VisualVM a lot of time is spend in:

JavaProject.canonicalizedPath()
FileIndexLocation.getCanonicalFilePath()
LocalFile.copy() [during XtextBuilder.doClean() and BatchImageBuilder.cleanOutputFolders()]

File.getCanonicalPath is a system call.
On JDK 11 the result has been cached with in JVM.
Since JDK 12 the result is not cached anymore by default:
See https://bugs.openjdk.java.net/browse/JDK-8207005

The caching can be controlled with:
-Dsun.io.useCanonPrefixCache=false -Dsun.io.useCanonCaches=false
-Dsun.io.useCanonPrefixCache=true  -Dsun.io.useCanonCaches=true

JMH microbenchmark for Windows 10:

Benchmark                       Mode  Cnt    Score   Error  Units
File.getCanonicalPath_Cached    avgt   20    0,102 ± 0,003  us/op
File.getCanonicalPath_Uncached  avgt   20  150,659 ± 2,897  us/op

Results do not differ form JDK 11,15 when caching is controlled.

Note, that 150us seems small but for a lot of files it sums up. For small (.java) files it doesnt make much difference if you read the complete file contents or just ask the OS about any of its properties.

I dont even know why getCanonicalPath is used at all:
According to javadoc "canonicalPath" is both *unique* and *absolute*.
while getAbsolutePath is only absolute. Is absolute not enough for eclipse?

Differences may be for symlinks (thats also where the error occured).

I have no proposal yet. Comments welcome.
Comment 1 Jörg Kubitz CLA 2021-03-02 09:51:40 EST
How about replacing with:
 Path.of(fileName).toAbsolutePath().normalize().toString()

Thats absolute and has no ".." within. I guess that all is needed here.
Needs only 1,098 ± 0,089  us/op

For me it returns the same result as getCanonicalPath()
Comment 2 Manoj N Palat CLA 2021-03-02 10:02:03 EST
(In reply to Jörg Kubitz from comment #1)

> For me it returns the same result as getCanonicalPath()

Please check this out:

https://stackoverflow.com/questions/1099300/whats-the-difference-between-getpath-getabsolutepath-and-getcanonicalpath
Comment 3 Jörg Kubitz CLA 2021-03-02 10:12:17 EST
So they say it getCanonicalPath would follow symlinks und linux. which the replacement would not. None of them does follow soft or hardlinks on windows either.
Comment 4 Jörg Kubitz CLA 2021-03-02 19:18:43 EST
Well i found one difference on caseinsensitive OSes like Windows:
getCanonicalPath does convert the character to the cases which are actually stored on the disk.

example:

C:\PROGRAMdata -> C:\ProgramData 
C:\programDATA -> C:\ProgramData 

thats making the names unique (and the call slow ). 
An all-lowercase would also be unique (if uniiqueness is required) but not reflect the name inside the filesystem.
Comment 5 Jörg Kubitz CLA 2021-03-05 09:33:58 EST
on linux (ext4) the problem is far less:

JdkFile.getCanonicalPath_Cached      0.136 ±     0.022  us/op
JdkFile.getCanonicalPath_Uncached    6.776 ±     0.780  us/op
Comment 6 Jörg Kubitz CLA 2021-04-30 05:11:05 EDT
@Manoj please assign to me
Comment 7 Andrey Loskutov CLA 2021-04-30 05:47:57 EDT
(In reply to Jörg Kubitz from comment #3)
> So they say it getCanonicalPath would follow symlinks und linux. which the
> replacement would not. None of them does follow soft or hardlinks on windows
> either.

(In reply to Jörg Kubitz from comment #1)
> How about replacing with:
>  Path.of(fileName).toAbsolutePath().normalize().toString()
> 
> Thats absolute and has no ".." within. I guess that all is needed here.
> Needs only 1,098 ± 0,089  us/op
> 
> For me it returns the same result as getCanonicalPath()

That is not same on Linux, for the reason you're menttiomed above.
So that is not a solution here.
Comment 8 Jörg Kubitz CLA 2021-07-05 04:29:29 EDT
I got rid of some canonicalizedPath in platform but i did not find a nice solution for JavaProject.canonicalizedPath().
The problem with JavaProject.canonicalizedPath() is that its current implementation just looks wrong to me.
To explain that i need to summarize my understanding of canonicalizedPath:

canonicalizedPath() is not well defined: 
For windows: It finds the actual capitalization and 8.3 name resolution for every segment in the normalized path. It does not follow links. Also it resolves special devicenames (which can not be used as filenames) to its special scheme. One also need to know that windows can have hard and softlinks. Also both 8.3 name conversion and capitalization are not a constant feature of windows. They can be turned on/off at runtime for individual drives or folders:
 fsutil.exe file SetCaseSensitiveInfo C:\folder\path enable
 fsutil.exe 8dot3name set 0
For Linux: It does follow links and does normalization. It does not find the actual capitalization. Even if a Windows drive is mounted into the linux filesystem.

A more well defined method is java.nio.file.Path.toRealPath(LinkOption...) where it can be at least configured whether to follow links or not. Unfortunately that is never cached and always as slow as the uncached canonicalizedPath.

So now about JavaProject.canonicalizedPath():
It assumes case sensitivity is a constant. Which is wrong as described above.
On Linux JavaProject.canonicalizedPath does nothing. - Not even normalization.
On Windows it tries to do all the funky name adjustments described above and then wrongly ignores special device names.
And it is not used correctly: in org.eclipse.jdt.core.JavaCore.newLibraryEntry it is only called if the library does not contain a "..". That is wrong as it  disables the normalization feature of canonicalizedPath.

What should JavaProject.canonicalizedPath do? Honestly as its implementation is broken for the edge cases i could not find out. Its not documented either. It tries to find a unique name for a PackageFragment. I can list what it NOT reliably does:
1) normalization
2) following links
3) filtering out special device

The best guess i can give: it was introduced to perform the 8.3 name resolution that may have been still relevant when the method was added in 2001)
Today i guess nobody effectively is using 8.3 names anymore.

So the best i can recommend is to completely delete that canonicalizedPath or let it behave as in linux: do nothing.

What could go wrong? If someone has two PackageFragments where one uses a 8.3 name and the other uses a normal name or other capitalization. AND if both packfragments are meant to refer the same they no longer would. Highly unlikely and even then no Explosion.

I suggest to disable canonicalization in JavaProject.canonicalizedPath() by default and add a secret runtime parameter which could reenable canonicalization there in case of emergency support. After some years the parameter then could be completely removed.

The alternative: never touch a runnning system - even though it is slow, not reliably and even wrong in edge cases.

@Manoj, Andrey: what do you suggest?
Comment 9 Gayan Perera CLA 2021-07-22 15:47:57 EDT
(In reply to Jörg Kubitz from comment #8)
> What could go wrong? If someone has two PackageFragments where one uses a
> 8.3 name and the other uses a normal name or other capitalization. AND if
> both packfragments are meant to refer the same they no longer would. Highly
> unlikely and even then no Explosion.

@Jörg can you provide some samples with file paths for above to understand more ?
Comment 10 Jörg Kubitz CLA 2021-07-22 19:02:41 EDT
(In reply to Gayan Perera from comment #9)
> @Jörg can you provide some samples with file paths for above to understand
> more ?

A file named "C:\workspace\project\libs\liba12345678.jar" (long name) is on windows by default the same as "C:\workspace\project\libs\LIBA12~1.JAR" (8.3 name). A folder "C:\workspace\project\src12345678" is identically to "C:\workspace\project\SRC123~1". This was needed back in times when the file system (for example a data CD) was not supporting long names. So if you have shared a project with long names via CD you could have ended up with short names.
Technically there can be two <classpathentry> for both names (long name and 8.3 name) in the ".classpath". Both would refer to the same single file (or folder).
With my suggested change they would not be identified to be a duplicate anymore.

Note: if today try to add both file/folder name versions to the workspace the UI would already resolve both to the same long name. => You get an error message long before the name is given to jdt. For curiosity it is still possible to add both names via links into the same resource folder.
Comment 11 Gayan Perera CLA 2021-07-23 13:22:29 EDT
@Jörg as suggested in gerrit, could you provide some insight on which method call hierarchy it takes the highest time which degrades the performance ? And try to fix that path first so that we can reduce the scope of this change ?
Comment 12 Jörg Kubitz CLA 2021-08-03 02:30:14 EDT
(In reply to Gayan Perera from comment #11)
> @Jörg as suggested in gerrit, could you provide some insight on which method
> call hierarchy it takes the highest time which degrades the performance ?

For us the most time consuming stacktrace is:

org.eclipse.jdt.internal.core.JavaProject.canonicalizedPath()
org.eclipse.jdt.internal.core.DeltaProcessor$RootInfo.getPackageFragmentRoot()
org.eclipse.jdt.internal.core.DeltaProcessor$RootInfo.<init>()
org.eclipse.jdt.internal.core.DeltaProcessingState.getRootInfos()
org.eclipse.jdt.internal.core.DeltaProcessingState.initializeRoots()
org.eclipse.jdt.internal.core.DeltaProcessor.processResourceDelta()
org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged()
org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged()
org.eclipse.core.internal.events.NotificationManager$1.run()
org.eclipse.core.runtime.SafeRunner.run()
org.eclipse.core.internal.events.NotificationManager.notify()
org.eclipse.core.internal.events.NotificationManager.broadcastChanges()
org.eclipse.core.internal.resources.Workspace.broadcastPostChange()
org.eclipse.core.internal.resources.Workspace.endOperation()
org.eclipse.core.internal.resources.Resource.refreshLocal()
org.eclipse.tea.library.build.tasks.maven.SynchronizeMavenArtifact.runSingle ()

I guess it is because we have a hundreds of projects in the workspace referencing hundreds of external libraries multiple times.

> And try to fix that path first so that we can reduce the scope of this
> change ?

I have no idea how to improve that without caching the result. (Which would be as wrong as using -Dsun.io.useCanonCaches=true)
Comment 13 Jörg Kubitz CLA 2021-08-10 07:11:28 EDT
The problem with DeltaProcessingState.getRootInfos(boolean) is that it contains a loop over all p projects and all classpathes of each project. For a workspace with similar projects which all referencing almost the same external n projects (target platform) this results in O(p*n) accesses.
Any file operation at that point should be avoided (even the cached canonicalize). But RootInfo.cache is initialised eagerly with the cost intensive file operation (under windows only).
We could also avoid the factor p by changing getPackageFragmentRoot() to

boolean noAdjustCase = JavaModelManager.isJrt(this.rootPath)
 // thats case sensitive anyway
|| this.rootPath.getDevice()==null
 // without device the actual case can not be found anyway
|| JavaModelManager.getJavaModelManager().isExternalFile(this.rootPath);
 // correct case was already proven

IPath canonicalizedPath = noAdjustCase 
	? this.rootPath
	:JavaProject.canonicalizedPath(new Path(this.rootPath.toOSString()));

however i still think its best to disable the whole JavaProject.canonicalizedPath(IPath) as it typically returns the argument anyway.
Comment 15 Andrey Loskutov CLA 2021-08-15 18:19:43 EDT
(In reply to Eclipse Genie from comment #14)
> Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/182759 was
> merged to [master].
> Commit:
> http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/
> ?id=3b5583c706f0183fbf9bbd514ba6b63ec82409cd

Jörg, there are few new regressions in JDT UI tests:

https://download.eclipse.org/eclipse/downloads/drops4/I20210815-0600/testresults/html/org.eclipse.jdt.ui.tests_ep421I-unit-win32-java11_win32.win32.x86_64_11.html

Could you please check them?
Comment 16 Jörg Kubitz CLA 2021-08-16 02:56:28 EDT
(In reply to Andrey Loskutov from comment #15)
> Jörg, there are few new regressions in JDT UI tests:
> 
> https://download.eclipse.org/eclipse/downloads/drops4/I20210815-0600/
> testresults/html/org.eclipse.jdt.ui.tests_ep421I-unit-win32-java11_win32.
> win32.x86_64_11.html
> 
> Could you please check them?

Thanks for the hint. I was normaly not able to reproduce it locally. All the failing test are from ContentProviderTests3. During that the new JavaProject.createPackageFragementKey(IPath) is called only for two files:
"C:/Users/JKubitz/eclipseDev/jdt-master2/git/eclipse.jdt.ui/org.eclipse.jdt.ui.tests/testresources/rtstubs15.jar",
"/TestProject1/mylib.jar"
For both externalPath.equals(capitailzePath(externalPath)). So at least on my computer the errors are unrelated to this change.

Can you please retrigger the testrun? I created a new Bug for the probably unrelated error. Bug: 575426
Comment 17 Andrey Loskutov CLA 2021-08-16 03:20:46 EDT
(In reply to Jörg Kubitz from comment #16)
> So at least on my computer the errors are unrelated to this change.
> 
> Can you please retrigger the testrun? I created a new Bug for the probably
> unrelated error. Bug: 575426

I can't (or don't want :-)), but that is the regular nightly SDK build.

Fortunately for *this* change, those tests seems to be unstable, the last nighly build didn't show these fails:
https://download.eclipse.org/eclipse/downloads/drops4/I20210815-1800/testResults.php

With that: can we close this issue, or do you want to provide other patches?
Comment 18 Jörg Kubitz CLA 2021-08-16 03:30:21 EDT
(In reply to Andrey Loskutov from comment #17)
> With that: can we close this issue, or do you want to provide other patches?

I do not plan further commits for this topic.
Comment 19 Jay Arthanareeswaran CLA 2021-08-19 06:38:45 EDT
Verified for 4.21 M3 using build I20210818-1800