Bug 341197 - Project tree refresh is slow for large classpath+number of projects
Summary: Project tree refresh is slow for large classpath+number of projects
Status: VERIFIED WORKSFORME
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.7   Edit
Hardware: Other Linux
: P3 normal with 1 vote (vote)
Target Milestone: 3.8 M6   Edit
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2011-03-29 04:50 EDT by Abhishek Sheopory CLA
Modified: 2012-03-14 04:35 EDT (History)
6 users (show)

See Also:


Attachments
Proposed patch (2.00 KB, patch)
2012-02-01 09:58 EST, Jay Arthanareeswaran CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Abhishek Sheopory CLA 2011-03-29 04:50:20 EDT
Build Identifier: M20100909-0800

A full project tree refresh, which triggers in cases like saving a file at least 2 levels deep inside a folder hierarchy, doesn't scale well in case of very large number of nodes when the classpath is large too.

Long explanation is that function org.eclipse.jdt.internal.core.JavaProject.isOnClasspath(IResource) compares a given string to the project's classpath. This is invoked many times throughout PackageExplorerContentProvider (which is responsible for refreshing the project tree) for each node files/folders in this project tree.

In our example the number of nodes in the project tree (projects + files + folders) are ~14000, and the classpath has ~5500 entries.

This is a classic scale issue for JDT. Perhaps we should investigate if we can:
* Reduce the calls to isOnClasspath(...)
* Use a better data structure for storing classpaths like a hash.

More ideas?

Reproducible: Always

Steps to Reproduce:
1. Create a project with many nodes. Say with 14000+ files/folders added recursively.
2. Set the classpath to have 5500+ entries.
3. Try saving a file at least 2 levels deep from the root of project tree.
Comment 1 Ayushman Jain CLA 2011-03-29 05:06:37 EDT
Jay, please follow up. Thanks!
Comment 2 Szabó Balázs CLA 2011-04-27 05:42:26 EDT
Any news on this?
Comment 3 Jay Arthanareeswaran CLA 2011-04-29 02:36:03 EDT
(In reply to comment #2)
> Any news on this?

Sorry I haven't had a chance a look at it yet with M7 going out. Will look at it soon for M8.
Comment 4 Jay Arthanareeswaran CLA 2011-06-09 05:31:47 EDT
Markus,

Is this something we can fix in PackageExplorerContentProvider? I see a 'TODO' note there about moving the call to isOnClasspath to elsewhere. Will it help in this case?
Comment 5 Markus Keller CLA 2011-06-09 06:56:45 EDT
Looks like these calls are necessary to support inclusion/exclusion patterns. Would have to look at a real test case with a profiler to tell where we can/need to optimize.
Comment 6 Jay Arthanareeswaran CLA 2011-06-13 06:16:06 EDT
Abhishek,
I am assuming you ran some tests that made you think isOnClasspath() is contributing to the slowness. Do you happen to have some numbers from your tests? TIA.
Comment 7 Jay Arthanareeswaran CLA 2011-06-23 05:09:52 EDT
I ran some testing with profiling enabled and I did find some cycles being spent in JavaProject#isOnClasspath(IResource) and JavaProject#contains(IResource). Unfortunately even storing the classpath entries with a hash may not help as these methods spend a big chunk of their time in Path#append() and Path#isPrefixOf() respectively. Lot of String operations is costing us, perhaps. I will see if there is a better way of doing things or avoiding calls to append() and isPrefixOf().

I will be doing some more profiling and update if I find anything more.
Comment 8 Jay Arthanareeswaran CLA 2011-06-24 05:37:24 EDT
Update: On further profiling I find that the synchronized map in ExternalFoldersManager is causing significant slowdown too. While we need it to be synchronized, I will see if we can cut down on the number of calls to ExternalFoldersManager#getFolder(IPath).
Comment 9 Jay Arthanareeswaran CLA 2011-06-27 07:13:56 EDT
I ran the profiler few more times and I think there is something different in the project I am testing. The save (of the java file) doesn't take much time at all. More importantly it doesn't seem to invoke isOnClasspath() multiple times so I am beginning to wonder what the difference in my test project is. The project I am using has around 4000 classpath entries (both 2000 source folders and 2000 jars) and more than 10,000 files and folder (mostly folders). Other use cases such as opening the project, refreshing etc. do take a lot of time.

I made a fix in isOnClasspath() to not use IPath#append(), I do see some improvements in terms of the time taken by isOnClasspath(). If this gets invoked multiple times, then the improvement might be more obvious. But I am afraid I don't have the right project content.

Abhishek,
Shortly, I will attach the patch (or a plugin) that you can try and let me know if you see any improvements. TIA.
Comment 10 Jay Arthanareeswaran CLA 2012-02-01 09:58:04 EST
Created attachment 210374 [details]
Proposed patch

This is the fix I mentioned in my earlier comment. I recreated it for Git from the one that was lying idle for quite some time. Sorry about the delay in getting to it. 

When ran with the existing projects as part of the performance tests, I get this numbers (unfortunately I only have the heap-used numbers) and multiple runs do produce the same difference in performance:

Without fix:
Used Java Heap:       321.99K         (95% in [291.99K, 352K])       Measurable effect: 53.05K (1.3 SDs) (required sample size for an effect of 5% of mean: 109)

With Fix:
Used Java Heap:       241.99K         (95% in [230.48K, 253.5K])     Measurable effect: 20.35K (1.3 SDs) (required sample size for an effect of 5% of mean: 29)

I will try to get a proper performance test in place. But I would appreciate if the reporter can get back to me with an update on performance improvement.
Comment 11 Jay Arthanareeswaran CLA 2012-02-15 06:04:32 EST
(In reply to comment #10)
> Created attachment 210374 [details]
> Proposed patch

After running several more tests, I can't conclusively say whether concerned code can be improved to offer significant performance improvements in all scenarios. I ran tests with and without the patch and for internal and external resources. I am unable to prove that one code is better than the other in all scenarios.

I don't think this is worth any more investment in my opinion. I intend to close this as WORKSFORME.
Comment 12 Jay Arthanareeswaran CLA 2012-02-20 10:40:59 EST
As mentioned in the previous post, closing the bug.
Comment 13 Satyam Kandula CLA 2012-03-14 04:35:58 EDT
It looks like that there is nothing much we can do. If you have any patch which could improve the performance, please feel to reopen the bug. 

Verified for 3.8M6