Bug 545008 - [reconciler] SemanticHighlightingReconciler.reconcilePositions takes very long for very large files
Summary: [reconciler] SemanticHighlightingReconciler.reconcilePositions takes very lon...
Status: RESOLVED INVALID
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.9   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Manoj N Palat CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-03-04 03:57 EST by Simeon Andreev CLA
Modified: 2019-03-07 11:46 EST (History)
4 users (show)

See Also:


Attachments
YourKit snapshot showing the reconciler initialization method take 10 hours to finish. (171.39 KB, text/html)
2019-03-04 03:57 EST, Simeon Andreev CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Simeon Andreev CLA 2019-03-04 03:57:49 EST
Created attachment 277744 [details]
YourKit snapshot showing the reconciler initialization method take 10 hours to finish.

In bug 544921 we report poor performance with a 370k lines of code source file. This is of course an extreme case, however I see that SemanticHighlightingReconciler.reconcilePositions(ASTNode[]) takes 10 hours to finish once I open the source file.

The work is done on a thread, not in a job, so once this huge file is opened and Eclipse is closed at some point, the Eclipse process lives on until the reconcile thread finishes.

The time is solely spent in:

org.eclipse.jdt.core.dom.DefaultBindingResolver.internalGetTypeBinding(TypeBinding, IBinding)


There are 2 problems with this:

1. If we know that a file is too big, we should at least be able to turn off features that will run forever. Editing a 370k lines file in a Linux editor (e.g. KWrite) takes a few seconds for some operations (e.g. opening) but otherwise works.
2. The source file with 370k lines of code has a mostly trivial structure, some inner class definitions and a lot of public final fields. The file size is about 30MB. Its absurd that any operation takes 10 hours on this amount of data. There must be some code with sub-optimal complexity somewhere on the stack traces of the attached YourKit snapshot.
Comment 1 Simeon Andreev CLA 2019-03-04 04:05:22 EST
Do note that in the attached YourKit snapshot I've experimentally replaced the concurrent hash maps of org.eclipse.jdt.core.dom.DefaultBindingResolver.BindingTables with non-concurrent ones.

So the actual running time would actually be even longer than in the snapshot.
Comment 2 Simeon Andreev CLA 2019-03-04 06:18:16 EST
To reproduce to some extent, the following code can be used to generate a class:

public static void main(String[] args) {
	System.out.println("package a;");
	System.out.println("");
	System.out.println("public class A {");
	System.out.println("");
	System.out.println("	public static final class B extends Unknown {");
	System.out.println("		public int x = 2;");
	System.out.println("	}");
	System.out.println("");

	int n = 1_000;
	int m = 100;
	for (int i = 0; i < n; ++i) {
		System.out.println("	public static final class C" + i + " {");				
		for (int j = 0; j < m; ++j) {
			System.out.println("		public final B b" + ((i * m) + j) + " = new B();");			
		}
		System.out.println("	}");
	}
	System.out.println("}");
}

The "key" seems to be the unknown type for the super class of "B". Without this,  the reconciler finishes in 4-5 seconds. With this unknown super type the reconciler runs for about:

* 544 seconds at 150k lines of code (~5.1M B)
* 304 seconds at 125k lines of code (~4.2 MB)
* 105 seconds at 100k lines of code (~3.4 MB)
* 4 seconds at 75k lines of code
* 1 second at 50k lines of code
* 200ms at 10k lines of code
* less than a second at 10k lines of code.
Comment 3 Simeon Andreev CLA 2019-03-04 08:20:22 EST
I do notice that if I occasionally restart the IDE, the time spent in the initialization method drops to a few seconds, even with 150k lines of code.

If I on the other hand keep replacing the contents of the source file with a larger class before closing and opening the source file, I observe the really long running time.

Even with the source file with which we initially observed the problem in our product (370k lines of code), if I open the file with a freshly restarted IDE the initialization code finishes in 30 seconds.

Maybe the backing hash map becomes cluttered at some point and runs into lots of collisions.
Comment 4 Simeon Andreev CLA 2019-03-04 10:02:00 EST
OK, the extremely long running times seem to be the product of the following sequence:

1. Open a 100k lines source file (e.g. generate one with code from comment 3).
2. Select all, copy, paste the same 100k lines.
3. Close the file and re-open it.

In this case there are a lot of removed positions due to replacing file contents with other contents. There is a linear search at the following location:

org.eclipse.jdt.internal.ui.javaeditor.SemanticHighlightingReconciler.PositionCollector.addPosition(int, int, Highlighting)

For each new highlight, the code will look for a previously removed position, by iterating over all removed positions. Since all positions were removed and are being re-added, this has quadratic complexity.

I'm not certain how valid of a use case this is, it seems like a product of my profiling actions more than anything.
Comment 5 Jay Arthanareeswaran CLA 2019-03-04 10:49:41 EST
(In reply to Simeon Andreev from comment #4) 
> In this case there are a lot of removed positions due to replacing file
> contents with other contents. There is a linear search at the following
> location:
> org.eclipse.jdt.internal.ui.javaeditor.SemanticHighlightingReconciler.PositionCollector.addPosition(int,
> int, Highlighting)

May be Noopur can throw some light on this.
Comment 6 Dani Megert CLA 2019-03-04 10:57:10 EST
(In reply to Simeon Andreev from comment #4)
> I'm not certain how valid of a use case this is, it seems like a product of
> my profiling actions more than anything.
Not sure how to interpret this. Are you saying it is not a real issue but caused by your test steps?
Comment 7 Simeon Andreev CLA 2019-03-05 02:24:03 EST
(In reply to Dani Megert from comment #6)
> (In reply to Simeon Andreev from comment #4)
> > I'm not certain how valid of a use case this is, it seems like a product of
> > my profiling actions more than anything.
> Not sure how to interpret this. Are you saying it is not a real issue but
> caused by your test steps?

I've not been able to reproduce very long running times, other than with the following sequence:

1. Open a very large source file (e.g. 100k LOC) in a Java editor.
2. Select all, paste equally large source file contents (again e.g. 100k LOC).
3. Close the editor of the source file.
4. Open the source file again in a Java editor.

I can't imagine how that would be a use case for Java development. It seems to be rather unlikely that an Eclipse user would run into the problem with a standard workflow.
Comment 8 Dani Megert CLA 2019-03-05 05:26:09 EST
(In reply to Simeon Andreev from comment #7)
> (In reply to Dani Megert from comment #6)
> > (In reply to Simeon Andreev from comment #4)
> > > I'm not certain how valid of a use case this is, it seems like a product of
> > > my profiling actions more than anything.
> > Not sure how to interpret this. Are you saying it is not a real issue but
> > caused by your test steps?
> 
> I've not been able to reproduce very long running times, other than with the
> following sequence:
> 
> 1. Open a very large source file (e.g. 100k LOC) in a Java editor.
> 2. Select all, paste equally large source file contents (again e.g. 100k
> LOC).
> 3. Close the editor of the source file.
> 4. Open the source file again in a Java editor.
> 
> I can't imagine how that would be a use case for Java development. It seems
> to be rather unlikely that an Eclipse user would run into the problem with a
> standard workflow.
I agree. So, are you OK to close it?
Comment 9 Simeon Andreev CLA 2019-03-05 05:28:02 EST
(In reply to Dani Megert from comment #8)
> I agree. So, are you OK to close it?

Andrey?
Comment 10 Andrey Loskutov CLA 2019-03-07 04:56:49 EST
(In reply to Simeon Andreev from comment #7)
> (In reply to Dani Megert from comment #6)
> > (In reply to Simeon Andreev from comment #4)
> > > I'm not certain how valid of a use case this is, it seems like a product of
> > > my profiling actions more than anything.
> > Not sure how to interpret this. Are you saying it is not a real issue but
> > caused by your test steps?
> 
> I've not been able to reproduce very long running times, other than with the
> following sequence:
> 
> 1. Open a very large source file (e.g. 100k LOC) in a Java editor.
> 2. Select all, paste equally large source file contents (again e.g. 100k
> LOC).
> 3. Close the editor of the source file.
> 4. Open the source file again in a Java editor.
> 
> I can't imagine how that would be a use case for Java development. It seems
> to be rather unlikely that an Eclipse user would run into the problem with a
> standard workflow.

Yep. If this is the only way to force the performance issue, we can close the bug.