563017 – o.e.j.i.compiler.parser.Scanner.getNextToken0() throws ArrayIndexOutOfBounds on each file

Bug 563017 - o.e.j.i.compiler.parser.Scanner.getNextToken0() throws ArrayIndexOutOfBounds on each file

Summary: o.e.j.i.compiler.parser.Scanner.getNextToken0() throws ArrayIndexOutOfBounds ...

Status:	RESOLVED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.8.2
Hardware:	All All

Importance:	P3 normal (vote)
Target Milestone:	4.23 M2
Assignee:	Andrey Loskutov
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Duplicates (1):	456265 (view as bug list)
Depends on:
Blocks:

Reported:	2020-05-10 16:44 EDT by Andrey Loskutov
Modified:	2022-02-21 03:37 EST (History)
CC List:	5 users (show)

See Also:	Gerrit Change Gerrit Change Git Commit 578861

Attachments
aioobe sampled.png (83.61 KB, image/png) 2021-09-23 07:28 EDT, Jörg Kubitz	no flags	Details
stacks_before_patch_in_yourkit (273.51 KB, image/png) 2021-12-08 05:15 EST, Andrey Loskutov	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Andrey Loskutov

2020-05-10 16:44:04 EDT

While profiling bug 562727 I see that org.eclipse.jdt.internal.compiler.parser.Scanner.getNextToken0() Scanner.java:1494 throws ArrayIndexOutOfBounds on every file during compilation.

So while compilation of project with 2000 files we see 2000 ArrayIndexOutOfBounds exceptions, 100.000 files -> 100.000 exceptions etc.

This is not an error, it seem to be "code optimization". On the one side, it happens only once per file, on the other side, creating Exception, capturing stack trace and immediately GC it creates some unneeded GC overhead. So with more files it is more likely to produce some measurable effect.

I have a patch, but it must be understood if the performance is better or not with that.

Comment 1 Eclipse Genie

2020-05-10 16:45:35 EDT

New Gerrit change created: https://git.eclipse.org/r/162770

Comment 2 Andrey Loskutov

2020-05-10 18:41:44 EDT

Measurements done with this source input:
https://github.com/iloveeclipse/java-project-generator/tree/Bug_563017
that generates 80000 class/interfaces with almost no code,

and with this helper "Build Project" command https://github.com/iloveeclipse/java-things/tree/master/ProjectLocker, that will clean & compile selected project 10 times, skip 3 first builds
and print median build time for the rest.

On Xeon W-2145 CPU (16 cores), 256 GB RAM, SSD, OpenJDK 11.0.7 with 4 GB heap I see that compilation of 80000 files above is ~2.5 seconds faster with the patch (median 12300 ms vs 14718 ms, measured in 7 runs with extra 3 runs warm-up).

Here is the typical output of JDT internal compilation statistics (note, that the overall time here does not include all the overhead of IDE around compiler):

Without patch:

>FULL BUILD STATS for: JavaProjectGenerator
>   compiled 360410 lines in 10906 ms: 33046.9 lines/s
>   parse: 6755 ms (61.9%), resolve: 865 ms (7.9%), analyze: 62 ms (0.5%), generate: 1008 ms (9.2%)

With patch:

>FULL BUILD STATS for: JavaProjectGenerator
>   compiled 360410 lines in 8684 ms: 41502.7 lines/s
>   parse: 4546 ms (52.3%), resolve: 909 ms (10.4%), analyze: 44 ms (0.5%), generate: 1081 ms (12.4%)

We see that the parsing dominates this scenario because resolve & analyze have less to do with less code, and that it relative time decreased by ~9.5% with the patch. 

I assume that the difference will be smaller with slower IO / CPU and more "real" source input, but still measurement seem to confirm the feeling that avoiding extra exceptions can be faster as avoiding array bounds checking.

Comment 3 Olivier Thomann

2020-05-11 09:14:36 EDT

It is either a bound check for each character of a file or one exception thrown for the file. I would like to see stats based on real projects not just small file almost empty.

Comment 4 Andrey Loskutov

2020-05-11 09:36:16 EDT

(In reply to Olivier Thomann from comment #3)
> It is either a bound check for each character of a file or one exception
> thrown for the file. I would like to see stats based on real projects not
> just small file almost empty.

Sure.

I can update generator to create huge Java files with no other "extra work" needed to see the effect of scanner checking array length on every access, not a problem. So the use case to confirm would be 10 files with ~100.000 lines each (as opposite to 100.000 files with 10 lines)?

I assume also we would not see the big time difference with "real projects", it will be just noise behind "real source" compilation issues (type lookup like in bug 562727 or SimpleLookupTable#rehash as in bug 563030).

One point to think: isn't this strange that this is the *only* place/line in the compiler where we create ArrayIndexOutOfBounds exceptions during compilation, so in all other places we are fine to check for array dimensions, except this one? It just feels wrong.

Comment 5 Till Brychcy

2020-05-11 09:58:49 EDT

(In reply to Andrey Loskutov from comment #4)
> (In reply to Olivier Thomann from comment #3)
> > It is either a bound check for each character of a file or one exception
> > thrown for the file. I would like to see stats based on real projects not
> > just small file almost empty.
> 
> Sure.
> 
> I can update generator to create huge Java files with no other "extra work"
> needed to see the effect of scanner checking array length on every access,
> not a problem. So the use case to confirm would be 10 files with ~100.000
> lines each (as opposite to 100.000 files with 10 lines)?
> 
> I assume also we would not see the big time difference with "real projects",
> it will be just noise behind "real source" compilation issues (type lookup
> like in bug 562727 or SimpleLookupTable#rehash as in bug 563030).
> 
> One point to think: isn't this strange that this is the *only* place/line in
> the compiler where we create ArrayIndexOutOfBounds exceptions during
> compilation, so in all other places we are fine to check for array
> dimensions, except this one? It just feels wrong.

At least on Hotspot, and unless you explicitly use -XX:-OmitStackTraceInFastThrow, the stack trace will not get allocated at all after a few repetitions.

But personally, I dislike use of AIOOBs (or NPEs) during normal program execution, as I often have a breakpoint for them.

The JVM does an array length check anyway. If you use a local variable instead of a field for the check and the array access, the JIT should be able to optimize away the extra "manual check".

Comment 6 Vikas Chandra

2020-11-16 11:17:03 EST

*** Bug 456265 has been marked as a duplicate of this bug. ***

Comment 7 Jörg Kubitz

2021-09-23 07:28:08 EDT

Created attachment 287200 [details]
aioobe sampled.png

The java.lang.ArrayIndexOutOfBoundsException does have measurable impact during parsing. And the catch block does have an impact too. It's not directly visible in stacktraces but it is there.

Comment 8 Andrey Loskutov

2021-12-08 05:15:33 EST

Created attachment 287637 [details]
stacks_before_patch_in_yourkit

See attached screenshot for Yourkit with Eclipse started and single file below opened (no other sources in the workspace): 19 exceptions flying around (10 for some templates init, and 9 for the actual source compilation).

(In reply to Eclipse Genie from comment #1)
> New Gerrit change created: https://git.eclipse.org/r/162770

13 from those 19 exceptions are addressed by the patch above.

Remaining 3 (and most likely more) need additional effort, because all Scanner.getNextChar(*) implementations are copy-pasted and show same issue.

Code to reproduce (important are lambdas and "record"):

public record X(String name, String abc, String something) {
	enum E {
		A
	}

	static class C {
		E e = E.A;
	}

	public static void main(String[] args) {
		C c = new C();
		switch (c.e) {
		case A -> {
			System.out.println("Hello A");
		}
		default -> System.out.println("Hello E");
		}
	}
}

Comment 9 Eclipse Genie

2021-12-08 05:18:23 EST

Gerrit change https://git.eclipse.org/r/c/jdt/eclipse.jdt.core/+/162770 was merged to [master].
Commit: http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=09f256552788421942c6820d9c3bce2b80d9a8e2

Comment 10 Andrey Loskutov

2022-02-21 03:37:38 EST

(In reply to Andrey Loskutov from comment #8)
> Created attachment 287637 [details]
> stacks_before_patch_in_yourkit
> 
> See attached screenshot for Yourkit with Eclipse started and single file
> below opened (no other sources in the workspace): 19 exceptions flying
> around (10 for some templates init, and 9 for the actual source compilation).
> 
> (In reply to Eclipse Genie from comment #1)
> > New Gerrit change created: https://git.eclipse.org/r/162770
> 
> 13 from those 19 exceptions are addressed by the patch above.
> 
> Remaining 3 (and most likely more) need additional effort, because all
> Scanner.getNextChar(*) implementations are copy-pasted and show same issue.

=> bug 578861.

Resolving this one.