161400 – Scanning of identifiers should be optimized

Bug 161400 - Scanning of identifiers should be optimized

Summary: Scanning of identifiers should be optimized

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.3
Hardware:	PC Windows XP

Importance:	P3 major (vote)
Target Milestone:	3.3 M4
Assignee:	Olivier Thomann
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Duplicates (1):	168761 (view as bug list)
Depends on:	163349
Blocks:	139798
	Show dependency tree

Reported:	2006-10-18 08:51 EDT by Dani Megert
Modified:	2006-12-22 15:27 EST (History)
CC List:	3 users (show)

See Also:

Attachments
Simple test code showing that it takes longer catching an IOOBE (1.62 KB, text/plain) 2006-10-18 08:54 EDT, Dani Megert	no flags	Details
Fix for the reported use case (1.29 KB, patch) 2006-10-18 08:58 EDT, Dani Megert	no flags	Details \| Diff
Profiler output (6.87 KB, application/octet-stream) 2006-10-18 09:03 EDT, Dani Megert	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dani Megert

2006-10-18 08:51:00 EDT

I20061017-0800

I profiled a scenario from bug 139798 and found out that one of the top 5 time wasters is JDT Core by catching IOOBE instead of checking the condition upfront and accessing the array only once.

Here are the steps:
1. start fresh workspace
2. import SWT plug-in
3. open class OS
4. Ctrl+O
5. type 'k'
6. type backspace

Profile step 6 only.

NOTES:
1) not sure whether all VMs perform better when checking the condition instead
   of catching the exception but all the ones I tried do so
2) other places in JDT Core seem to suffer from the same pattern

Comment 1 Dani Megert

2006-10-18 08:54:41 EDT

Created attachment 52214 [details]
Simple test code showing that it takes longer catching an IOOBE

Comment 2 Dani Megert

2006-10-18 08:58:09 EDT

Created attachment 52216 [details]
Fix for the reported use case

The attached patch makes the reported use case about 400ms faster and is in my opinion also easier to understand. It does not other places in the code that have the same pattern. Places - like this one in the scanner - where the code is executed many times are candidates.

Comment 3 Dani Megert

2006-10-18 09:03:30 EDT

Created attachment 52217 [details]
Profiler output

Comment 4 Philipe Mulet

2006-10-18 09:56:57 EDT

Interesting data. We will investigate.
For scanner, the reasoning has always been that you hit the end of file only once per file, vs. a range check for every single char read...

Comment 5 Olivier Thomann

2006-11-01 16:32:16 EST

Do you expect all IOOBE and AIOOBE catching to be replaced by bounds checks or simply the occurances in getNextToken() ?

Comment 6 Olivier Thomann

2006-11-01 23:11:50 EST

It is faster to use an exception for source files, but it is faster to use bound checks when the size is small. In the case of the JavaConventions call, the size is always very small (check for identifiers, ...).
So to improve this, we might want to have different reading methods in the scanner.
I'll prepare a patch for all usage of this exception in the scanner and we can check with the performance tests where we are.

Comment 7 Dani Megert

2006-11-02 02:25:21 EST

>It is faster to use an exception for source files,...
Can you explain this? Doesn't my code example show that bounds testing is always faster than checking the AIOOB exception?

> but it is faster to use
>bound checks when the size is small.
Mmh I tend to disagree (unless we use a different definition of "small": what made me file this bug in the first place was bug 139798 where a really large source (OS.java) with many members showed problems *and* with my patch the performance problem went away.

Re comment 2: my comment "It does not other places in the code that
have the same pattern." was probably not easy to parse ;-)
==> I wanted to say that I did not look for other instances of the same pattern in your code but there might be such.

Re comment 5: when I talked to Philippe we agreed check those places where the impact is big i.e. inside (nested) loops or methods that are called very often.

Comment 8 Olivier Thomann

2006-11-02 09:23:03 EST

(In reply to comment #7)
> Can you explain this? Doesn't my code example show that bounds testing is
> always faster than checking the AIOOB exception?
When we parse a source file, we have one exception thrown. This is faster than checking the bounds for each character read.

> Mmh I tend to disagree (unless we use a different definition of "small": what
> made me file this bug in the first place was bug 139798 where a really large
> source (OS.java) with many members showed problems *and* with my patch the
> performance problem went away.
In your case you are checking java identifier. These are small inputs. So in this case we should always perform a bound check.
Your patch doesn't work as is. It won't scan unicodes properly. It needs to be reviewed.

I am modifying all places where we catch exceptions, but this is not an easy task. We have many places. I'll run our performance tests once we are done.

Comment 9 Dani Megert

2006-11-02 09:57:20 EST

Sorry about the problem with the patch.

Comment 10 Olivier Thomann

2006-11-03 12:28:22 EST

What really needs to be optimized is the scanning of identifiers. This is used in the JavaConventions method and DOM SimpleName methods.
This will need a good fix for bug 163349.
The fixes for this bug and bug 163349 need to be released simultaneously.

Comment 11 Olivier Thomann

2006-11-06 16:43:02 EST

Released for 3.3M4.
Daniel, please verify your scenario with next integration build.

Comment 12 Dani Megert

2006-11-08 11:34:23 EST

Verified in I20061107-0800.

Comment 13 Olivier Thomann

2006-12-22 15:27:21 EST

*** Bug 168761 has been marked as a duplicate of this bug. ***