Bug 161400 - Scanning of identifiers should be optimized
Summary: Scanning of identifiers should be optimized
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.3   Edit
Hardware: PC Windows XP
: P3 major (vote)
Target Milestone: 3.3 M4   Edit
Assignee: Olivier Thomann CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
: 168761 (view as bug list)
Depends on: 163349
Blocks: 139798
  Show dependency tree
 
Reported: 2006-10-18 08:51 EDT by Dani Megert CLA
Modified: 2006-12-22 15:27 EST (History)
3 users (show)

See Also:


Attachments
Simple test code showing that it takes longer catching an IOOBE (1.62 KB, text/plain)
2006-10-18 08:54 EDT, Dani Megert CLA
no flags Details
Fix for the reported use case (1.29 KB, patch)
2006-10-18 08:58 EDT, Dani Megert CLA
no flags Details | Diff
Profiler output (6.87 KB, application/octet-stream)
2006-10-18 09:03 EDT, Dani Megert CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Dani Megert CLA 2006-10-18 08:51:00 EDT
I20061017-0800

I profiled a scenario from bug 139798 and found out that one of the top 5 time wasters is JDT Core by catching IOOBE instead of checking the condition upfront and accessing the array only once.

Here are the steps:
1. start fresh workspace
2. import SWT plug-in
3. open class OS
4. Ctrl+O
5. type 'k'
6. type backspace

Profile step 6 only.

NOTES:
1) not sure whether all VMs perform better when checking the condition instead
   of catching the exception but all the ones I tried do so
2) other places in JDT Core seem to suffer from the same pattern
Comment 1 Dani Megert CLA 2006-10-18 08:54:41 EDT
Created attachment 52214 [details]
Simple test code showing that it takes longer catching an IOOBE
Comment 2 Dani Megert CLA 2006-10-18 08:58:09 EDT
Created attachment 52216 [details]
Fix for the reported use case

The attached patch makes the reported use case about 400ms faster and is in my opinion also easier to understand. It does not other places in the code that have the same pattern. Places - like this one in the scanner - where the code is executed many times are candidates.
Comment 3 Dani Megert CLA 2006-10-18 09:03:30 EDT
Created attachment 52217 [details]
Profiler output
Comment 4 Philipe Mulet CLA 2006-10-18 09:56:57 EDT
Interesting data. We will investigate.
For scanner, the reasoning has always been that you hit the end of file only once per file, vs. a range check for every single char read... 
Comment 5 Olivier Thomann CLA 2006-11-01 16:32:16 EST
Do you expect all IOOBE and AIOOBE catching to be replaced by bounds checks or simply the occurances in getNextToken() ?
Comment 6 Olivier Thomann CLA 2006-11-01 23:11:50 EST
It is faster to use an exception for source files, but it is faster to use bound checks when the size is small. In the case of the JavaConventions call, the size is always very small (check for identifiers, ...).
So to improve this, we might want to have different reading methods in the scanner.
I'll prepare a patch for all usage of this exception in the scanner and we can check with the performance tests where we are.
Comment 7 Dani Megert CLA 2006-11-02 02:25:21 EST
>It is faster to use an exception for source files,...
Can you explain this? Doesn't my code example show that bounds testing is always faster than checking the AIOOB exception?

> but it is faster to use
>bound checks when the size is small.
Mmh I tend to disagree (unless we use a different definition of "small": what made me file this bug in the first place was bug 139798 where a really large source (OS.java) with many members showed problems *and* with my patch the performance problem went away.

Re comment 2: my comment "It does not other places in the code that
have the same pattern." was probably not easy to parse ;-)
==> I wanted to say that I did not look for other instances of the same pattern in your code but there might be such.

Re comment 5: when I talked to Philippe we agreed check those places where the impact is big i.e. inside (nested) loops or methods that are called very often.
Comment 8 Olivier Thomann CLA 2006-11-02 09:23:03 EST
(In reply to comment #7)
> Can you explain this? Doesn't my code example show that bounds testing is
> always faster than checking the AIOOB exception?
When we parse a source file, we have one exception thrown. This is faster than checking the bounds for each character read.

> Mmh I tend to disagree (unless we use a different definition of "small": what
> made me file this bug in the first place was bug 139798 where a really large
> source (OS.java) with many members showed problems *and* with my patch the
> performance problem went away.
In your case you are checking java identifier. These are small inputs. So in this case we should always perform a bound check.
Your patch doesn't work as is. It won't scan unicodes properly. It needs to be reviewed.

I am modifying all places where we catch exceptions, but this is not an easy task. We have many places. I'll run our performance tests once we are done.
Comment 9 Dani Megert CLA 2006-11-02 09:57:20 EST
Sorry about the problem with the patch.
Comment 10 Olivier Thomann CLA 2006-11-03 12:28:22 EST
What really needs to be optimized is the scanning of identifiers. This is used in the JavaConventions method and DOM SimpleName methods.
This will need a good fix for bug 163349.
The fixes for this bug and bug 163349 need to be released simultaneously.
Comment 11 Olivier Thomann CLA 2006-11-06 16:43:02 EST
Released for 3.3M4.
Daniel, please verify your scenario with next integration build.
Comment 12 Dani Megert CLA 2006-11-08 11:34:23 EST
Verified in I20061107-0800.
Comment 13 Olivier Thomann CLA 2006-12-22 15:27:21 EST
*** Bug 168761 has been marked as a duplicate of this bug. ***