89088 – [performance] Scanner is sending 2 messages per identifier character

Summary: [performance] Scanner is sending 2 messages per identifier character

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core
Version:	3.1
Hardware:	PC Windows XP

Importance:	P3 normal
Target Milestone:	3.1 M6
Assignee:	Philipe Mulet
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Depends on:
Blocks:

Reported:	2005-03-25 04:20 EST by Philipe Mulet
Modified:	2005-03-31 10:40 EST
CC List:	2 users

See Also:

Description Philipe Mulet

2005-03-25 04:20:06 EST

Build 3.1M5a

Though most identifiers are made of obvious letters and digits, our scanner
dispatches 2 messages to figure out that a character is an obvious identifier
part:
- #getNextCharAsJavaIdentifierPart()
- #isJavaIdentifierPart()

These could be avoided by special-casing obvious characters.

Comment 1 Philipe Mulet

2005-03-25 04:25:25 EST

Added support for special-casing obvious characters, using an array of 128
entries that maps each character to its nature (LETTER, DIGIT, SPACE,
SEPARATOR).

With this support, the scanner no longer needs to go through the slow path when
compiling a decent set of sources (JCL 1.4).
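
For illustration only, here is a minimal sketch of what such a lookup table
could look like. It is not the actual JDT Core change: the class name, the flag
values, the chosen separator set and the helper methods are all assumptions.
Only the four nature names and the 128-entry array come from the comment above.

```java
public class ObviousCharTable {
    // Hypothetical nature flags; the names follow the comment above,
    // the bit values are assumed for this sketch.
    static final int LETTER = 1, DIGIT = 2, SPACE = 4, SEPARATOR = 8;

    // One entry per ASCII character; 0 means "not obvious".
    static final int[] NATURES = new int[128];
    static {
        for (char c = 'a'; c <= 'z'; c++) NATURES[c] = LETTER;
        for (char c = 'A'; c <= 'Z'; c++) NATURES[c] = LETTER;
        NATURES['_'] = LETTER;
        NATURES['$'] = LETTER;
        for (char c = '0'; c <= '9'; c++) NATURES[c] = DIGIT;
        for (char c : " \t\n\r\f".toCharArray()) NATURES[c] = SPACE;
        // Assumed separator set, chosen for the sketch.
        for (char c : "(){}[];,.".toCharArray()) NATURES[c] = SEPARATOR;
    }

    // Fast path: a single array read decides obvious characters.
    // Anything else falls back to the general (slow) check.
    static boolean isIdentifierPart(char c) {
        if (c < 128) {
            int nature = NATURES[c];
            if (nature != 0) {
                return (nature & (LETTER | DIGIT)) != 0;
            }
        }
        return Character.isJavaIdentifierPart(c);
    }

    // Returns the index just past the identifier starting at 'start'.
    // Surrogate pairs are ignored to keep the sketch short.
    static int scanIdentifierEnd(char[] source, int start) {
        int i = start;
        while (i < source.length && isIdentifierPart(source[i])) {
            i++;
        }
        return i;
    }
}
```

In this sketch, characters with no recorded nature still go through
Character.isJavaIdentifierPart(), so the fast path never changes the answer, it
only short-circuits the common case.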

Early measurements show a 47% performance improvement in pure tokenizing (not
counting the retrieval of identifier sources) when tokenizing Parser.java
(>300k of sources) 800 times in a row. On my machine, we previously tokenized
3.5M tokens/sec; now it is over 5M tokens/sec.

This seems to improve the full build scenario by 1-2%.
We need some specific performance tests for it.

Comment 2 David Audel

2005-03-31 10:40:03 EST

Verified in I20050330-0500