277299 – [implementation] Performance issue with jface text WordRule

Bug 277299 - [implementation] Performance issue with jface text WordRule

Summary: [implementation] Performance issue with jface text WordRule

Status:	RESOLVED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Text (show other bugs)
Version:	3.5
Hardware:	All All

Importance:	P3 major (vote)
Target Milestone:	3.6 M7
Assignee:	Dani Megert
QA Contact:

URL:
Whiteboard:
Keywords:	investigate, performance

Depends on:
Blocks:

Reported:	2009-05-21 05:10 EDT by Doug
Modified:	2010-04-12 11:01 EDT (History)
CC List:	3 users (show)

See Also:

Attachments
WordRule performance patch (1.69 KB, text/plain) 2009-05-21 05:10 EDT, Doug	no flags	Details
Performance test (2.31 KB, text/plain) 2009-05-21 06:12 EDT, Doug	no flags	Details
Test class (2.35 KB, text/plain) 2009-05-21 06:27 EDT, Doug	no flags	Details
WordRule performance patch (1.35 KB, patch) 2009-05-21 07:23 EDT, Doug	daniel_megert: iplog+ daniel_megert: review+	Details \| Diff
Show Obsolete (2) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Doug

2009-05-21 05:10:55 EDT

Created attachment 136631 [details]
WordRule performance patch

The WordRule class is used by various text editors for partitioning and colouring. It is also used by some incremental builders for parsing the files.

I've found a major performance problem when using the class in case-insensitive mode. The problem arises from this block of code:

if (token == null && fIgnoreCase) {
	Iterator iter= fWords.keySet().iterator();
	while (iter.hasNext()) {
		String key= (String)iter.next();
		if(buffer.equalsIgnoreCase(key)) {
			token= (IToken)fWords.get(key);
			break;
		}
	}
}

So for every word in the rule .equalsIgnoreCase() is called, which will convert the key chars to lowercase before comparing. As you can imagine, this is a very slow piece of code when you have a lot of words to iterate over, and does not scale well compared to a map.

In the attached patch, I convert any added words to lower case as they are initially added, and then convert the buffer to lower case (just once). This means that we can remove the above code altogether, in favour of using a lookup on the hashmap as is done for normal case-sensitive words.

I have also taken the opportunity to replace StringBuffer with the more efficient StringBuilder.

In my testing, this simple change increased the performance of my document parsing by a factor of 30!! That time includes reading the file, converting to a document, applying the rules and then printing out to the console on a sample of 300 source files. I admit that my case may be unusual as I have a couple of hundred case-insensitive words, but I definitely think this is a patch worth applying pre-3.5 release, as it does not affect the API anyway and the performance gain is so enormous.

Comment 1 Dani Megert

2009-05-21 05:24:43 EDT

>I have also taken the opportunity to replace StringBuffer with the more
>efficient StringBuilder.
Unfortunately this is a no go as the bundle must run on 1.4.

Can you please attach a performance test case for this.

Comment 2 Doug

2009-05-21 06:12:59 EDT

Created attachment 136639 [details]
Performance test

Comment 3 Doug

2009-05-21 06:13:51 EDT

Its not a test as such, but you can see how to make it into one. I called the new implementation MyWordRule.

You can vary the parameters, the current settings are:

- 100,000 line document with every line the being same 6-character word that will be found
- 200 words added to the rule
- 10 iterations to average over

With these settings, I get the following result on my PC:

Average of 10 iterations:
	Existing implementation: 3827ms
	New implementation:      307ms

i.e. the new implementation is easily an order of magnitude faster.

Comment 4 Doug

2009-05-21 06:27:40 EDT

Created attachment 136640 [details]
Test class

Corrected a mistake in the word detector. It should find the word, so allow both letters and digits.

It is interesting to vary the wordInDocument paramenter from AtEsT0 to AtEsT+(numWords-1). This makes a huge difference to the existing implementation, but little difference to the new implementation because it uses a map.

"AtEsT0":
	Existing implementation: 917ms
	New implementation:      85ms

"AtEsT199":
	Existing implementation: 1610ms
	New implementation:      105ms

Comment 5 Dani Megert

2009-05-21 07:14:21 EDT

Doug, please make sure to attach a patch that doesn't use StringBuilder and that the org.eclipse.jface.text.tests.rules.WordRuleTest (in 'org.eclipse.jface.text.tests' plug-in) pass.

Comment 6 Doug

2009-05-21 07:23:13 EDT

Created attachment 136645 [details]
WordRule performance patch

Attached version without using StringBuilder.

Sorry, can't run the tests as I have an old environment (3.3) and it will be a real pain to install all the latest Eclipse at the moment. The patch is very small though so you should be able to try it yourself very easily if you have the right env.

Comment 7 Doug

2009-08-27 04:42:18 EDT

Just checking, what is the progress with this? I can think of one very popular language that might get quite a performance boost from this patch!

Thanks.

Comment 8 Dani Megert

2009-08-27 04:46:54 EDT

It's tagged 3.6, so I'll look at it sometime during this release. However, currently I'm busy doing other stuff.

Comment 9 Dani Megert

2010-04-12 11:00:52 EDT

Good patch Doug! I committed it to HEAD with some minor changes:
- removed unused java.util.Iterator import
- removed space before assignment operator
- added you to the copyright notice:
Doug Satchwell <doug.satchwell@ymail.com> - [implementation] Performance issue with jface text WordRule - http://bugs.eclipse.org/277299

and added 'org.eclipse.jdt.text.tests.performance.WordRulePerformanceTest' based on your Test class.