Bug 277299 - [implementation] Performance issue with jface text WordRule
Summary: [implementation] Performance issue with jface text WordRule
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text (show other bugs)
Version: 3.5   Edit
Hardware: All All
: P3 major (vote)
Target Milestone: 3.6 M7   Edit
Assignee: Dani Megert CLA
QA Contact:
URL:
Whiteboard:
Keywords: investigate, performance
Depends on:
Blocks:
 
Reported: 2009-05-21 05:10 EDT by Doug CLA
Modified: 2010-04-12 11:01 EDT (History)
3 users (show)

See Also:


Attachments
WordRule performance patch (1.69 KB, text/plain)
2009-05-21 05:10 EDT, Doug CLA
no flags Details
Performance test (2.31 KB, text/plain)
2009-05-21 06:12 EDT, Doug CLA
no flags Details
Test class (2.35 KB, text/plain)
2009-05-21 06:27 EDT, Doug CLA
no flags Details
WordRule performance patch (1.35 KB, patch)
2009-05-21 07:23 EDT, Doug CLA
daniel_megert: iplog+
daniel_megert: review+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Doug CLA 2009-05-21 05:10:55 EDT
Created attachment 136631 [details]
WordRule performance patch

The WordRule class is used by various text editors for partitioning and colouring. It is also used by some incremental builders for parsing the files.

I've found a major performance problem when using the class in case-insensitive mode. The problem arises from this block of code:

if (token == null && fIgnoreCase) {
	Iterator iter= fWords.keySet().iterator();
	while (iter.hasNext()) {
		String key= (String)iter.next();
		if(buffer.equalsIgnoreCase(key)) {
			token= (IToken)fWords.get(key);
			break;
		}
	}
}

So for every word in the rule .equalsIgnoreCase() is called, which will convert the key chars to lowercase before comparing. As you can imagine, this is a very slow piece of code when you have a lot of words to iterate over, and does not scale well compared to a map.

In the attached patch, I convert any added words to lower case as they are initially added, and then convert the buffer to lower case (just once). This means that we can remove the above code altogether, in favour of using a lookup on the hashmap as is done for normal case-sensitive words.

I have also taken the opportunity to replace StringBuffer with the more efficient StringBuilder.

In my testing, this simple change increased the performance of my document parsing by a factor of 30!! That time includes reading the file, converting to a document, applying the rules and then printing out to the console on a sample of 300 source files. I admit that my case may be unusual as I have a couple of hundred case-insensitive words, but I definitely think this is a patch worth applying pre-3.5 release, as it does not affect the API anyway and the performance gain is so enormous.
Comment 1 Dani Megert CLA 2009-05-21 05:24:43 EDT
>I have also taken the opportunity to replace StringBuffer with the more
>efficient StringBuilder.
Unfortunately this is a no go as the bundle must run on 1.4.

Can you please attach a performance test case for this.
Comment 2 Doug CLA 2009-05-21 06:12:59 EDT
Created attachment 136639 [details]
Performance test
Comment 3 Doug CLA 2009-05-21 06:13:51 EDT
Its not a test as such, but you can see how to make it into one. I called the new implementation MyWordRule.

You can vary the parameters, the current settings are:

- 100,000 line document with every line the being same 6-character word that will be found
- 200 words added to the rule
- 10 iterations to average over

With these settings, I get the following result on my PC:

Average of 10 iterations:
	Existing implementation: 3827ms
	New implementation:      307ms

i.e. the new implementation is easily an order of magnitude faster.
Comment 4 Doug CLA 2009-05-21 06:27:40 EDT
Created attachment 136640 [details]
Test class

Corrected a mistake in the word detector. It should find the word, so allow both letters and digits.

It is interesting to vary the wordInDocument paramenter from AtEsT0 to AtEsT+(numWords-1). This makes a huge difference to the existing implementation, but little difference to the new implementation because it uses a map.

"AtEsT0":
	Existing implementation: 917ms
	New implementation:      85ms

"AtEsT199":
	Existing implementation: 1610ms
	New implementation:      105ms
Comment 5 Dani Megert CLA 2009-05-21 07:14:21 EDT
Doug, please make sure to attach a patch that doesn't use StringBuilder and that the org.eclipse.jface.text.tests.rules.WordRuleTest (in 'org.eclipse.jface.text.tests' plug-in) pass.
Comment 6 Doug CLA 2009-05-21 07:23:13 EDT
Created attachment 136645 [details]
WordRule performance patch

Attached version without using StringBuilder.

Sorry, can't run the tests as I have an old environment (3.3) and it will be a real pain to install all the latest Eclipse at the moment. The patch is very small though so you should be able to try it yourself very easily if you have the right env.
Comment 7 Doug CLA 2009-08-27 04:42:18 EDT
Just checking, what is the progress with this? I can think of one very popular language that might get quite a performance boost from this patch!

Thanks.
Comment 8 Dani Megert CLA 2009-08-27 04:46:54 EDT
It's tagged 3.6, so I'll look at it sometime during this release. However, currently I'm busy doing other stuff.
Comment 9 Dani Megert CLA 2010-04-12 11:00:52 EDT
Good patch Doug! I committed it to HEAD with some minor changes:
- removed unused java.util.Iterator import
- removed space before assignment operator
- added you to the copyright notice:
Doug Satchwell <doug.satchwell@ymail.com> - [implementation] Performance issue with jface text WordRule - http://bugs.eclipse.org/277299

and added 'org.eclipse.jdt.text.tests.performance.WordRulePerformanceTest' based on your Test class.