Bug 372128 - [syntax highlighting] "Highlight enclosing brackets" has massive performance problems
Summary: [syntax highlighting] "Highlight enclosing brackets" has massive performance ...
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Text (show other bugs)
Version: 3.8   Edit
Hardware: PC Windows 7
: P2 normal (vote)
Target Milestone: 3.8 M6   Edit
Assignee: Deepak Azad CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2012-02-21 10:38 EST by Markus Keller CLA
Modified: 2012-03-12 23:22 EDT (History)
3 users (show)

See Also:


Attachments
rough draft (27.00 KB, patch)
2012-03-02 01:54 EST, Deepak Azad CLA
no flags Details | Diff
patch for jdt ui (5.90 KB, patch)
2012-03-06 06:27 EST, Deepak Azad CLA
no flags Details | Diff
patch for platform text (22.93 KB, patch)
2012-03-06 06:32 EST, Deepak Azad CLA
no flags Details | Diff
patch for jdt ui (6.03 KB, patch)
2012-03-07 09:03 EST, Deepak Azad CLA
no flags Details | Diff
patch for platform text (21.08 KB, patch)
2012-03-07 09:20 EST, Deepak Azad CLA
no flags Details | Diff
patch for platform text - only algorithm changes (11.67 KB, patch)
2012-03-08 23:47 EST, Deepak Azad CLA
no flags Details | Diff
patch for platform text (20.17 KB, patch)
2012-03-09 03:40 EST, Deepak Azad CLA
no flags Details | Diff
patch for jdt ui (6.04 KB, patch)
2012-03-11 10:25 EDT, Deepak Azad CLA
no flags Details | Diff
patch for platform text (21.07 KB, patch)
2012-03-11 10:33 EDT, Deepak Azad CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Markus Keller CLA 2012-02-21 10:38:55 EST
N20120217-2000

"Highlight enclosing brackets" has massive performance problems:
- open CleanUpTest
- go to the last test and set caret in front of "public"
- hold down ArrowRight
=> takes multiple seconds to move caret
Comment 1 Deepak Azad CLA 2012-02-21 11:27:09 EST
I was half expecting something like this to come up..

The problem is the probably the for loop in DefaultCharacterPairMatcher.findEnclosingPeerCharacters(...). This works fine as long as the caret is inside a method's braces, but blows up when the caret is outside a method's braces in a huge class like CleanUpTest.

Right now the only thing I can think of is to be a bit smart with document partitions.

A question about the behavior (the answer may affect the solution here)

What should be the enclosing brackets in the following case ? The ones inside the strings or the ones for method invocation?
	void foo1() {
		bar1("(dsf " |, "sdf)"); // | is the caret location
	}
	void bar1(String a, String b){
	}

... and what about in this case when the selection spans partitions.
	bar1("(ds|f " |, "sdf)"); // |f " | is the selection
Comment 2 Deepak Azad CLA 2012-02-21 11:34:55 EST
Actually, I just got another idea which should do the trick.

(In reply to comment #1)
> A question about the behavior
... though it would still be helpful to get an agreement on the behavior in these cases.
Comment 3 Markus Keller CLA 2012-02-21 12:17:02 EST
> A question about the behavior ...

Bracket matching should mainly work on the Java syntax level.

The matching inside string literals is an add-on that's nice to have if the caret is already inside the string literal, but I wouldn't expect that if the caret is outside of a string.

=>      bar1("(dsf " |, "sdf)"); // | is the caret location
            ^                 ^

When the selection spans partitions, then let the Java code partition win:

=>      bar1("(ds|f " |, "sdf)"); // '|f " |' is the selection
            ^                  ^
Comment 4 Dani Megert CLA 2012-02-22 05:57:41 EST
(In reply to comment #3)
> > A question about the behavior ...
> 
> Bracket matching should mainly work on the Java syntax level.
> 
> The matching inside string literals is an add-on that's nice to have if the
> caret is already inside the string literal, but I wouldn't expect that if the
> caret is outside of a string.
> 
> =>      bar1("(dsf " |, "sdf)"); // | is the caret location
>             ^                 ^
> 
> When the selection spans partitions, then let the Java code partition win:
> 
> =>      bar1("(ds|f " |, "sdf)"); // '|f " |' is the selection
>             ^                  ^

+1.
Comment 5 Deepak Azad CLA 2012-02-22 06:14:53 EST
(In reply to comment #3)
> Bracket matching should mainly work on the Java syntax level.
DefaultCharacterPairMatcher is in Platform, so in the general case we should find matches from the same partition type as the starting offset. Though we can probably override the behavior in JavaPairMatcher, if needed.
 
> The matching inside string literals is an add-on that's nice to have if the
> caret is already inside the string literal, but I wouldn't expect that if the
> caret is outside of a string.
> 
> =>      bar1("(dsf " |, "sdf)"); // | is the caret location
>             ^                 ^
Agree.

> When the selection spans partitions, then let the Java code partition win:
> 
> =>      bar1("(ds|f " |, "sdf)"); // '|f " |' is the selection
>             ^                  ^
I will try to fix this in JavaPairMatcher.
Comment 6 Markus Keller CLA 2012-02-22 06:38:08 EST
> DefaultCharacterPairMatcher is in Platform, so in the general case we should
> find matches from the same partition type as the starting offset. Though we can
> probably override the behavior in JavaPairMatcher, if needed.

I'm not up-to-speed with document partitioning APIs, but I guess preferring IDocument.DEFAULT_CONTENT_TYPE would be enough.
Comment 7 Deepak Azad CLA 2012-02-22 09:34:01 EST
(In reply to comment #6)
> I'm not up-to-speed with document partitioning APIs, but I guess preferring
> IDocument.DEFAULT_CONTENT_TYPE would be enough.
Thanks! That does the trick.

Regarding performance...
I made significant improvements, but there is still some sluggishness. There is a barrier to how fast things can work here, essentially while trying to find enclosing brackets we end up inspecting a very large percentage of characters in a document i.e. document.getChar(offset) is called a large number of times. I ran some tests to see how fast a large document like StyledText.java can be read character by character, and it is not instantaneous for sure.

Since typing must be instantaneous, the sluggishness is not acceptable. The right solution is probably to limit the search to the visible part of the viewer. Though that would mean tweaking the findEnclosingPeerCharacters API once more :-)

However, this will have ramifications for bug 358347 where we want to add an annotation for matching brackets. Rendering the annotation in Overview ruler will be difficult, and even rendering the annotation in Vertical ruler could prove a little tricky. (Just documenting the potential problems we will face, we can deal with the annotation later)
Comment 8 Markus Keller CLA 2012-02-22 13:57:02 EST
I think DefaultCharacterPairMatcher#findEnclosingPeerCharacters(IDocument, int) has an algorithmic problem.

If I add {{{{{{{{{{{{{{{{{{{{{{{{{{{{{ and }}}}}}}}}}}}}}}}}}}}}}}}}}}}} at the beginning/end of a huge file, then the old way (one-way matching) was also not blindingly fast, but it was acceptable.

Finding the enclosing brackets is at most twice as expensive as finding just one peer. But for document length n, the implementation in master costs n times the one-way matching, since it performs the whole matching for every character in the document.

I think this is the most efficient implementation:
Have two pointers that walk away from the selection in both directions. Both pointers have counters of open brackets for each pair (all initially 0). For each encountered bracket, the counter is increased if the bracket opens a new pair, and decreased if it closes a pair.
As soon as the first counter becomes -1, we know the closest enclosing bracket and just have to find a peer starting at the other pointer.


BTW: The implementation of CharPairs#contains(char) is crap. This is faster in practice and doesn't produce loads of garbage:

			char[] pairs= fPairs;
			for (int i= 0, n= pairs.length; i < n; i++) {
				if (c == pairs[i])
					return true;
			}
			return false;

But unfortunately, that's not our bottleneck.
Comment 9 Deepak Azad CLA 2012-02-23 00:35:40 EST
Instead of calling "document.getChar(offset)" a gazillion times, we can call "document.get()" once to get the complete document text as String and then just use the String. 
=> this is much much faster!

- Looping over StyledText via multiple document.getChar(offset) calls takes more than 200 ms
- Looping over StyledText via one document.get() call takes about 30 ms.

(In reply to comment #8)
> Finding the enclosing brackets is at most twice as expensive as finding just
> one peer. 
- With master finding matching bracket for the last bracket in StyledText.java (the worst case scenario) takes about 250 ms.
- When I stop using document.getChar(offset), the same action takes 25 ms!

Performance target
- While the caret is inside a method, findEnclosingPeers takes 1 or 2 ms and hence moving the caret around is quite fast.
- I experimented with large documents to find at which point moving the caret begins to feel slow. If findEnclosingPeers takes more than 10-15 ms and I keep the right arrow pressed, I can visually observe a slowdown in the speed of caret.

*Conclusions*
- Do NOT use document.getChar(offset)
- Aim for findEnclosingPeers to take 10-15 ms or lesser. (Finding a single matching bracket is currently slower than this)

(I have taken time measurements by inserting System.currentTimeMillis() calls in code, hence the numbers are good approximations and not 100% accurate)

> But for document length n, the implementation in master costs n times
> the one-way matching, since it performs the whole matching for every character
> in the document.
I know the implementation is bad, but it is not that bad :-) In the worst case, it performs a match for every opening bracket in the document.
Comment 10 Dani Megert CLA 2012-02-23 05:53:41 EST
Two other possible ways to improve the performance:
- Only search the enclosing brackets if the context really changed i.e. while
  moving that caret around here: { this.is.a.long.statement() }, one would only
  need to search (in Java) when the caret goes into ().

- Only update the enclosing brackets on post-selection changed.
Comment 11 Dani Megert CLA 2012-02-23 05:58:57 EST
(In reply to comment #10)
> - Only update the enclosing brackets on post-selection changed.

We'd need to see how this looks/feels.
Comment 12 Deepak Azad CLA 2012-02-23 08:17:09 EST
(In reply to comment #10)
> Two other possible ways to improve the performance:
> - Only search the enclosing brackets if the context really changed i.e. while
>   moving that caret around here: { this.is.a.long.statement() }, one would only
>   need to search (in Java) when the caret goes into ().
I tried this part and the result is promising i.e. this combined with the fastest possible algorithm for finding enclosing peers can likely work.

> - Only update the enclosing brackets on post-selection changed.
Not sure at this point on how to get the painter to respond only to post-selection changed events.
Comment 13 Markus Keller CLA 2012-02-23 08:23:19 EST
> - Only search the enclosing brackets if the context really changed i.e. while
>   moving that caret around here: { this.is.a.long.statement() }, one would only
>   need to search (in Java) when the caret goes into ().

Would need a lot of infrastructure to handle all cases, e.g. if the user types or pastes text.

> - Only update the enclosing brackets on post-selection changed.

Also sounds scary to manage, and would interfere with the direct updating when the caret is next to a bracket.


> - Do NOT use document.getChar(offset)

Would need to make sure that document.get() doesn't copy the whole document and thus produces garbage. We better improve the performance of document.getChar(offset). E.g. avoid the "getLength()" call in AbstractDocument.getChar(int) by catching the IndexOutOfBoundsException from String#charAt(int), declare CopyOnWriteTextStore#fTextStore as "private StringTextStore", etc.

Please also measure how often getChar is called and whether that number makes sense.

Even with that, we still have an algorithmic problem that won't go away.
Comment 14 Deepak Azad CLA 2012-02-23 09:42:11 EST
(In reply to comment #13)
> > - Only search the enclosing brackets if the context really changed i.e. while
> >   moving that caret around here: { this.is.a.long.statement() }, one would only
> >   need to search (in Java) when the caret goes into ().
> 
> Would need a lot of infrastructure to handle all cases, e.g. if the user types
> or pastes text.
The code I have in my workspace seems to work, and I did not have to do much.

> > - Do NOT use document.getChar(offset)
> 
> Would need to make sure that document.get() doesn't copy the whole document and
> thus produces garbage. 
Hmm ok. I also see bug 292664. So you are saying that using document.get() is not an option?

> We better improve the performance of
> document.getChar(offset). E.g. avoid the "getLength()" call in
> AbstractDocument.getChar(int) by catching the IndexOutOfBoundsException from
> String#charAt(int), declare CopyOnWriteTextStore#fTextStore as "private
> StringTextStore", etc.
So far I did not realize that underneath it all there is a String :-) Let me see if something can be done here.

> Please also measure how often getChar is called and whether that number makes
> sense.

StyledText.java
document length - 349102
getChar(...) count for matching the bracket at the end of the file - 412842
=> I guess this is not too bad.
Comment 15 Markus Keller CLA 2012-02-23 10:17:01 EST
> > Would need to make sure that document.get() doesn't copy the whole document > Hmm ok. I also see bug 292664. So you are saying that using document.get() is
> not an option?

For now: Yes. As a last resort if getChar() cannot be made quick enough, it could still be considered.
Comment 16 Deepak Azad CLA 2012-02-24 12:10:50 EST
I am contemplating tweaking the API to allow restricting the search i.e. change the API to

IRegion findEnclosingPeerCharacters(IDocument document, int offset, int length, int lowerBound, int upperBound);

- I have my doubts about being able to squeeze awesome performance from this which is required for MatchingCharacterPainter.(Currently I am running my experiments with StyledText.java, and even if we succeed for this someone may have a document say 10 times larger)

- In any case MatchingCharacterPainter does not need to find the match if the enclosing brackets are offscreen. (Though one can argue about the case when one of the enclosing bracket is on screen and the other is not, should we strive to highlight the one on-screen or not?)

- The API would be flexible and each client (current ones and in future) can take a call about their performance requirements and call the API appropriately. 
(One can always pass lowerBound=0 and upperBound=document.getLength() to not restrict the search)

This does not mean that I have stopped working on improving the performance :-)
Comment 17 Deepak Azad CLA 2012-02-26 08:04:41 EST
(In reply to comment #13)
> > - Do NOT use document.getChar(offset) 

> We better improve the performance of
> document.getChar(offset). E.g. avoid the "getLength()" call in
> AbstractDocument.getChar(int) by catching the IndexOutOfBoundsException from
> String#charAt(int), declare CopyOnWriteTextStore#fTextStore as "private
> StringTextStore", etc.
These changes help a little, but not enough. document.get() route is still much much faster.

> Would need to make sure that document.get() doesn't copy the whole document and
> thus produces garbage. 
This would mean adding new API to IDocument and to ITextStore, no? (Essentially a get method which does not copy the string, and clients are advised to use this only when really needed.)
Comment 18 Deepak Azad CLA 2012-02-28 12:39:46 EST
Some more fun!

It turns out MatchingCharacterPainter.paint(int) is called twice on a single key press or mouse click, and hence the enclosing brackets computation happens twice.

The problem is that PaintManager is added twice as a key/mouse/text listener. See the call hierarchy of PaintManager.addListeners().Essentially the fix for bug 52571 was not perfect)

I think the fix here is to not install the listeners from PaintManager.install(), i.e. remove line 233. Dani, if you agree, please release this one line change.

This would give 50% improvement right away!
Comment 19 Dani Megert CLA 2012-02-29 04:16:47 EST
(In reply to comment #18)
> Some more fun!
> 
> It turns out MatchingCharacterPainter.paint(int) is called twice on a single
> key press or mouse click, and hence the enclosing brackets computation happens
> twice.
> 
> The problem is that PaintManager is added twice as a key/mouse/text listener.
> See the call hierarchy of PaintManager.addListeners().Essentially the fix for
> bug 52571 was not perfect)
> 
> I think the fix here is to not install the listeners from
> PaintManager.install(), i.e. remove line 233. Dani, if you agree, please
> release this one line change.
> 
> This would give 50% improvement right away!

Good catch, but not so good fix: the add/RemovePainter(...) API would be broken by this. Here's a better fix:
http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=8849002c4b50cc7b0aa21f5acf18008940b96360
Comment 20 Deepak Azad CLA 2012-02-29 04:46:30 EST
(In reply to comment #19)
> Good catch, but not so good fix: the add/RemovePainter(...) API would be broken
> by this. Here's a better fix:
> http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=8849002c4b50cc7b0aa21f5acf18008940b96360

I think the addListeners() call needs to be guarded, just like the removeListeners() call is guarded in PaintManager.inputDocumentAboutToBeChanged(IDocument, IDocument)

		if (newInput != null && newInput != fManager.fDocument) {
			fManager.install(newInput);
			paint(IPainter.TEXT_CHANGE);
			if (oldInput != null)
				addListeners();
		}

I have tested this, and it works.
Comment 21 Dani Megert CLA 2012-02-29 05:33:18 EST
(In reply to comment #20)
> (In reply to comment #19)
> > Good catch, but not so good fix: the add/RemovePainter(...) API would be broken
> > by this. Here's a better fix:
> > http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=8849002c4b50cc7b0aa21f5acf18008940b96360
> 
> I think the addListeners() call needs to be guarded, just like the
> removeListeners() call is guarded in
> PaintManager.inputDocumentAboutToBeChanged(IDocument, IDocument)
> 
>         if (newInput != null && newInput != fManager.fDocument) {
>             fManager.install(newInput);
>             paint(IPainter.TEXT_CHANGE);
>             if (oldInput != null)
>                 addListeners();
>         }
> 
> I have tested this, and it works.

The whitespace painter causes several install/uninstall calls, some even when there's no document. I've now updated install(...) to deal with this.
http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=d57ab6c691d53a419ad801ff8f7ba74342c6e88d
Comment 22 Deepak Azad CLA 2012-02-29 05:56:01 EST
(In reply to comment #21)
> The whitespace painter causes several install/uninstall calls, some even when
> there's no document. I've now updated install(...) to deal with this.
> http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=d57ab6c691d53a419ad801ff8f7ba74342c6e88d

Thanks! This works :-)
Comment 23 Deepak Azad CLA 2012-02-29 14:24:12 EST
I just realized that a big chuck of time during document.getChar(offset) call goes in SynchronizableDocument.getChar(int) which includes a synchronized block. Though I suppose we can't do anything about this.
Comment 24 Deepak Azad CLA 2012-03-02 01:54:45 EST
Created attachment 211936 [details]
rough draft

In our weekly call we had agreed on
- Improving the algorithm
- Avoiding re-computation of enclosing brackets where possible
- Using document.getChar() as opposed to document.get()

The patch is along these lines, and should work nicely...

- I think the algorithm is as fast as it can get. It also mostly works, though there are a few minor glitches
a. In JavaPairMatcher we need to deal with < and > operators
b. The algo finds enclosing bracket to the left and enclosing bracket to the right, which are kind of independent computations. Hence the two enclosing brackets can be of different types. Should the API simply return null in this case?

- Now computation of enclosing brackets happens only when required, and only once per user action.
a. Enclosing brackets are computed when the caret moves across a bracket, or when the user types a bracket (this includes cut, paste operations as well)
b. When user changes selection via keyboard (via Shift + Arrow keys), MatchingCharacterPainter.paint(int) is invoked twice - once for KEY_STROKE and once for SELECTION event. However, the computation happens only once.

- There a few other minor tweaks in the patch for performance reasons, which have been discussed in earlier comments.

- The patch has not gone through a polish or even a clean-up cycle :-) There are quite a bit of commented sysout statements in there.

Performance
1. As long as the computation does not happen the performance is obviously good.
2. However, while moving the caret around I can feel the 'speed bumps' in a large document like StyledText while crossing brackets because computation happens at those locations.
3. The speed bumps can be reduced significantly iff instead of calling document.getChar several times we can get hold of the underlying string. As I mentioned in comment 23 document.getChar() involves a synchronized block which just kills performance. Maybe we can get the underlying string iff the underlying text store is CopyOnWriteTextStore, no?
4. Personally I do NOT like the speed bumps, and if we cannot do (3), I would in addition to all the changes in the patch also restrict the search region i.e comment 16.
Comment 25 Deepak Azad CLA 2012-03-02 05:46:04 EST
(In reply to comment #24)
> Maybe we can get the underlying string iff the
> underlying text store is CopyOnWriteTextStore, no?
Sorry, I take this back. I realize now that on first modification attempt of the document the underlying text store becomes GapTextStore. So doing something specific for CopyOnWriteTextStore will not help much.

> 4. Personally I do NOT like the speed bumps, and if we cannot do (3), I would
> in addition to all the changes in the patch also restrict the search region i.e
> comment 16.
I guess we have no option but to restrict the search :/
Comment 26 Markus Keller CLA 2012-03-05 09:55:12 EST
(In reply to comment #24)
> Created attachment 211936 [details] [diff]
> rough draft

Performance looks fine to me, but I see more bugs than the a. and b. you mentioned (so maybe it's not OK again if you have to take out invalid optimizations).

E.g. when I use Arrow_Right to move into the parameter list for the last method in AbstractTextEditor, then the enclosing parentheses are not removed when I move the caret inside the parameter list, even though the preference to show enclosing brackets is disabled.

> b. The algo finds enclosing bracket to the left and enclosing bracket to the
> right, which are kind of independent computations. Hence the two enclosing
> brackets can be of different types. Should the API simply return null in this
> case?

The cases with invalid bracket nesting should be handled in a predictable way. Of course, the "enclosing brackets" algo needs to do the same as the "matching brackets" algo at positions where there is a matching bracket. For other positions, I'd expect that the enclosing brackets algo:
1. only finds brackets that really match (e.g. not a '{' and a ']')
2. finds the same brackets as before if I only move the caret over non-bracket characters. 

Here's a test case with wrongly nested {} and []:

// (  { hello [ foo ( x ) Humpty-Dumpty } bar ]  ) 

Inside "foo" and "Humpty-Dumpty", the points above don't define whether we match [] or {} -- the second point is not even fulfillable! Reasonable solution are to always start with the closest open bracket on the left of the caret, or to start with the closest bracket in either direction (leaning towards the left if there's a tie).
Comment 27 Deepak Azad CLA 2012-03-06 06:27:11 EST
Created attachment 212125 [details]
patch for jdt ui
Comment 28 Deepak Azad CLA 2012-03-06 06:32:25 EST
Created attachment 212126 [details]
patch for platform text

The patch has some more bug fixes, including handling of < and > operators in JavaPairMatcher (the only change that should affect performance)
 
Things to test
- Performance
- Any obvious bugs while editing text (typing, cut, paste etc) and highlight enclosing brackets on.

There are a few more minor bugs and the code needs to be cleaned up a bit, I am working on these things.

> Here's a test case with wrongly nested {} and []:
> 
> // (  { hello [ foo ( x ) Humpty-Dumpty } bar ]  ) 

I let the left side win in all cases.
Comment 29 Deepak Azad CLA 2012-03-07 09:03:45 EST
Created attachment 212208 [details]
patch for jdt ui
Comment 30 Deepak Azad CLA 2012-03-07 09:20:46 EST
Created attachment 212214 [details]
patch for platform text

That should (hopefully) do it :-)

- I have tightened the code (as compared to previous patches) and tried to fix as many glitches as I could discover. Any remaining bugs in functionality could probably also be fixed in M7.

- There are no changes in tests, and they are all green => this is a good sign! One way to interpret this is that Ctrl+Shift+P should work perfectly, but there may be glitches in MatchingCharacterPainter.

- Of course, the performance looks good.

The existing API's are not touched, but there are a few new ones

- I think ICharacterPairMatcherExtension.isMatchedChar(char) should be non-controversial.

- ICharacterPairMatcherExtension.isRecomputationOfEnclosingPairRequired(IDocument, IRegion, IRegion) - One can argue that this should not be an API. Initially I had implemented this method in MatchingCharacterPainter, however there I needed information about the document partitioning to match within, but that is locked inside DefaultCharacterPairMatcher. 

So it was either this or "String getPartitioning();" (which is present in the previous patch attached on this bug). I preferred isRecomputationOfEnclosingPairRequired(..) over getPartitioning() as the latter is really an implementation detail, and the former a little less so :-)

- DefaultCharacterPairMatcher.getCharacterType(char, IDocument, int) - This is just a hook for extenders to deal with characters which have more than one meaning. e.g. < and > in Java. I think this is also non-controversial.
Comment 31 Markus Keller CLA 2012-03-08 14:48:21 EST
The code that listens to document changes looks too dangerous and I don't think it's really necessary. I removed all changes except for the algorithmic changes in DefaultCharacterPairMatcher, and that was already fast enough.

I can't comment on the situation when caret is in the Javadoc of a method in a huge class, since the braces of the class were not highlighted with the patch nor with the simplified version.

Furthermore, the patch doesn't work when the caret moves to a different partition type, e.g. when I move vertically out of a comment or string here:

public class Try {
    public static void main(String[] args) {

        // if (compare(one, two())

        for (int i= 0; i < 5; i++) {
            
            System.out.println("I (want to) match!");
            
        }
    }
}

Please keep only the changes in DefaultCharacterPairMatcher that don't deal with keeping a reference to the document and fix the mentioned problems.


(Obsolete) comments to code that should go away:

AbstractDocument#getChar(int):
- must call getStore() (since that method could be overridden by subclasses)
- if we want to catch the IndexOutOfBoundsException, then ITextStore#get(int) needs to declare it and we need to make sure all implementations throw it (e.g. ProjectionTextStore currently doesn't).

MatchingCharacterPainter:
- the ITextListener and ITextInputListener interfaces and methods don't need to be API
Comment 32 Deepak Azad CLA 2012-03-08 23:47:30 EST
Created attachment 212353 [details]
patch for platform text - only algorithm changes

The patch removes all changes in MatchingCharacterPainter

(In reply to comment #31)
> The code that listens to document changes looks too dangerous and I don't think
> it's really necessary. I removed all changes except for the algorithmic changes
> in DefaultCharacterPairMatcher, and that was already fast enough.
Really ?
There are at 2 test cases when you do not edit anything. Open StyledText.java, the file is large enough. It also has a number of fields on the top, and placing the caret somewhere in them (say somewhere between line 90 and line 120) highlights the brackets at the top of the file and the last one in the file. 

1. Move the caret around using arrow keys - Performance is OKish, and I can probably agree that it is acceptable.

2. Press Shift and now move the caret around (i.e. Change the selection) - I do not think the performance is good in this case, reason being the computation now happens twice - once for selection change, and once for key press. 

I exported the modified plugins and installed them in my host Eclipse, so there is no slowdown that can possibly happen in a target workbench. With the performance I am seeing in the selection case, I do not think I will want to use the feature.

3. The number of re-computations are even greater in case of document edits. To start with there is one for key press and one for text change. After these 2 events are done, AnnotationPainter.updatePainting(AnnotationModelEvent) calls invalidateTextPresentation() resulting in another round of computation. Having said that, in practice it is a bit hard to 'observe' a slow down in this case because one does not type that fast.

> The code that listens to document changes looks too dangerous
Dangerous in what sense? The worst consequence of a flaw in this code would be that wrong characters are highlighted as matching brackets.

> I can't comment on the situation when caret is in the Javadoc of a method in a
> huge class, since the braces of the class were not highlighted with the patch
> nor with the simplified version.

From comment 26
> Of course, the "enclosing brackets" algo needs to do the same as the "matching
> brackets" algo at positions where there is a matching bracket. 
I agree with this statement. Now applying to the case of partitions..
Place the caret just before '%', so far we 'match' the bracket in second string, the one right after '#'. Do we then want to change this behavior?
		String s1 = ("asd(%sd(   )");
		String s2 = "(bug)gy!#)";
Comment 33 Deepak Azad CLA 2012-03-09 01:01:17 EST
(In reply to comment #31)
While testing performance we are also making 2 assumptions which may or may not be true
- A user will have a machine which is as fast as the ones we are using
- The largest document size a user works with is about 350,000 characters (length of StyledText)
Comment 34 Deepak Azad CLA 2012-03-09 03:40:30 EST
Created attachment 212364 [details]
patch for platform text

This patch fixes the following issues.

(In reply to comment #31)
> Furthermore, the patch doesn't work when the caret moves to a different
> partition type, e.g. when I move vertically out of a comment or string here:
Small glitch in isRecomputationOfEnclosingPairRequired(..)

> AbstractDocument#getChar(int):
> - must call getStore() (since that method could be overridden by subclasses)
Agree.

> - if we want to catch the IndexOutOfBoundsException, then ITextStore#get(int)
> needs to declare it and we need to make sure all implementations throw it (e.g.
> ProjectionTextStore currently doesn't).
Each implementation can throw a different RuntimeException e.g. GapTextStore throws ArrayIndexOutOfBoundsException. In any case, avoiding the call to getLength() gives only a small incremental advantage, we can live without it.
 
> MatchingCharacterPainter:
> - the ITextListener and ITextInputListener interfaces and methods don't need to
> be API
Agree.
Comment 35 Markus Keller CLA 2012-03-09 12:50:49 EST
> Shift+<ArrowKey> at beginning of StyledText

Thanks for the example, that's what I needed to get convinced that we really need to optimize the enclosing brackets implementation by listening to document changes. However, I think we should only install the listeners on the viewer if fHighlightEnclosingPeerCharcters is true. (BTW: Typo in "Charcters".)

> > The code that listens to document changes looks too dangerous
> Dangerous in what sense? The worst consequence of a flaw in this code would be
> that wrong characters are highlighted as matching brackets.

I wouldn't ever argue I have an upper bound for possible consequences ;-). A potential danger here are bad listeners that leak a document, editor, etc.


> Place the caret just before '%', so far we 'match' the bracket in second
> string, the one right after '#'. Do we then want to change this behavior?
>         String s1 = ("asd(%sd(   )");
>         String s2 = "(bug)gy!#)";

No, the matching across (aka skipping of) other partition types should stay.

>> when caret is in the Javadoc of a method, the braces of the class are not highlighted

This is still broken. Reason seems to be DocumentPartitionAccessor#getNextPosition(..) which skips the default partition when we started in a Javadoc partition. I guess for the "enclosing brackets" algo (and only for that one), the right solution is not to skip the default partition iff there are no unmatched brackets when we reach the bounds of the original partition on both sides.

Since we have to decide this when we reach the bounds of the original partition, we need extra loops that only walk to the partition bounds first, and then we can decide how to proceed:
- if there are unmatched brackets, then skip other partition types
- otherwise, skip other partition types except for the default partition

In your example, we would also match the () around the first string literal when the caret is in 'asd', but not after the opening '('.


APIs:

- make a grammar & spelling pass (complete and correct sentences e.g. in ICharacterPairMatcherExtension#isRecomputationOfEnclosingPairRequired(..) or in DefaultCharacterPairMatcher#getCharacterType(..)
"E.g. In Java '<' is used" should be: "E.g. in Java, '<' is used" )

- DefaultCharacterPairMatcher#getCharacterType(..) must not operate with magic constants. If you can't reuse constants from SWT, define new ones. The loops in findEnclosingPeers(..) also need to compare to this constants -- the % operator is not necessary


I'll release the change to make CopyOnWriteTextStore#fTextStore private.
Comment 36 Deepak Azad CLA 2012-03-09 13:18:21 EST
(In reply to comment #35)
> Since we have to decide this when we reach the bounds of the original
> partition, we need extra loops that only walk to the partition bounds first,
> and then we can decide how to proceed:
> - if there are unmatched brackets, then skip other partition types
> - otherwise, skip other partition types except for the default partition
> 
> In your example, we would also match the () around the first string literal
> when the caret is in 'asd', but not after the opening '('.
Wouldn't this be a bit unpredictable? I mean sometimes we match brackets from default partition and sometimes we do not. A user is likely to get confused.

An alternative strategy could be that we match brackets within a given partition, but never with across other partition types, i.e. we get rid of this behavior
> No, the matching across (aka skipping of) other partition types should stay.

In such a case it might be more predictable to let the default partition win.

Also, I assume this part is not a must have for M6, since we can tweak the Algo in M7 as well.
Comment 37 Markus Keller CLA 2012-03-09 15:14:11 EST
> An alternative strategy could be that we match brackets within a given
> partition, but never with across other partition types, i.e. we get rid of this
> behavior

That behavior is quite useful if you e.g. have to deal with code that generates code like this:

        String expr= "new Function(" + args[0] + ", " + args[1] + ")";

I don't want to lose that.

For M6, we should at least mark the enclosing type's {} even when the caret is in the Javadoc (and fix the cross-partition matching in M7).
Comment 38 Deepak Azad CLA 2012-03-11 10:25:31 EDT
Created attachment 212429 [details]
patch for jdt ui
Comment 39 Deepak Azad CLA 2012-03-11 10:33:49 EDT
Created attachment 212430 [details]
patch for platform text

(In reply to comment #35)
> - DefaultCharacterPairMatcher#getCharacterType(..) 
This API was crap, the javadoc and the implementation/intent did not match. I have fixed this in the new patch.

I think we can release this patch for M6.

>> when caret is in the Javadoc of a method, the braces of the class are not highlighted
This part is not yet fixed. However, I will try to fix this as well for M6, but I want to first take a shot at bug 358347 since that may involve some API changes.
Comment 40 Markus Keller CLA 2012-03-11 19:01:17 EDT
Released the last patches with a few Javadoc fixes and a fix in DefaultCharacterPairMatcher#isRecomputationOfEnclosingPairRequired(..):

There were quite a few scenarios where a re-computation is necessary but was not performed by the patch. These include:
- selection from the end of a file into a type body; then press ArrowLeft key
- selection that includes unmatched brackets; then set the caret via mouseDown in a way that there's no bracket between old and new caret position

http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=ab31f1a2507f1497a0f78a48fe2faaed037040c4

http://git.eclipse.org/c/jdt/eclipse.jdt.ui.git/commit/?id=f52357ded2beb1b4fe74a49ac94f928dc8a49f97
Comment 41 Markus Keller CLA 2012-03-12 06:52:11 EDT
In I20120311-2000, I see another problem when the caret moves into a non-default partition:

In the body of StyledText, the {} of the class are highlighted when the caret is in the declaration of field TAB, except when the caret is in '\t'. I guess that's the same problem as when the caret is in the Javadoc of a member.

However, when I set the caret after System.getProperty on the next line and then press the ArrowRight key twice, the caret is in the string literal an the method call's () are still highlighted. In contrast, when I set the caret into TAB and then click into "line.separator", the () are not highlighted. I don't understand this difference and why it looks like it works in the first case.
Comment 42 Markus Keller CLA 2012-03-12 11:50:49 EDT
The last problem was a bug in DefaultCharacterPairMatcher#isRecomputationOfEnclosingPairRequired(..):

Deepak found that the *StartContentType variables were initialized with *EndOffset instead of *StartOffset. On the same lines there was also another bug: For the start of the selection we have to prefer the partition on the left, so preferOpenPartitions must be true.

Fixed with http://git.eclipse.org/c/platform/eclipse.platform.text.git/commit/?id=579a1a62d04dbff688759eed4225e32fb056f06e

Opened bug 373978 for the remaining known problem.
Comment 43 Deepak Azad CLA 2012-03-12 12:06:13 EDT
(In reply to comment #42)
> For the start of the selection we have to prefer the partition on the
> left, 
I had realized that bit,

> so preferOpenPartitions must be true.
but didn't realize that the fix was this simple :-)

Thanks!
Comment 44 Deepak Azad CLA 2012-03-12 23:22:15 EDT
Tested the performance with I20120312-1800 on Linux on quite an old and slow machine - the performance looks good!