Bug 218605 - [search] SearchPattern: provide way to get the matching regions
Summary: [search] SearchPattern: provide way to get the matching regions
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.4   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.5 M6   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 228397 292198
Blocks:
  Show dependency tree
 
Reported: 2008-02-12 04:18 EST by Martin Aeschlimann CLA
Modified: 2009-10-13 17:17 EDT (History)
5 users (show)

See Also:


Attachments
Proposed patch (231.24 KB, patch)
2009-02-03 06:44 EST, Frederic Fusier CLA
no flags Details | Diff
Patch to test the matching regions from OpenType dialog (2.52 KB, patch)
2009-02-03 06:46 EST, Frederic Fusier CLA
no flags Details | Diff
New proposed patch (95.12 KB, patch)
2009-02-10 11:25 EST, Frederic Fusier CLA
no flags Details | Diff
Hacky patch against JDT UI to show matches in the Open Type dialog (3.69 KB, patch)
2009-02-11 12:05 EST, Dani Megert CLA
no flags Details | Diff
Improved patch (103.67 KB, patch)
2009-02-12 05:05 EST, Frederic Fusier CLA
no flags Details | Diff
Improved patch + API changes (103.90 KB, patch)
2009-02-12 05:12 EST, Frederic Fusier CLA
no flags Details | Diff
Last patch (only javadoc tuning) (104.38 KB, patch)
2009-02-12 05:25 EST, Frederic Fusier CLA
no flags Details | Diff
Final patch (105.20 KB, patch)
2009-02-12 10:45 EST, Frederic Fusier CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2008-02-12 04:18:15 EST
20080212

When using the SearchPattern to find out if a string matches a given pattern we currently only get the information true/false (matches / does not match).

We are investigating to improve the open type dialog and the code assist dialog with colors and font styles to mark the letters that were used to match the pattern.

As an example see the quick access dialog from platform.
- open quick access dialog (Ctrl + 3)
- enter 'Java'
- dialog shows 'Java' in bold in all the matches
- enter 'JTH'
- dialog shows bold letters to mark how the camel case matches

An API solution could be to not only return true/false but an MatchResult that gives access to the matching ranges.
Comment 1 Jerome Lanneluc CLA 2008-02-13 10:46:56 EST
This would be used by the open type dialog and code assist. However this also depends on SWT capability, ie. this api would not be for 3.4 if SWT cannot provide the support.

Dani please let us know when you have more info from SWT.
Comment 2 Frederic Fusier CLA 2008-03-23 12:26:55 EDT
As time was missing to make a robust API for this functionality, we defer to M7 and it will be provided as public internal access instead...
Comment 3 Jerome Lanneluc CLA 2008-05-02 04:47:22 EDT
We ran out of time for 3.4. Deferring to 3.5
Comment 4 Dani Megert CLA 2008-07-03 10:17:03 EDT
Note that for code assist we need another approach than SearchPattern
Comment 5 Dani Megert CLA 2008-07-03 11:14:30 EDT
This only makes sense if bug 228397 gets fixed.
Comment 6 Philipe Mulet CLA 2008-09-10 06:13:01 EDT
On plan for 3.5, targeting M4.
Comment 7 Frederic Fusier CLA 2008-11-21 11:33:38 EST
(In reply to comment #5)
> This only makes sense if bug 228397 gets fixed.
> 
Not sure that bug 228397 will be on 3.5 plan... What would be the consequence for this bug, if it was finally not addressed in this release?
Comment 8 Dani Megert CLA 2008-11-21 12:59:23 EST
We want to show the matching characters in the content assist proposal popup. For this we need both dependent bugs to be fixed.

We should ping Platform SWT and UI what their plan is for 3.5.
Comment 9 Frederic Fusier CLA 2008-12-08 13:16:57 EST
Jerome, as SWT's bug dependency is now resolved, should I target this bug for 3.5M5?
Comment 10 Jerome Lanneluc CLA 2008-12-09 01:52:57 EST
Targeting M5
Comment 11 Frederic Fusier CLA 2009-01-26 11:21:41 EST
Unfortunately, I ran out of time to put this in 3.5M5, hence move the target to M6 milestone....

After having spent days several days to work on this item, here's the initial draft proposal for the API.

1) Open type dialog use case

We agreed with Dani to first focus on this use case and see after how it could be generalized to other use cases (e.g. code assist)....

The API proposal for this use case would be 2 methods on the abstract class org.eclipse.jdt.core.search.TypeNameMatch:
- public int[] getSimpleTypeMatchingRegions()
- public int[] getTypeContainerMatchingRegions()


2) CompletionProposal

This is not currently in the focus, but we already have a thought about it with David and the API would be quite simple, a single method on the class org.eclipse.jdt.core.CompletionProposal:
- public int[] getNameMatchingRegions()
Comment 12 Dani Megert CLA 2009-01-26 11:53:47 EST
In contrast to what we discussed earlier, we also need a generic API to get the matching regions for a given pattern (and search flags) because we narrow down the TypeMatches on our side as well.
Comment 13 Frederic Fusier CLA 2009-01-30 07:15:05 EST
We finally agreed that a unique method would be the simplest solution to address this requirement.

Here's the initial proposal for this API which would be put onto SearchPattern:

/**
 * Answers all the positions in a given name matching a given pattern using
 * a specified match rule.
 * </p><p>
 * Each of these positions is made of its starting index and its length in the
 * given name. They are all concatenated in a single array of <code>int</code>
 * which therefore always has an even length.
 * </p><p>
 * Note that each position is disjointed from the following one.<br>
 * E.g. if the positions are <code>{ start1, length1, start2, length2 }</code>,
 * then <code>start1+length1</code> will always be smaller than
 * <code>start2</code>.
 * <p>
 * The possible comparison rules between the name and the pattern are:
 * <ul>
 * <li>camel case matching: same parts count or not</li>
 * <li>pattern matching: case sensitive or not</li>
 * </ul>
 * <pre>
 * Examples:
 * <ol><li>  pattern = "NPE"
 *  name = NullPointerException / NoPermissionException
 *  matchRule = {@link #R_CAMELCASE_MATCH}
 *  result:  { 0, 1, 4, 1, 11, 1 } / { 0, 1, 2, 1, 12, 1 } </li>
 * <li>  pattern = "NuPoEx"
 *  name = NullPointerException
 *  matchRule = {@link #R_CAMELCASE_MATCH}
 *  result:  { 0, 2, 4, 2, 11, 2 }</li>
 * <li>  pattern = "IPL3"
 *  name = "IPerspectiveListener3"
 *  matchRule = {@link #R_CAMELCASE_MATCH}
 *  result:  { 0, 2, 12, 1, 20, 1 }</li>
 * <li>  pattern = "HashME"
 *  name = "HashMapEntry"
 *  matchRule = {@link #R_CAMELCASE_MATCH}
 *  result:  { 0, 5, 7, 1 }</li>
 * <li>  pattern = "N???Po*Ex?eption"
 *  name = NullPointerException
 *  matchRule = {@link #R_PATTERN_MATCH} | {@link #R_CASE_SENSITIVE}
 *  result:  { 0, 1, 4, 2, 11, 2, 14, 6 }</li>
 * <li>  pattern = "Ha*M*ent*"
 *  name = "HashMapEntry"
 *  matchRule = {@link #R_PATTERN_MATCH}
 *  result:  { 0, 2, 4, 1, 7, 3 }</li>
 * </ol></pre>
 *
 * @see #camelCaseMatch(String, String, boolean) for more details on the
 * 	camel case behavior
 * @see CharOperation#match(char[], char[], boolean) for more details on the
 * 	pattern match behavior
 *
 * @param pattern the given pattern
 * @param name the given name
 * @param matchRule the rule to apply for the comparison.
 * <p>Only following values are accepted:
 * 	<ul>
 * 		<li>{@link #R_CAMELCASE_MATCH}</li>
 * 		<li>{@link #R_CAMELCASE_SAME_PART_COUNT_MATCH}</li>
 * 		<li>{@link #R_PATTERN_MATCH}</li>
 * 		<li>{@link #R_PATTERN_MATCH} | {@link #R_CASE_SENSITIVE}</li>
 * 	</ul>
 * @return an array of <code>int</code> having two slots per returned
 * 	regions (first one is the starting index of the region and the second
 * 	one the length of the region).<br>
 * 	Note that it may be <code>null</code> if the given name does not match
 * 	the pattern
 * @since 3.5
 */
public static final int[] getMatchingRegions(String pattern, String name, int matchRule) {
Comment 14 Frederic Fusier CLA 2009-01-30 07:16:11 EST
(In reply to comment #13)
Please read 'region(s)' instead of 'position(s)' in the proposed API...
Comment 15 Frederic Fusier CLA 2009-02-03 06:44:51 EST
Created attachment 124528 [details]
Proposed patch

One small change in this patch regarding the API proposed at comment 13:
the specified match rule can also be an exact match or a prefix match (using case sensitive or not)...

Note that 618 tests have been added for this new functionality! They have been generated from all JDT tests (Core, UI and Text) which are currently using either camel case or pattern match methods of CharOperation.
Comment 16 Frederic Fusier CLA 2009-02-03 06:46:34 EST
Created attachment 124529 [details]
Patch to test the matching regions from OpenType dialog

Additional patch on JDT/UI v20090125-2000 to test the new API while using the Open Type dialog...
Comment 17 Dani Megert CLA 2009-02-09 10:57:19 EST
I seem to get many 'null's for some non-camel case matches. 

Also, the new API needs to be better spec'ed what happens when 'null' values are passed in, e.g. if the pattern is 'null' I currently get the whole string reported as matched. This needs to be documented.
Comment 18 Frederic Fusier CLA 2009-02-10 11:25:03 EST
Created attachment 125260 [details]
New proposed patch

New proposed patch fixing the obvious problem with prefix match rule. I've modified the test suite to be sure to test this kind of match rule and also reduced the number of redundant tests.

I also have documented the API return when the given pattern is either null or equals to '*'.
Comment 19 Dani Megert CLA 2009-02-11 10:37:07 EST
>I also have documented the API return when the given pattern is either null or
>equals to '*'.
You also need to adjust the parameter Javadoc: currently 'null' is not allowed.
Comment 20 Dani Megert CLA 2009-02-11 11:59:10 EST
When I enter 'HMac' though I get 'null' instead of the correct regions.
Comment 21 Dani Megert CLA 2009-02-11 12:00:08 EST
Note that the previous was just an example. It looks like camel case plus normal matching does not yet work.
Comment 22 Dani Megert CLA 2009-02-11 12:05:27 EST
Created attachment 125413 [details]
Hacky patch against JDT UI to show matches in the Open Type dialog
Comment 23 Frederic Fusier CLA 2009-02-12 05:05:23 EST
Created attachment 125502 [details]
Improved patch

The problem was that 'HMac' failed to match 'HmacCore' as regarding the camel case algorithm (which is case sensitive). Search reports this match because after having failed to match it as a camel case it succeeded to match it as a insensitive prefix...

So, I have added the same behavior for the new API method, hence also added the combination of the R_CASE_SENSITIVE flag for camel case match rule values.

I have also completed the pattern parameter javadoc to allow null.

Here's the new javadoc for the arguments and return:
 *
 * @param pattern the given pattern. If <code>null</code>,
 * 	then the returned region will be the entire given name.
 * @param name the given name
 * @param matchRule the rule to apply for the comparison.<br>
 *     The following values are accepted:
 *     <ul>
 *         <li>{@link #R_EXACT_MATCH}</li>
 *         <li>{@link #R_PREFIX_MATCH}</li>
 *         <li>{@link #R_PATTERN_MATCH}</li>
 *         <li>{@link #R_CAMELCASE_MATCH}</li>
 *         <li>{@link #R_CAMELCASE_SAME_PART_COUNT_MATCH}</li>
 *     </ul>
 *     <p>
 *     Each of these valid values may be also combined with
 *     the {@link #R_CASE_SENSITIVE} flag.
 *     </p>
 *     Some examples:
 *     <ul>
 *         <li>{@link #R_EXACT_MATCH} | {@link #R_CASE_SENSITIVE}:
 *                if an exact case sensitive match is expected,</li>
 *         <li>{@link #R_PREFIX_MATCH} if a case insensitive prefix match is
 *                expected,</li>
 *         <li>{@link #R_CAMELCASE_MATCH} | {@link #R_CASE_SENSITIVE}:
 *                if a case sensitive camel case match is expected, etc.</li>
 *     </ul>
 * @return an array of <code>int</code> having two slots per returned
 *     regions: the first one is the region starting index and the second one
 *     is the region length.
 *     <p>
 *     The returned region may be the entire given name if the given pattern
 *     is either <code>null</code> (whatever the match rule is) or
 *     <code>'*'</code>  with a pattern match rule.
 *     </p><p>
 *     May also be <code>null</code> if the given name does not match
 *     the given pattern.
 *     </p>
 *
Comment 24 Frederic Fusier CLA 2009-02-12 05:12:48 EST
Created attachment 125503 [details]
Improved patch + API changes

Sorry, I didn't include the last API changes in previous patch...
Comment 25 Frederic Fusier CLA 2009-02-12 05:25:24 EST
Created attachment 125504 [details]
Last patch (only javadoc tuning)
Comment 26 Frederic Fusier CLA 2009-02-12 10:45:41 EST
Created attachment 125530 [details]
Final patch

This last patch optimizes string comparisons by first comparing pattern and name lengths. It also optimize insensitive prefix match comparison by using 'substring' and 'equalsIgnoreCase' methods instead of 'toLowerCase' and 'equals' ones...
Comment 27 Frederic Fusier CLA 2009-02-12 10:46:09 EST
(In reply to comment #26)
> Created an attachment (id=125530) [details]
> Final patch
> 
Released for 3.5M6 in HEAD stream.
Comment 28 Jay Arthanareeswaran CLA 2009-03-10 09:25:50 EDT
Verified for 3.5M6 with build I20090309-0100