124624 – [search] Camelcase matching routines should support end character

Bug 124624 - [search] Camelcase matching routines should support end character

Summary: [search] Camelcase matching routines should support end character

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.2
Hardware:	PC Windows XP

Importance:	P3 normal (vote)
Target Milestone:	3.4 M2
Assignee:	Frederic Fusier
QA Contact:

URL:
Whiteboard:
Keywords:	api

Duplicates (1):	202220 (view as bug list)
Depends on:
Blocks:	190437 200400
	Show dependency tree

Reported:	2006-01-20 05:54 EST by Dirk Baeumer
Modified:	2007-09-18 03:59 EDT (History)
CC List:	5 users (show)

See Also:

Attachments
Proposed patch (70.31 KB, patch) 2007-08-17 07:30 EDT, Frederic Fusier	no flags	Details \| Diff
Edited javadoc comment on CharOperation (18.39 KB, text/plain) 2007-08-17 10:32 EDT, Jerome Lanneluc	no flags	Details
New proposed patch (110.48 KB, patch) 2007-08-17 13:03 EDT, Frederic Fusier	no flags	Details \| Diff
Final proposed patch (216.21 KB, patch) 2007-08-23 05:38 EDT, Frederic Fusier	no flags	Details \| Diff
Patch to be released (231.22 KB, patch) 2007-08-24 01:55 EDT, Frederic Fusier	no flags	Details \| Diff
Show Obsolete (4) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Dirk Baeumer

2006-01-20 05:54:52 EST

The open type dialog support an end character ('<'). So if you type *ction< only classes ending with ction are shown. We should have the same semantic for camel case matches. For example SB< only matches StringBufffer but not SafariBrowserFactory. The end character should also work in the following case: SBuffer<

Comment 1 Philipe Mulet

2006-01-20 06:01:26 EST

This character would mean only find generic types to me.
Looks like you would need to pick a different delimiter.

Comment 2 Dirk Baeumer

2006-01-20 09:15:30 EST

This is the character we use since 1.0 ;-) Any suggestions ?

Comment 3 Philipe Mulet

2006-01-20 10:10:07 EST

I hadn't noticed this feature ever... <g>

Comment 4 Frederic Fusier

2006-04-26 11:51:51 EDT

Not critical => defer post 3.2

Comment 5 Frederic Fusier

2006-04-26 11:52:50 EDT

Comment 6 Frederic Fusier

2007-06-21 10:43:17 EDT

Reopen as LATER is deprecated...

Comment 7 Markus Keller

2007-08-14 05:34:31 EDT

We're already supporting this in the UI in 3.3 (bug 174349), but it's unclear how the feature is expected to work with examples like "Hash<" (same as "Hash "), which can either be interpreted as
- an exact pattern matching only "Hash", or as
- a camelCase pattern matching e.g. "Hashtable", but not "HashMap" (2 hunks).

The UI currently assumes the latter, which already provoked bug 190437. When I fixed bug 176017 in HEAD, I had to leave a dependency to ui.dialog.SearchPattern to support this case. We should decide about the "right" solution for camelCase & exact match mode and push it into jdt.core.search.SearchPattern.

Comment 8 Frederic Fusier

2007-08-14 06:25:27 EDT

I think that ending character ever means that searched string should end with last pattern characters. So, IMO, "Hash<" means exact match as you only have one uppercase character. Camel case cannot be applied in this peculiar case...
However, if you specify "HMap<", then 'HashMap' would be a valid match to return although 'HtmlMapper' would not...

So, my assumption is that it would be always Camel Case, even with end character but in some case, only exact match will be returned. This end character would then just prevent the Camel Case algorithm to accept prefix match when the last character of the pattern is consumed...

Note that, as I said in bug 109695 comment 18, it's definitely not a good idea to include end character in the camel case algorithm because API users may want to use their own specific characters.

The best thing to do is to add a boolean parameter on camelCaseMatch method which will indicates that when there are lowercase characters at the end of the pattern, searched strings end must match exactly these characters or may accept additional ones after...

Comment 9 Markus Keller

2007-08-14 07:13:04 EDT

(In reply to comment #8)

> I think that ending character ever means that searched string should end with
> last pattern characters. [..]

There are more cases that need to be specified, e.g. "HM<", "HaM<", "HashM<".

Proposed camel case with end character behavior:
- 1 camel case hunk ("Hash<")
   -> no camel case (only exact match)
- 2+ camel case hunks
   -> hunk count must be equal in pattern and match
   -> hunks with lowercase must match completely (prefix is not enough)
   -> hunks of just 1 uppercase character still performs prefix match

(+ == matches, - == no match)   String
Pattern         "HashMap"    "HtmlMapper"    "HashMapEntry"
"HM<"           +            +               -
"HaM<"          -            -               -
"HashM<"        +            -               -

"HMap<"         +            -               -
"HaMap<"        -            -               -
"HashMap<"      +            -               -


> The best thing to do is to add a boolean parameter on camelCaseMatch method [..]

I agree, but we also need this support in the match rules. Issue: The Javadoc of SearchPattern#R_CAMELCASE_MATCH already talks about combining this with SearchPattern#R_PREFIX_MATCH, but on the contrary, SearchPattern#validateMatchRule(..) says that camel case implies prefix match. 

The new mode could be R_CAMELCASE_MATCH | R_FULL_MATCH (since we probably shouldn't make R_CAMELCASE_MATCH stricter than it is today, and R_CAMELCASE_MATCH | R_EXACT_MATCH is not different from R_CAMELCASE_MATCH).

Comment 10 Frederic Fusier

2007-08-14 08:43:42 EDT

(In reply to comment #9)
> 
> There are more cases that need to be specified, e.g. "HM<", "HaM<", "HashM<".
> 
> Proposed camel case with end character behavior:
> - 1 camel case hunk ("Hash<")
>    -> no camel case (only exact match)
> - 2+ camel case hunks
>    -> hunk count must be equal in pattern and match
>    -> hunks with lowercase must match completely (prefix is not enough)
>    -> hunks of just 1 uppercase character still performs prefix match
> 
> (+ == matches, - == no match)   String
> Pattern         "HashMap"    "HtmlMapper"    "HashMapEntry"
> "HM<"           +            +               -
> "HaM<"          -            -               -
> "HashM<"        +            -               -
> 
> "HMap<"         +            -               -
> "HaMap<"        -            -               -
> "HashMap<"      +            -               -
> 
I'd slightly modify this spec as follow: 
 - 2+ camel case hunks
    -> hunk count must be equal in pattern and match
    -> last hunk with lowercase must match completely (prefix is not enough)
    -> last hunk of just 1 uppercase character still performs prefix match

which would change the results as follow:
Pattern         "HashMap"    "HtmlMapper"    "HashMapEntry"
"HM<"           +            +               -
"HaM<"          +            -               -
"HashM<"        +            -               -

"HMa<"          -            -               -
"HaMa<"         -            -               -
"HashMa<"       -            -               -

"HMap<"         +            -               -
"HaMap<"        +            -               -
"HashMap<"      +            -               -

> I agree, but we also need this support in the match rules. Issue: The Javadoc
> of SearchPattern#R_CAMELCASE_MATCH already talks about combining this with
> SearchPattern#R_PREFIX_MATCH, but on the contrary,
> SearchPattern#validateMatchRule(..) says that camel case implies prefix match. 
> 
> The new mode could be R_CAMELCASE_MATCH | R_FULL_MATCH (since we probably
> shouldn't make R_CAMELCASE_MATCH stricter than it is today, and
> R_CAMELCASE_MATCH | R_EXACT_MATCH is not different from R_CAMELCASE_MATCH).
> 
You're right. The additional R_FULL_MATCH looks as a good solution for this...

Comment 11 Markus Keller

2007-08-14 09:22:13 EDT

(In reply to comment #10)
That's also fine with me -- and it's also easier to explain in Help:
CamelCase
- <explain NPE and NuPoEx>
- terminating "<" or " " (space) to fix the number of camel-case parts and prevent automatic prefix matching for the last part (if it consists of multiple letters), e.g. "HMap<" and "HaMap<" match "HashMap" but not "HashMapEntry" nor "HatMapper".

Comment 12 Frederic Fusier

2007-08-16 11:59:42 EDT

(In reply to comment #10)
> You're right. The additional R_FULL_MATCH looks as a good solution for this...
> 
Unfortunately, I forgot that R_FULL_MATCH was set by default to distinguish from R_EQUIVALENT_MATCH and R_ERASURE_MATCH...
So, I'm afraid we cannot use this constant for this and need to add a new one...

Comment 13 Frederic Fusier

2007-08-17 07:30:05 EDT

Created attachment 76291 [details]
Proposed patch

This patch passes both JDT/Core and JDT/UI tests

Comment 14 Frederic Fusier

2007-08-17 07:31:43 EDT

Jerome, may have you a quick look on these API changes and let me know if this would be OK for you?
TIA

Comment 15 Jerome Lanneluc

2007-08-17 10:32:46 EDT

Created attachment 76305 [details]
Edited javadoc comment on CharOperation

Comment 16 Markus Keller

2007-08-17 10:56:06 EDT

You should also update SearchPattern#validateMatchRule(..). It still has...
(In reply to comment #9)
> Issue: The Javadoc
> of SearchPattern#R_CAMELCASE_MATCH already talks about combining this with
> SearchPattern#R_PREFIX_MATCH, but on the contrary,
> SearchPattern#validateMatchRule(..) says that camel case implies prefix match.

Comment 17 Frederic Fusier

2007-08-17 12:23:39 EDT

(In reply to comment #15)
> Created an attachment (id=76305) [details]
> Edited javadoc comment on CharOperation
> 
Thansk :-)

(In reply to comment #16)
> You should also update SearchPattern#validateMatchRule(..). It still has...
> (In reply to comment #9)
> > Issue: The Javadoc
> > of SearchPattern#R_CAMELCASE_MATCH already talks about combining this with
> > SearchPattern#R_PREFIX_MATCH, but on the contrary,
> > SearchPattern#validateMatchRule(..) says that camel case implies prefix match. 
> 
Thanks for the review, however, SearchPattern#validateMatchRule(..) says that camel case implies prefix *and* case sensitive. So, combine camel and prefix still has sense to found insensitive prefix match.

So, I rewrite the corresponding paragraph in R_CAMELCASE_MATCH constant javadoc comment as follow:
* Camel Case match rule already behaves both as prefix match and case sensitive 
* match rules (eg. <code>HashMap</code> pattern using using Camel Case rule 
* will match both <code>'HashMap'</code> and <code>HashMapEntry'</code>).
* However, this rule still can be combined to prefix match rule to also behaves 
* as prefix match and case insensitive match rules.
* For instance, when {@link #R_CAMELCASE_MATCH} | {@link #R_PREFIX_MATCH} 
* flags are combined, <code>"NULL"</code> pattern will still match 
* <code>NullPointerException</code>.

Comment 18 Frederic Fusier

2007-08-17 13:03:25 EDT

Created attachment 76317 [details]
New proposed patch

This patch includes all reviewers remarks (thanks again!) and also fixes bug 200400.

Comment 19 Frederic Fusier

2007-08-23 05:38:26 EDT

Created attachment 76748 [details]
Final proposed patch

This patch also includes fix for bug 200400.

Note that new constant name has changed since last posted patch, it's now:
SearchPattern#R_CAMEL_CASE_MATCH

One may argue that name is not really different than old constant (only '_' added between 'CAMEL' and 'CASE'), but as I wrote on bug 200400 comment 2, the old constant is depreciated and so avoid any confusion for users...

Comment 20 Frederic Fusier

2007-08-24 01:55:40 EDT

Created attachment 76871 [details]
Patch to be released

Patch to be released fixing last problem while validating match rule + some typos (thanks again Markus for your time to review and test the changes :-))

Comment 21 Frederic Fusier

2007-08-24 02:01:16 EDT

Patch released for 3.4M2 in HEAD stream.

Comment 22 Philipe Mulet

2007-08-28 09:18:17 EDT

I am not a big fan of the end result for the API. The new constant looks to similar to the old one, and has a different behavior. This is confusing.
Also the 'prefixMatch' notion on CharOperation#camelCaseMatch(...) is quite vague, it could be interpreted to mean that 'PE' matches NullPointerException when prefixMatch is off.
Suggesting renaming parameter into 'ignoreTrailingCharacters' or sthg like it.

Comment 23 Frederic Fusier

2007-08-29 09:23:57 EDT

(In reply to comment #22)
> I am not a big fan of the end result for the API. The new constant looks to
> similar to the old one, and has a different behavior. This is confusing.
>
I've opened bug 201426 to address this problem...

> Also the 'prefixMatch' notion on CharOperation#camelCaseMatch(...) is quite
> vague, it could be interpreted to mean that 'PE' matches NullPointerException
> when prefixMatch is off.
> Suggesting renaming parameter into 'ignoreTrailingCharacters' or sthg like it.
> 
...and I'll use the same bug to also fix this.

Comment 24 Frederic Fusier

2007-09-06 05:16:29 EDT

*** Bug 202220 has been marked as a duplicate of this bug. ***

Comment 25 Eric Jodet

2007-09-18 00:47:06 EDT

(In reply to comment #24)
Verified for 3.4 M2 using build I20070917-0010.