Community
Participate
Working Groups
I've been plagued by the fact that Eclipse's CamelCase is not (IMHO) as powerful as IntelliJ's. I created a patch to CharOperation.java to both simplify and make more powerful the algo. I've also added a test that is very indicative of what I think should work... I realize there is a split between the algo used for auto-completion (this) and another (which I have not found yet) for "Open Type". If the algo I attach to this bug is deemed good, I'd urge you to help me out to make it the one and only used by anything that requires CamelCase. Z
Created attachment 35703 [details] Patch to re-implement CamelCase
We will consider it, thanks for the input. The test is very helpful too, since it sets the expectation. The current implementation mostly meets the basics, if community agrees we can make it more powerful; but we need to keep the rules simple. Just to clarify things, can you detail the rules you implemented ?
CamelCase as implemented is not CamelCase at all, like IntelliJ's... It only accepts capital letters in the beginning, and lowercase in the end. So, while HMap matches HashMap, HaM does not. Easiest way is for me to list all tests that I wrote that will fail under the current algo. "." for the ones that will pass, F is for the ones that will fail, ? I'm not quite sure . {"TZ","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ F {"TiZ","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ F {"TiZon","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TZon","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TZone","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TimeZone","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TimeZ","TimeZ"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TZ","TimeZ"}, //$NON-NLS-1$//$NON-NLS-2$ . {"T","TimeZ"}, //$NON-NLS-1$//$NON-NLS-2$ . {"T","TimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ . {"TZ","TZ"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aT","aTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTi","aTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTiZ","aTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTZ","aTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aT","artTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTi","artTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTiZ","artTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$ ? {"aTZ","artTimeZone"}, //$NON-NLS-1$//$NON-NLS-2$
Well, strictly speaking there is no absolute definition of CamelCase matching. So we have flexibility here. Normally, matching on uppercase letters is the base for CamelCase matching; which is what we implemented.
Sorry: I did not mean to imply that there is an absolute right way of doing it, only that: 1) it could be much more useful 2) it is not any harder to do so 3) there is no downside (that I can see) 4) the patch is included The remark about "CamelCase" is that it seems to me the name comes from the fact that there are SevEraL humps in the query string, rather then the SIngle hump as currently implemented (DromedaryCase?) :^>
Ok, we are in agreement. Dirk - as emeritus CamelCase expert, do you have an opinion on this evolution ? On the same topic, some also wanted to have CamelCase bettern perform with digits...
Two things to remark: - if we change the algo then we have to adapt SearchPattern.validateMatchRule as well. - if we add the additional support then we have to think about the impact on sorting as well. Consider the following example: o user types TiZone and there is a type TiZone o since TiZone also matches TimeZone the sorting will result in: ... TimeZone TiZone ... resulting in the fact that an exact match doesn't appear first. If we sort TiZone first then the list will not be strictly alphabetical anymore. In general I like the idea of making CC match more powerful.
Hmmmm. Does alphabetical sorting do all Uppercases always before any lowercase? I.e., is "ZZ" less than "aa"? If not, the currenct impl is already not guaranteed to prefer exact matches. If you type 'TZ', and there's a class named TZ, it will show up after TimeZone (again, unless uppercases are all first). As another point, order is tweaked with time to give preference to what the user likes best. But finally, I agree that a perfect match should always be preferred -- maybe an explicit test for exact matches should always be performed.
Frederic - pls take care of it. We want to handle "NuPoEx"
I think Dirk concern about sorting is not relevant as OpenType dialog typically sort in alphabetical order and "TiZone" is before "TimeZone" ('Z' is lower than 'm' and all lowercase characters) => If user type "TiZone", then he'll get most relevant match on top of list. Moreover, we currently have the same behavior (ie. the patch does not change anything). If user type "TZ", he'll get "TZ" class before "TimeZone" one...
"Zorzella", I would like to have your real first and last names to add you as contributor in copyright comment list, thanks
Created attachment 38134 [details] Tuned patch to fit last changes done in CharOperation Note that I modified tests for uppercase or lowercase character as it must be splitted in two separated tests to avoid running into Character tests methods systematically. I also put your test in UtilTests class which was the initial class for this CharOperation tests... If anything hurts you, please shout...
Dirk, As you can see in patch, SearchPattern.camelCaseMatch(*) methods have been rewritten to call CharOperation method to avoid duplicated algorithms. If you think this can have a big impact on your performance results then let me know asap and I'll revert this...
Frederic, the reason why we added the String based methods in the first place was to avoid char creation for every string we compare. I currently use SearchPattern.camelCaseMatch for the history and the cached result. For example if you type A and the dialog received all matches before you type the next character then I search in memory in the cached result. In big workspaces this can be thousands of types. So the performance impact will be the additional garbage collects.
(In reply to comment #14) > Frederic, > > the reason why we added the String based methods in the first place was to > avoid char creation for every string we compare. > > I currently use SearchPattern.camelCaseMatch for the history and the cached > result. For example if you type A and the dialog received all matches before > you type the next character then I search in memory in the cached result. In > big workspaces this can be thousands of types. So the performance impact will > be the additional garbage collects. > OK, so I will put back to SearchPattern duplicated algorithm using String and charAt(int) method...
Frédéric, my full name is Luiz-Otávio Zorzella. And, yes, I do go by my last name, Zorzella. Thanks, Z
Created attachment 38178 [details] Final patch including fixes for some side effects + duplicate algortihm in SearchPatter Luiz, please let me know if this patch is OK for you. I'm currently running tests on it and so, I should be able to release it tomorrow early morning if you agree... Thanks
Nice... Could you list my email as "zorzella at gmail dot com" or is that against the norm? If so, leave as it is. Thanks for seeing this through. I can barelly wait to have power-CamelCase!!! Zorzella
Patch released in HEAD (I've changed your email as requested in copyright comment). This would be available in next nightly build (e.g. N20060412-0010) tomorrow morning. Thanks again for your contribution :-)
You should read "This _will_ be available..." in previous comment...
Verified for 3.2 RC1 using build I20060413-0010
[contributed patch applied]