Bug 377333 - Reduce noise in subwords completion
Summary: Reduce noise in subwords completion
Status: CLOSED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Recommenders
Version: unspecified
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: ---
Assignee: Marcel Bruch CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-21 05:44 EDT by Deepak Azad CLA
Modified: 2019-07-24 14:37 EDT (History)
0 users

See Also:


Attachments
Now it feels like 'camel case matching ++' (20.39 KB, image/png)
2012-04-22 05:43 EDT, Marcel Bruch CLA

Description Deepak Azad CLA 2012-04-21 05:44:39 EDT
Let me reuse the screenshot from bug 376674

> Created attachment 214340 [details]
In my opinion, everything below 'newTypeParameter' is noise: a user does not want those proposals when 'para*' is typed. Note the * at the end of 'para*': the user may type more characters but would still not want any of the proposals toward the end of the list.

I am assuming that when a human invokes content assist he has either typed
- a word or a sequence of words e.g. newtype (new+type)
- first letter(s) of a sequence of words e.g. ntp or newtp (new+Type+Parameter)
Would someone do it any other way?

With the above assumptions in mind I think the following proposals make no sense
- newTypeDeclaration
       ^     ^^^
- newPackageDeclaration
     ^^          ^^

The camel case convention in Java names provides clear word boundaries. I think we should discard any proposal where a sequence of matched characters does NOT begin at the start of a word. Does this sound reasonable?
Comment 1 Marcel Bruch CLA 2012-04-21 08:39:47 EDT
I think treating matches on word boundaries in a special way is reasonable - although I'm not fully convinced that it's always/often unwanted noise. But most of the time we might use 'real' subwords as tokens.

However, I'd like to go one step further. You might use something like 'mgr' to find proposals that contain the word 'manager', or 'hw' for 'hardware'. We often use abbreviations or even tokens that do not syntactically match. Does this work well with your approach?

I still like how "Abbreviation Completion" works, although this may be a bit too radical... ;)


I think we need some empirical data to support our claims. We have actually given out a hands-on exercise for writing a 'proposal logger' (it logs which proposals are actually selected, with which prefix, and where, and stores this data on a server).

I think this could help us a lot in gathering the information required to support your (and maybe my) claims.
What do you think?
Comment 2 Deepak Azad CLA 2012-04-21 08:53:04 EDT
(In reply to comment #1)
> I think treating matches on word boundaries in a special way is reasonable -
> although I'm not fully convinced that it's always/often unwanted noise. But
> most of the time we might use 'real' subwords as tokens.
> 
> However, I'd like to go one step further. You might use something like 'mgr' to
> find proposals that contain the word 'manager', or 'hw' for 'hardware'. We often
> use abbreviations or even tokens that do not syntactically match.
Point taken.

> Does this work well with your approach?
I think it could.

Let's say there is a method 'getHardwareManager()', and let's see which tokens should match it:
- mgr
getHardwareManager
           ^   ^ ^
=> first letter of the word is included in the match - OK

- hw
getHardwareManager
   ^   ^
=> first letter of the word is included in the match - OK

- age
getHardwareManager
              ^^^
=> first letter of a word is NOT included in the match - Discard

Essentially, if some characters in a word are matched but not its first one, we should discard the proposal. Even with this rule, all the unwanted proposals from the screenshot in comment 0 are discarded.
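
The rule from this comment can be sketched in Java as follows. This is only an illustration under stated assumptions, not the code that was released: the class name WordBoundaryMatcher and its methods are hypothetical, words are split at every upper case letter, and matching ignores case. A token is accepted if there is at least one way to align it with the name such that every word contributing matched characters also contributes its first letter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch of the word-boundary rule; not the Recommenders implementation.
final class WordBoundaryMatcher {

    // Splits a camel case name into words: "getHardwareManager" -> [get, Hardware, Manager].
    // Simplifying assumption: every upper case letter starts a new word.
    static List<String> splitWords(String name) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (Character.isUpperCase(c) && current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            current.append(c);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return words;
    }

    // True if the token can be aligned with the name so that each word that
    // contributes matched characters also contributes its first letter.
    static boolean matches(String token, String name) {
        return matchFrom(token.toLowerCase(Locale.ROOT), 0, splitWords(name), 0);
    }

    // Tries to start matching token[ti] at the first letter of some word at index >= wi.
    private static boolean matchFrom(String token, int ti, List<String> words, int wi) {
        if (ti == token.length()) {
            return true; // whole token consumed
        }
        for (int w = wi; w < words.size(); w++) {
            String word = words.get(w);
            if (Character.toLowerCase(word.charAt(0)) == token.charAt(ti)
                    && consume(token, ti + 1, word, 1, words, w)) {
                return true;
            }
        }
        return false;
    }

    // The first letter of 'word' is already matched; either move on to a later
    // word or match further token characters inside this word (backtracking).
    private static boolean consume(String token, int ti, String word, int pos,
                                   List<String> words, int w) {
        if (matchFrom(token, ti, words, w + 1)) {
            return true;
        }
        for (int p = pos; p < word.length(); p++) {
            if (Character.toLowerCase(word.charAt(p)) == token.charAt(ti)
                    && consume(token, ti + 1, word, p + 1, words, w)) {
                return true;
            }
        }
        return false;
    }
}
```

With this sketch, 'mgr' and 'hw' are accepted for 'getHardwareManager' and 'age' is rejected, as in the examples above; 'para' is accepted for 'newTypeParameter' but rejected for 'newTypeDeclaration' and 'newPackageDeclaration'. The backtracking is exponential in the worst case, which should be acceptable for short identifiers but is not tuned for speed.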
Comment 3 Marcel Bruch CLA 2012-04-22 04:30:46 EDT
(In reply to comment #2)
> Essentially, if some characters in a word are matched but not the first one we
> should discard the proposal. Even with this rule all the unwanted proposals
> from screenshot in comment 0 are discarded.

Seems worth trying. The simplest heuristic(!) that comes close to what you want is to add a filter that checks that at least one upper case letter is in the match.

This would be a fast and experimental (!) hack to see how this filtering works out in practice. If this is actually what we want, we can think about developing a clean approach.
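
A minimal sketch of such an upper case filter, for illustration only: the class UpperCaseHeuristic and its methods are hypothetical names, and the sketch assumes case-insensitive subsequence matching, accepting a proposal if any alignment of the token with the name places a token character on an upper case letter.

```java
import java.util.Locale;

// Hypothetical sketch of the "at least one upper case letter in the match" heuristic.
final class UpperCaseHeuristic {

    // True if 'needle' (already lower case) is a case-insensitive subsequence of 'haystack'.
    static boolean isSubsequence(String needle, String haystack) {
        int n = 0;
        for (int h = 0; h < haystack.length() && n < needle.length(); h++) {
            if (Character.toLowerCase(haystack.charAt(h)) == needle.charAt(n)) {
                n++;
            }
        }
        return n == needle.length();
    }

    // True if some alignment of the token with the name matches an upper case letter.
    static boolean accepts(String token, String name) {
        String t = token.toLowerCase(Locale.ROOT);
        for (int u = 0; u < name.length(); u++) {
            if (!Character.isUpperCase(name.charAt(u))) {
                continue;
            }
            char upper = Character.toLowerCase(name.charAt(u));
            for (int k = 0; k < t.length(); k++) {
                // Pin token[k] to the upper case letter at position u; the rest of
                // the token must fit before and after that position.
                if (t.charAt(k) == upper
                        && isSubsequence(t.substring(0, k), name.substring(0, u))
                        && isSubsequence(t.substring(k + 1), name.substring(u + 1))) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

In this sketch, 'para' is accepted for 'newTypeParameter' and rejected for 'newTypeDeclaration', but it is still accepted for 'newPackageDeclaration' (the 'P' is matched while 'ra' falls inside 'Declaration'), so the heuristic is coarser than the rule from comment 2. A name without any upper case letter would never pass this version of the filter.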

If you have more of these ideas, you may consider writing patches ;)
Comment 4 Deepak Azad CLA 2012-04-22 04:45:40 EDT
(In reply to comment #3)
> If you have more of these ideas, you may consider writing patches ;)
Yes, I already considered that option :-)

I will keep opening bugs, and if they do not get fixed I will try to take a look at them myself. But I do not see that happening at least for the next month or so.
Comment 5 Marcel Bruch CLA 2012-04-22 04:53:50 EDT
(In reply to comment #4)
> (In reply to comment #3)
> > If you have more of these ideas, you may consider writing patches ;)
> Yes, I already considered that option :-)

Great!

> I will keep opening bugs, and if they do not get fixed I will try to take a
> look at them myself. But I do not see that happening at least for the next
> month or so.

Sounds good. I have to focus on the other tools for Juno too, so the time I can spend on subwords will be limited. My goal is to make it a useful addition for Juno but not feature-complete, i.e., fix every severe issue but postpone some ideas until after Juno.

If you or someone else provides patches, however, I'd be glad to integrate them. But for now, just keep the ideas coming ;)
Comment 6 Marcel Bruch CLA 2012-04-22 05:43:12 EDT
Created attachment 214345 [details]
Now it feels like 'camel case matching ++'

Is this the intended goal? A better (read, more permissive) camel case matching? Is there a 'real' delta to camel case matching except that you can write more letters and can start with fragments of the camel case (as depicted in the screenshot)?
Comment 7 Deepak Azad CLA 2012-04-22 05:50:19 EDT
(In reply to comment #6)
> Created attachment 214345 [details]
> Now it feels like 'camel case matching ++'
> 
> Is this the intended goal? A better (read, more permissive) camel case
> matching? Is there a 'real' delta to camel case matching except that you can
> write more letters and can start with fragments of the camel case (as depicted
> in the screenshot)?

There is a real delta with respect to camel case (in addition to what you already mentioned)
- You do not have to type in upper case
- You do not have to type first letters of all the words

Calling it 'camel case matching ++' may not be the best name. It is really subword completion that accounts for the words in a Java name.
Comment 8 Marcel Bruch CLA 2012-04-22 06:49:27 EDT
You planned that long beforehand, right? ;) I'm fine with the current implementation. Let's give it a try and see how people like it.
Comment 9 Deepak Azad CLA 2012-04-22 09:20:38 EDT
(In reply to comment #8)
> I'm fine with the current
> implementation. Let's give it a try and see how people like it.

Ok, let me know when you release this.
Comment 10 Marcel Bruch CLA 2012-04-22 12:33:20 EDT
http://download.eclipse.org/recommenders/updates/head/e42 now contains the binaries.
Comment 11 Deepak Azad CLA 2012-04-22 14:42:59 EDT
Looks good so far!

(In reply to comment #3)
> Seems worth trying. The simplest heuristic(!) that comes close to what you
> want, is to add a filter that checks that at least one upper case letter is in
> the match. 
> 
> This would be a fast and experimental (!) hack to see how this filtering
> works out in practice. If this is actually what we want, we can think about
> developing a clean approach.
Just to confirm, you implemented the full solution, right?
Comment 12 Marcel Bruch CLA 2012-04-22 15:56:26 EDT
(In reply to comment #11)
> Just to confirm, you implemented the full solution, right?

There are one or two cases that may be unexpected and I'm just relying on a very small test suite at the moment. But yes, it's more than the heuristic mentioned above.
Comment 13 Marcel Bruch CLA 2012-06-09 15:11:55 EDT
Set target milestone for fixed bugs to 1.0
Comment 14 Marcel Bruch CLA 2012-06-09 15:12:12 EDT
Set target milestone for fixed bugs to 1.0
Comment 15 Marcel Bruch CLA 2012-07-01 05:39:28 EDT
Juno shipped, change was made. Closing this bug.