Bug 17030 - Need to search documentation for exact string match
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: User Assistance
Version: 2.0
Hardware: PC Windows 2000
Importance: P3 enhancement
Target Milestone: 2.0.2
Assignee: Konrad Kolosowski CLA
Duplicates: 20802 24801
Reported: 2002-05-22 15:55 EDT by Curtis d'Entremont CLA
Modified: 2002-10-15 14:08 EDT
Description Curtis d'Entremont CLA 2002-05-22 15:55:47 EDT
Using Eclipse 2.0, Build F1 (buildid: 20020521) 
on Windows 2000.

From Eclipse, go to the menu 'Help'->'Help Contents' and search for "Eclipse". 
Instead of finding all instances of "Eclipse", it searches for (and 
highlights) "Eclips". This is not an off-by-one error, though: if you search 
for "something", it looks for "someth" instead.

By searching for "used" you can see that it is indeed searching incorrectly, 
and not just highlighting incorrectly. It will search for "us" and will match 
all the "Using ..." pages.
Comment 1 Konrad Kolosowski CLA 2002-05-23 01:54:06 EDT
Original Summary: Help window does incorrect searching and highlighting

The behavior you are seeing is caused by stemming being enabled in help 
search.  It is not a fuzzy search, which would allow one or more arbitrary 
letters to differ.

When searching documentation, it is desirable for the search engine to perform 
stemming.  It causes a search for
 using eclipse
to find
 using eclipse
 how eclipse can be used
 how to use eclipse
which is an improvement over finding just
 using eclipse.

There is no perfect stemming algorithm, and the one we use reduces the words 
Eclipse and something to the roots eclips and someth, but this does not 
negatively affect the effectiveness of search in most cases.
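
To make the effect concrete, here is a minimal sketch of suffix stripping. The `toy_stem` and `stems` functions and the suffix list are assumptions made up for illustration; this is not the stemming algorithm the help system uses, only a toy that happens to produce the same roots for the words discussed in this bug.

```python
import re

# Toy suffix list for illustration only; NOT the stemmer used by Eclipse help.
SUFFIXES = ("ing", "ed", "e", "s")

def toy_stem(word):
    """Lower-case the word and strip the first matching suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least two characters so short words are not destroyed.
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def stems(text):
    """Split text into words and return the set of their stems."""
    return {toy_stem(w) for w in re.findall(r"[A-Za-z]+", text)}

# A document matches when it contains every stem of the query.
query = stems("using eclipse")
for title in ("using eclipse", "how eclipse can be used", "how to use eclipse"):
    print(title, "->", query <= stems(title))
```

Because "using", "used" and "use" all reduce to the same root in this toy, all three titles match the query, which is the improvement described above.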

I believe what you are looking for is an exact string match.  The help system 
defines an extension point for plugging in a language analyzer.  By default, 
in the English locale, help uses an analyzer that performs stemming, but this 
can easily be changed by plugging in a simpler analyzer that breaks text into 
words without stemming or stop-word removal.
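
The difference between the two kinds of analyzer can be sketched as follows. Everything here is hypothetical: the `ANALYZERS` dictionary only stands in for the idea of a per-locale plug-in point and is not the help system's extension point API, and the stop-word list and `toy_stem` are illustrative assumptions.

```python
import re

STOP_WORDS = {"a", "an", "the", "to", "how", "can", "be"}  # illustrative list

def toy_stem(word):
    # Illustrative suffix stripper, not the algorithm used by Eclipse help.
    for suffix in ("ing", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def stemming_analyzer(text):
    """Lower-case, drop stop words, then stem each remaining word."""
    words = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(w) for w in words if w not in STOP_WORDS]

def simple_analyzer(text):
    """Lower-case and split into words; no stemming, no stop-word removal."""
    return re.findall(r"[a-z]+", text.lower())

# Hypothetical registry standing in for the plug-in point: locale -> analyzer.
ANALYZERS = {"en": stemming_analyzer}

def analyze(text, locale="en"):
    """Run the analyzer registered for the locale, else the simple one."""
    return ANALYZERS.get(locale, simple_analyzer)(text)
```

Replacing the entry for "en" with `simple_analyzer` switches the whole index to exact words, which is the trade-off described here: one behavior or the other, not both.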

What help does not currently support is searching with stemming enabled and 
searching for an exact match at the same time.  With stemming enabled, a 
quoted query means a search for a phrase consisting of consecutive words, but 
not an exact string match.
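
The distinction between a stemmed phrase match and an exact string match might look like this in a small sketch. The function names and `toy_stem` are assumptions for illustration, and stop-word handling is left out to keep the example short.

```python
import re

def toy_stem(word):
    # Illustrative suffix stripper, not the algorithm used by Eclipse help.
    for suffix in ("ing", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def stemmed_tokens(text):
    """Lower-case the text, split it into words and stem each one."""
    return [toy_stem(w) for w in re.findall(r"[a-z]+", text.lower())]

def phrase_match(phrase, text):
    """True if the stemmed phrase occurs as consecutive stemmed tokens."""
    needle, haystack = stemmed_tokens(phrase), stemmed_tokens(text)
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )

def exact_match(phrase, text):
    """True only if the literal phrase appears in the text (case-insensitive)."""
    return phrase.lower() in text.lower()
```

With these definitions, the phrase "using eclipse" matches the text "We used Eclipse daily" as a stemmed phrase but not as an exact string, while "Eclipse is used" matches neither because the words are not consecutive in that order.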

I will change this bug to a request for this enhancement, and defer it for 
later.  Implementing this requirement will probably require a second index to 
be created in parallel, and there are performance issues associated with this 
approach.
Comment 2 Konrad Kolosowski CLA 2002-06-25 15:28:08 EDT
*** Bug 20802 has been marked as a duplicate of this bug. ***
Comment 3 Dorian Birsan CLA 2002-07-08 14:44:55 EDT
Re-opening the bug...
Comment 4 Konrad Kolosowski CLA 2002-08-07 14:48:41 EDT
Targeted for 2.1.
Comment 5 Konrad Kolosowski CLA 2002-09-16 16:17:14 EDT
We will add additional fields to the index to hold the unstemmed versions of 
document words.  These fields will be searched when keywords are double-quoted 
in the search query.
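
A rough sketch of this design, under stated assumptions: the `DualFieldIndex` class, its method names and `toy_stem` are hypothetical and do not reflect the released implementation. Quoted keywords are looked up in the unstemmed field and bare keywords in the stemmed field. For brevity, the words of a quoted phrase are only required to be present, not adjacent.

```python
import re
from collections import defaultdict

def toy_stem(word):
    # Illustrative suffix stripper, not the algorithm used by Eclipse help.
    for suffix in ("ing", "ed", "e", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word

def words(text):
    return re.findall(r"[a-z]+", text.lower())

class DualFieldIndex:
    """Inverted index keeping a stemmed and an unstemmed field per document."""

    def __init__(self):
        self.stemmed = defaultdict(set)  # stem -> document ids
        self.exact = defaultdict(set)    # original word -> document ids

    def add(self, doc_id, text):
        for w in words(text):
            self.exact[w].add(doc_id)
            self.stemmed[toy_stem(w)].add(doc_id)

    def search(self, query):
        """Return ids of documents matching every term in the query."""
        results = None
        # Each regex match is either a "quoted" chunk or a bare word.
        for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', query):
            if quoted:
                postings = [self.exact.get(w, set()) for w in words(quoted)]
            else:
                postings = [self.stemmed.get(toy_stem(w), set()) for w in words(bare)]
            for ids in postings:
                results = set(ids) if results is None else results & ids
        return results or set()

index = DualFieldIndex()
index.add(1, "Using the workbench")
index.add(2, "Commonly used views")
print(index.search("used"))    # stemmed field: both documents
print(index.search('"used"'))  # unstemmed field: only document 2
```

Keeping both fields in one index avoids maintaining a separate second index, at the cost of storing each word twice.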
Comment 6 Konrad Kolosowski CLA 2002-09-17 12:37:36 EDT
Implemented and released in head.
Comment 7 Jeanette Deupree CLA 2002-10-02 16:43:25 EDT
I tested this and it works great! Any way to get it into 2.0.2? I think it 
would be a great improvement for any users of the help system. 
Comment 8 Konrad Kolosowski CLA 2002-10-03 15:08:13 EDT
We have received good feedback regarding this feature, and no regressions have 
been reported.  Since it is a great improvement, I have merged the code into 
the 2.0.2 builds as well.  The 20021009 2.0.2 build will pick up the changes.
Comment 9 Konrad Kolosowski CLA 2002-10-15 14:08:44 EDT
*** Bug 24801 has been marked as a duplicate of this bug. ***