Bug 435469 - Implement stopwords strategy
Status: NEW
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Recommenders
Version: unspecified
Hardware: PC Mac OS X
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-05-22 02:25 EDT by Johannes Dorn CLA
Modified: 2019-07-24 14:36 EDT

See Also:


Description Johannes Dorn CLA 2014-05-22 02:25:16 EDT
We may want to consider using stopwords in Snipmatch.
While "if" should certainly not be a stopword, others like "a", "are", etc. are probably not useful for the search.

We would need a handcrafted list of stopwords. Alternatively, we could take a standard set of stopwords and remove the Java keywords from it.
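A minimal sketch of the second option, in Python for brevity. The English list below is a small illustrative subset chosen for this example, not an official stopword list, and `build_stopwords` is a hypothetical helper name. The keyword set is the list of reserved words in Java.

```python
# Illustrative subset of a general English stopword list (an assumption
# for this sketch, not an official list).
ENGLISH_STOPWORDS = frozenset({
    "a", "an", "and", "are", "do", "for", "if", "in", "is",
    "of", "the", "this", "to", "while",
})

# Reserved words of the Java language; these must stay searchable.
JAVA_KEYWORDS = frozenset({
    "abstract", "assert", "boolean", "break", "byte", "case", "catch",
    "char", "class", "const", "continue", "default", "do", "double",
    "else", "enum", "extends", "final", "finally", "float", "for",
    "goto", "if", "implements", "import", "instanceof", "int",
    "interface", "long", "native", "new", "package", "private",
    "protected", "public", "return", "short", "static", "strictfp",
    "super", "switch", "synchronized", "this", "throw", "throws",
    "transient", "try", "void", "volatile", "while",
})


def build_stopwords(standard, keywords):
    """Return the standard stopwords minus anything that is a keyword."""
    return frozenset(standard) - frozenset(keywords)


SNIPPET_STOPWORDS = build_stopwords(ENGLISH_STOPWORDS, JAVA_KEYWORDS)
```

With this subset, "a" and "are" remain stopwords, while "if", "for", "do", "this" and "while" are kept out of the stopword set because they are Java keywords.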

There are two places where we can use them: during indexing and for queries.
Indexing stopwords prevent those words from being added to the index.
Query stopwords remove those words from the query before the search runs.


There are a couple of issues for our use case, which argue both for and against using stopwords in queries and in indexing.

Snippet 1: arrayadd
Snippet 2: create a button
Snippet 3: something else
Stopwords: "a", "the"

Example 1:
Query: a

Result when not using stopwords: Snippet 1, Snippet 2 - not ideal
Result when using stopwords for indexing and queries: nothing - bad - this is due to "a" being filtered, leaving the query empty.
Result when using stopwords for indexing only: Snippet 1 - perfect

Example 2:
Query: a button
Result when not using stopwords: Snippet 2 - perfect
Result when using stopwords for indexing and queries: Snippet 2 - perfect
Result when using stopwords for indexing only: nothing - bad - due to the AND connection, the query insists on the word/prefix "a", but the index doesn't contain the word or prefix "a".
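
The two examples can be reproduced with a toy model, sketched here in Python. It assumes whitespace tokenization, prefix matching, and AND semantics across query terms, as implied by the "word/prefix" and "AND connection" wording above. It is not Snipmatch's actual search implementation, and all function names are hypothetical.

```python
NO_STOPWORDS = frozenset()
STOPWORDS = frozenset({"a", "the"})

SNIPPETS = {
    "Snippet 1": "arrayadd",
    "Snippet 2": "create a button",
    "Snippet 3": "something else",
}


def tokenize(text, stopwords=NO_STOPWORDS):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [t for t in text.lower().split() if t not in stopwords]


def build_index(snippets, stopwords=NO_STOPWORDS):
    """Map each snippet name to the terms that were indexed for it."""
    return {name: tokenize(desc, stopwords) for name, desc in snippets.items()}


def search(index, query, stopwords=NO_STOPWORDS):
    """Return snippets in which every query term prefixes some indexed term."""
    terms = tokenize(query, stopwords)
    if not terms:
        return []  # assumption: an empty query matches nothing
    return [
        name
        for name, indexed in index.items()
        if all(any(word.startswith(t) for word in indexed) for t in terms)
    ]


def run(query, index_stop, query_stop):
    """Search SNIPPETS with stopwords switched on or off per stage."""
    index = build_index(SNIPPETS, STOPWORDS if index_stop else NO_STOPWORDS)
    return search(index, query, STOPWORDS if query_stop else NO_STOPWORDS)


for query in ("a", "a button"):
    print(query, "| none:", run(query, False, False))
    print(query, "| indexing and queries:", run(query, True, True))
    print(query, "| indexing only:", run(query, True, False))
```

In this model neither configuration handles both queries well: filtering at both stages empties the query "a", while filtering only at index time makes "a button" unsatisfiable.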
Comment 1 Andreas Sewe CLA 2015-04-29 08:22:23 EDT
Classifying as enhancement request.