[recommenders-dev] SnipMatch search algorithm solution

Hi Doug,

As we discussed before, the search algorithm should work like this:

SnipMatch search algorithm:
1) First, we perform a keyword search, looking for each of the words
in the search query within the snippet _search patterns_ (defined by
the snippet creators - e.g. "lowercase $str"). If there is at least
one overlapping word, we consider a match to have been found.
2) Then, we rank the results and display the top ten. There are two
cases for ranking:

a) The search pattern (defined by the snippet creator - e.g.
"lowercase $str") is _in-order_ with respect to the search query (e.g.
the search query might be "lowercase a" or "low", etc.).
b) The search pattern is _not_ in order with respect to the search query.

All in-order results are ranked higher than unordered results.
Finally, results are also ordered within these cases:

In-order ranking:
- Results are ranked by the number of words in the search query
that match words in the search pattern. When there is a tie, we rank
results with fewer missing arguments (variables) higher.
Unordered ranking:
- Results are ranked only by the number of words in the search query
that match words in the search pattern.
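
To make the ranking above concrete, here is a rough Python sketch of how I read it. It is not the actual SnipMatch code, and several things in it are my own assumptions: the function names (tokenize, score, search) are made up, "in-order" is taken to mean that the matched words appear in the same order as in the pattern, a word only matches when it is equal to a pattern word (so a prefix such as "low" is not handled), and every unmatched query word is assumed to fill one $variable.

```python
def tokenize(text):
    # Lowercase and split on whitespace; real tokenization may differ.
    return text.lower().split()


def score(query, pattern):
    """Return a sort key for one search pattern, or None if nothing overlaps."""
    pattern_words = tokenize(pattern)
    query_words = tokenize(query)
    literals = [w for w in pattern_words if not w.startswith("$")]
    # Position in the pattern of every query word that occurs literally in it.
    positions = [pattern_words.index(w) for w in query_words if w in literals]
    if not positions:
        return None  # step 1: no overlapping word, so no match
    # Assumption: "in-order" means the matched words follow the pattern order.
    in_order = all(a < b for a, b in zip(positions, positions[1:]))
    # Assumption: each unmatched query word fills one $variable.
    variables = len(pattern_words) - len(literals)
    unmatched = len(query_words) - len(positions)
    missing = max(variables - unmatched, 0)
    # Missing arguments only break ties among in-order results.
    tie_break = -missing if in_order else 0
    return (in_order, len(positions), tie_break)


def search(query, patterns, limit=10):
    """Step 2: rank all matching patterns and keep the top results."""
    scored = [(score(query, p), p) for p in patterns]
    scored = [(s, p) for s, p in scored if s is not None]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [p for _, p in scored[:limit]]
```

With this sketch, search("in swing", ["create Swing button in $parent", "swing $x"]) puts "swing $x" first: it has fewer matching words, but it is in order, and every in-order result outranks every unordered one.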

I have several questions:
1. The search algorithm only uses the "search pattern" items in the JSON-format snippet file and has nothing to do with the "summary" item, right?

2. About the _in-order_ rank case, suppose we have a search pattern like this:
"patterns": [
    "create Swing button in $parent"
  ]
And the user enters the search query "swing in panel". Is it _in-order_? That is, do you have a detailed algorithm for deciding whether a search query is _in-order_ with respect to a search pattern?

3. Taking the example from point 2 one step further:
If it is _in-order_, what is the *number of words* in the search query matching words in the search pattern?
We can say that "swing" is in the search pattern, and so is "in". But how should we deal with the string "panel"? There are two cases:
A. It is matched with "$parent" in the search pattern, so "panel" also counts as _in-order_.
B. It is just an excess string with no meaning. In this situation, "panel" is not counted as _in-order_.
I think it is hard to decide which case applies, i.e. it is hard to map words in the query to parameters in the search pattern.
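
To show how the two cases change the count, here is a small Python sketch. It is only an illustration under my own assumptions: count_matches and its bind_variables flag are hypothetical names, the query is scanned left to right, a literal match later in the pattern is preferred, and under case A a word with no literal match is bound to the next unused $variable.

```python
def count_matches(query, pattern, bind_variables):
    """Count query words matched by the pattern, scanning left to right.

    bind_variables=True corresponds to case A, False to case B.
    """
    pattern_words = pattern.lower().split()
    start = 0  # first pattern position that is still available
    matched = 0
    for word in query.lower().split():
        rest = pattern_words[start:]
        if word in rest:
            # Literal match at or after the current position.
            hit = start + rest.index(word)
        elif bind_variables:
            # Case A: bind the word to the next unused $variable, if any.
            hit = next(
                (start + i for i, w in enumerate(rest) if w.startswith("$")),
                None,
            )
        else:
            # Case B: the word is treated as excess and ignored.
            hit = None
        if hit is not None:
            matched += 1
            start = hit + 1
    return matched


pattern = "create Swing button in $parent"
case_a = count_matches("swing in panel", pattern, bind_variables=True)
case_b = count_matches("swing in panel", pattern, bind_variables=False)
```

For "swing in panel" this gives 3 matching words under case A and 2 under case B, so the choice directly affects the ranking.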

--
Best Regards From Cheng Chen [chengchendoc@xxxxxxxxx]
