Re: [recommenders-dev] SnipMatch search algorithm solution

Hi Chen - see below!

On Tue, Jul 3, 2012 at 5:38 AM, Chen Cheng <chengchendoc@xxxxxxxxx> wrote:
> Hi Doug,
>
> As we discussed before, the search algorithm should work like this:
>
> SnipMatch search algorithm:
> 1) First, we perform a keyword search, looking for each of the words
> in the search query within the snippet _search patterns_ (defined by
> the snippet creators - e.g. "lowercase $str"). If there is at least
> one overlapping word, we consider a match to have been found.
> 2) Then, we rank the results and display the top ten. There are two
> cases for ranking:
>
> a) The search pattern (defined by the snippet creator - e.g.
> "lowercase $str") is _in-order_ with respect to the search query (e.g.
> the search query might be "lowercase a" or "low", etc.).
> b) The search pattern is _not_ in order with respect to the search query
>
> All in-order results are ranked higher than unordered results.
> Finally, results are also ordered within these cases:
>
> In-order ranking:
> -results are ranked by the number of words in the search query
> matching words in the search pattern. When there is a tie, we rank
> results with fewer missing arguments (variables) higher.
> Unordered ranking:
> -results are only ranked by the number of words in the search query
> matching words in the search pattern.
>
> I have several questions:
> 1. The search algorithm only looks at the "search pattern" items in the
> JSON-format snip file and has nothing to do with the "summary" item, right?

Yes, that's correct.
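
The ranking steps quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the SnipMatch implementation: the function name `rank`, the `$` prefix test for arguments, counting only literal (non-argument) words as matches, and counting "missing arguments" as the pattern arguments lying beyond the end of the query are all guesses made for the example. Only the search patterns are consulted, never the summary.

```python
def rank(query, patterns, limit=10):
    """Return up to `limit` patterns, best match first (hypothetical helper)."""
    q = query.split()  # tokenize on whitespace
    scored = []
    for pattern in patterns:
        p = pattern.split()
        # Assumption: only literal words (not $arguments) count as keyword matches.
        literals = {t for t in p if not t.startswith("$")}
        matched = sum(1 for t in q if t in literals)
        if matched == 0:
            continue  # no overlapping word, so not a match at all
        # Assumption: "in order" means the query tokens line up with the
        # pattern tokens position by position, starting at the first token.
        in_order = len(q) <= len(p) and all(
            pt.startswith("$") or qt == pt for qt, pt in zip(q, p)
        )
        # Assumption: an argument is "missing" if it lies beyond the query's end.
        # The tie-break applies to in-order results only.
        missing = sum(1 for t in p[len(q):] if t.startswith("$")) if in_order else 0
        scored.append((0 if in_order else 1, -matched, missing, pattern))
    # In-order first, then more matched words, then fewer missing arguments.
    scored.sort(key=lambda s: s[:3])
    return [s[3] for s in scored[:limit]]
```

For instance, `rank("create Swing", ["button create", "create Swing button in $parent", "create Swing label"])` would place the two in-order patterns ahead of "button create", and "create Swing label" first among them because it has no missing argument.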

>
> 2. About the _in-order_ rank case, suppose we have a search pattern like
> this:
> "patterns": [
>     "create Swing button in $parent"
>   ]
> And the user enters the search query "swing in panel". Is it _in-order_? I
> mean, do you have a detailed algorithm to decide whether a search query is
> _in-order_ with respect to a search pattern or not?

No, it must be exactly in order. I think I wrote out pseudo-code for
this in one of our gchats: tokenize both the search query and the
search pattern on whitespace, then compare them token by token. If a
token in the search pattern is an argument, it can match any token;
otherwise it must be an exact string match. (For now at least.
Implement this first; there are a few tweaks to the algorithm that
build on this that we should consider after it is done.)
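
As a rough sketch of that comparison (not the pseudo-code from the gchat, which is not reproduced here): the names `is_argument` and `is_in_order` are made up for illustration, arguments are assumed to be tokens starting with `$`, and a query shorter than the pattern is assumed to be acceptable as long as every token it does contain lines up from the start. Matching is exact and case-sensitive, so a partial word such as "low" against "lowercase $str" is not handled here.

```python
def is_argument(token):
    # Assumption: pattern arguments are written with a leading "$", e.g. "$parent".
    return token.startswith("$")


def is_in_order(query, pattern):
    """Check whether `query` is in order with respect to `pattern`."""
    q = query.split()    # tokenize the search query on whitespace
    p = pattern.split()  # tokenize the search pattern on whitespace
    # Assumption: an empty query, or one longer than the pattern, is not in order.
    if not q or len(q) > len(p):
        return False
    for query_token, pattern_token in zip(q, p):
        if is_argument(pattern_token):
            continue  # an argument can match any token
        if query_token != pattern_token:
            return False  # literal words must match exactly
    return True
```

With the pattern "create Swing button in $parent", this accepts "create Swing" and "create Swing button in x" and rejects "swing in panel", because "swing" does not equal the first pattern token "create".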

>
> 3. In the example of point 2, let's step forward
> If it is _in-order_, what is the *number of words* in the search query
> matching words in the search pattern?
> We can say "swing" is in the search pattern, and "in" is too. But how
> should we deal with the string "panel"? I mean there are two cases:

It isn't in order, so instead consider the query: "create Swing", which
is in order. The word count here is 2.

> A. It is matched with "$parent" in search pattern, we can say "panel" is
> also _in-order_

Here's an example where something ("x") matches $parent: "create
Swing button in x"

> B. This is just an excess string, it is meaningless. In this situation,
> "panel" is not ranked as _in-order_
> I think it is hard to decide which case fits; I mean it is hard to map
> words in the query to parameters in the search pattern.

Well, it would be hard if this were considered in order, but it isn't.
Hopefully this email clarifies, but don't hesitate to ask again if it
doesn't!

>
> --
> Best Regards From Cheng Chen [chengchendoc@xxxxxxxxx]

