Bug 334711 - [search] Enable solr auto-completion
Summary: [search] Enable solr auto-completion
Status: RESOLVED WONTFIX
Alias: None
Product: Orion (Archived)
Classification: ECD
Component: Server
Version: 0.2
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on: 370484
Blocks:
Reported: 2011-01-18 16:16 EST by John Arthorne CLA
Modified: 2015-01-19 15:12 EST

See Also:

Description John Arthorne CLA 2011-01-18 16:16:09 EST
Solr has an auto-completion framework called "suggesters" that performs completion based on the words in the index and in its available dictionaries. We should enable this to get rudimentary auto-completion working in the client.

John Arthorne, Nov 18, 2010 11:49 A.M.
Boris, unfortunately Suggester is only available in Solr 3.x and 4.x. We currently have Solr 1.x, which is the only version of Solr in Orbit. So this will likely take longer to get going. I have no idea what dependencies Solr 3.x brings in. Bumping to M5 for now but maybe it will still fit in.

John Arthorne, Nov 18, 2010 12:00 P.M.
And unfortunately Solr 3.1 has not been released yet, and I have found no evidence of a concrete release date.
Comment 1 John Arthorne CLA 2011-08-31 16:46:42 EDT
Solr 3.1 is now available so it is now possible to explore this (and 3.3 for that matter).
Comment 2 Jay Arthanareeswaran CLA 2012-03-07 02:08:13 EST
Even 3.4 is available now. I will explore this.
Comment 3 Jay Arthanareeswaran CLA 2012-03-08 09:53:22 EST
As a first step, I am making Orion work with Solr 3.5, the latest official release. Fortunately, it's mostly backward compatible. Might have to replace a bunch of deprecated classes with newer ones, though.
Comment 4 John Arthorne CLA 2012-03-08 11:36:32 EST
(In reply to comment #3)
> As a first step, I am making Orion work with Solr 3.5, the latest official
> release. Fortunately, it's mostly backward compatible. Might have to replace a
> bunch of deprecated classes with newer ones, though.

That is fantastic! We absolutely need to move up to Solr 3.5 because of some severe file handle leaking problems with the current Solr/Lucene. Any work you can do to move to the new Solr would be greatly appreciated. That work is tracked by bug 370484.
Comment 5 Jay Arthanareeswaran CLA 2012-03-26 05:28:40 EDT
I would like to gather input on this new feature. Here are some questions I would like to discuss before starting work on it:

1. What do we want to suggest? Do we want to cache only what a user types and suggest only those terms later, or should we also have a pre-built dictionary (I don't know what the dictionary might contain, though)?

2. While caching the users' search terms, should we also cache the preceding and following words to make suggestions more intelligent and useful?

3. Should suggestions be per user, or should we have one index storage for all users? I guess most would go for the latter.
Comment 6 Jay Arthanareeswaran CLA 2012-04-02 06:04:08 EDT
Here is what I would start with as requirements for auto-suggest:

* The search terms used by users are indexed/stored, which will be used by the suggester. Search terms that return zero results will not be considered.

* Ideally we would want a job that does the indexing part. The job could read the logs (we will have to log minimal search data) and update the index at regular intervals. But initially, we could just update the index after responding to the search request.

* Looks like it's not going to be easy to choose between a per-user cache and a global cache. For now, we will make it a global/shared index, which I feel is more useful.
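
The requirements above can be sketched in plain Java as follows. This is only a minimal in-memory illustration, not Orion or Solr code: the class name SearchTermLog, its methods, and the ranking by usage count are all assumptions made for the example. It records a term after a search has been answered, ignores terms that returned zero results, and serves suggestions from one shared store.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical shared store of past search terms; an illustration only. */
final class SearchTermLog {
    // normalized search term -> number of successful searches that used it
    private final Map<String, Integer> counts = new HashMap<>();

    /** Call after responding to a search; terms with zero results are skipped. */
    void record(String term, int resultCount) {
        if (resultCount <= 0) {
            return;
        }
        String normalized = term.trim().toLowerCase(Locale.ROOT);
        if (!normalized.isEmpty()) {
            counts.merge(normalized, 1, Integer::sum);
        }
    }

    /** Returns stored terms starting with the prefix, most frequently used first. */
    List<String> suggest(String prefix, int limit) {
        String p = prefix.trim().toLowerCase(Locale.ROOT);
        return counts.entrySet().stream()
                .filter(e -> e.getKey().startsWith(p))
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(Math.max(0, limit))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SearchTermLog log = new SearchTermLog();
        log.record("servlet", 12);
        log.record("server", 4);
        log.record("servlet", 7);
        log.record("servx", 0); // no results, so never suggested
        System.out.println(log.suggest("ser", 5));
    }
}
```

A background job that reads search logs, as described in the second bullet, could call record in a batch instead of inline; the store itself would not need to change.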
Comment 7 John Arthorne CLA 2012-04-02 08:45:08 EDT
Here is how I thought this would work:

- We already have an index on the server of all words found in all files.
- The suggester would simply return entries found in the index matching the requested prefix.
- To make the suggestions more appropriate and to avoid leaking private data we should scope the search to the individual user, like we do in other searches (see SearchServlet.java line 85)
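
As a rough sketch of the behaviour described above, the following plain Java keeps a sorted set of indexed words per user and returns the entries that match a prefix. It does not use Solr, and the class ScopedPrefixSuggester and its methods are hypothetical names chosen for the example; the real server would read from the existing Solr index instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical in-memory stand-in for a per-user word index; an illustration only. */
final class ScopedPrefixSuggester {
    // user name -> sorted, lower-cased words found in that user's files
    private final Map<String, NavigableSet<String>> wordsByUser = new HashMap<>();

    void index(String user, String word) {
        wordsByUser.computeIfAbsent(user, u -> new TreeSet<>())
                .add(word.toLowerCase(Locale.ROOT));
    }

    /** Returns up to limit words owned by the user that start with the prefix. */
    List<String> suggest(String user, String prefix, int limit) {
        List<String> result = new ArrayList<>();
        NavigableSet<String> words = wordsByUser.get(user);
        if (words == null) {
            return result;
        }
        String p = prefix.toLowerCase(Locale.ROOT);
        // tailSet starts at the first word >= prefix; stop once the prefix no longer matches
        for (String word : words.tailSet(p, true)) {
            if (result.size() >= limit || !word.startsWith(p)) {
                break;
            }
            result.add(word);
        }
        return result;
    }

    public static void main(String[] args) {
        ScopedPrefixSuggester suggester = new ScopedPrefixSuggester();
        suggester.index("alice", "function");
        suggester.index("alice", "functional");
        suggester.index("bob", "funnel");
        // Only alice's words are considered, so bob's "funnel" is never leaked
        System.out.println(suggester.suggest("alice", "fun", 10));
    }
}
```

Because lookups never cross from one user's set into another's, the privacy scoping in the last point falls out of the data layout rather than needing a separate filter step.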
Comment 8 Jay Arthanareeswaran CLA 2012-04-02 10:21:32 EDT
(In reply to comment #7)
> Here is how I thought this would work:
> 
> - We already have an index on the server of all words found in all files.
> - The suggester would simply return entries found in the index matching the
> requested prefix.

I never thought about that, but I like the idea. Initially when I saw your comment, I thought, 'well, that will slow things down'. But it looks like Solr scales well, and it works when I map the suggester to the (indexed) file content field. The only problem seems to be with braces and similar characters. For some reason, function(test) becomes "functioneventname" when suggested, which is really useless. I will continue to explore, but the basics appear to be working.

> - To make the suggestions more appropriate and to avoid leaking private data we
> should scope the search to the individual user, like we do in other searches
> (see SearchServlet.java line 85)

If we must protect privacy, then I agree with that.
Comment 9 Jay Arthanareeswaran CLA 2012-04-05 06:43:00 EDT
(In reply to comment #7)
> - To make the suggestions more appropriate and to avoid leaking private data we
> should scope the search to the individual user, like we do in other searches
> (see SearchServlet.java line 85)

It looks like there is no direct way to filter the suggestions by user. Apparently the suggester (and spellcheck) component works by going over the main index and creating a separate index for suggestions, using only the specified field. Once that spellcheck index is created, there is no way for us to filter it further based on other fields. I will continue to investigate.
Comment 10 Jay Arthanareeswaran CLA 2012-04-25 01:09:12 EDT
After spending a lot of time going through forums and asking questions, I have gathered three possible alternatives for achieving this with multiple users.

1. Use multiple cores, one per user, so that search, and thus suggest, automatically fetches only the records that belong to that user. The drawbacks are data duplication, since more than one user can access the same set of files, and maintenance, which becomes a problem when the number of users is high. Also, I can't imagine an admin having to create a core for every single user. I guess this simply won't work in practice.

2. Use multiple dynamic fields, one for each user. Just like the first option, there would be a lot of duplication, but administration would be a lot easier.

3. Use faceting instead of the suggester component. This seems to allow passing user names as part of the query and adding more filters if we want. But performance and scalability seem to be lower compared with the suggester.

I don't like the first one at all. The latter two are better, but both have drawbacks. It would be nice to get input from others as well. John, what do you think?
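
To make option 3 more concrete, here is a minimal sketch of how a faceting request with a per-user filter might be assembled. The parameters facet, facet.field, facet.prefix, facet.limit, fq and rows are standard Solr request parameters, but the field names "Text" and "UserName" and the class FacetSuggestQuery are assumptions for illustration and may not match the actual Orion schema.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical builder for a facet-based suggestion query string; an illustration only. */
final class FacetSuggestQuery {
    static final String WORD_FIELD = "Text";     // assumed name of the indexed content field
    static final String USER_FIELD = "UserName"; // assumed name of the owner field

    static String build(String user, String prefix, int limit) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");           // match everything, then narrow with fq
        params.put("rows", "0");          // only facet counts are needed, not documents
        params.put("facet", "true");
        params.put("facet.field", WORD_FIELD);
        params.put("facet.prefix", prefix); // keep only indexed terms starting with the prefix
        params.put("facet.limit", Integer.toString(limit));
        params.put("fq", USER_FIELD + ":\"" + escapeQuoted(user) + "\""); // scope to one user

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return query.toString();
    }

    // Escape backslashes and double quotes so the user name stays inside the quoted phrase
    private static String escapeQuoted(String value) {
        return value.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(FacetSuggestQuery.build("alice", "fun", 10));
    }
}
```

Note that facet.prefix matches against indexed terms, so the prefix would have to be normalized the same way the field is analyzed (for example lower-cased); that step is left out here because it depends on the schema.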
Comment 11 Anthony Hunter CLA 2015-01-19 15:12:23 EST
We are no longer using Apache Solr on the Orion server for search.