Solr has an auto-completion framework called "suggesters" that performs completion based on the words in the index and in its available dictionaries. We should enable this to get rudimentary auto-completion working in the client.
John Arthorne (Nov 18, 2010 11:49 A.M.): Boris, unfortunately Suggester is only available in Solr 3.x and 4.x. We currently have Solr 1.x, which is the only version of Solr in Orbit, so this will likely take longer to get going. I have no idea what dependencies Solr 3.x brings in. Bumping to M5 for now, but maybe it will still fit in.
John Arthorne (Nov 18, 2010 12:00 P.M.): And unfortunately Solr 3.1 has not been released yet, and I have found no evidence of a concrete release date.
Solr 3.1 is now available so it is now possible to explore this (and 3.3 for that matter).
Even 3.4 is available now. I will explore this.
As a first step, I am making Orion work with Solr 3.5, the latest official release. Fortunately, it's mostly backward compatible. Might have to replace a bunch of deprecated classes with newer ones, though.
(In reply to comment #3)
> As a first step, I am making Orion work with Solr 3.5, the latest official
> release. Fortunately, it's mostly backward compatible. Might have to replace a
> bunch of deprecated classes with newer ones, though.
That is fantastic! We absolutely need to move up to Solr 3.5 because of some severe file handle leaking problems with the current Solr/Lucene. Any work you can do to move to the new Solr would be greatly appreciated. That work is tracked by bug 370484.
I would like to gather input on this new feature. Here are the questions I would like discussed before I start working on it:
1. What do we want to suggest? Do we want to cache only what a user types and suggest those terms later, or should we also have a pre-built dictionary (I don't know what the dictionary might contain, though)?
2. While caching the users' search terms, should we also cache the preceding and following words to make the suggestions more intelligent and useful?
3. Should suggestions be per user, or should we have one index for all users? I guess most would go for the latter.
Here is what I would start with as requirements for auto-suggest:
* The search terms used by users are indexed/stored, and will be used by the suggester. Search terms that return zero results will not be considered.
* Ideally we would want a job that does the indexing part. The job could read the logs (we will have to log minimal search data) and update the index at regular intervals. But initially, we could just update the index after responding to the search request.
* It looks like choosing between a per-user cache and a global cache is not going to be easy. For now, we will make it a global/shared index, which I feel is more useful.
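The requirements above can be sketched in memory as follows. This is only an illustration, not Orion or Solr code: the class `SearchTermStore` and its methods are hypothetical names, and a plain map stands in for the shared suggestion index. It records a term only when the search returned at least one result, and ranks suggestions by how often a term was searched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a global (shared) store of successful search terms.
public class SearchTermStore {
    // term -> number of times it was searched with at least one result
    private final Map<String, Integer> counts = new ConcurrentHashMap<>();

    // Called after responding to a search request.
    public void record(String term, long resultCount) {
        if (term == null || resultCount <= 0) {
            return; // terms that return zero results are not considered
        }
        String normalized = term.trim().toLowerCase(Locale.ROOT);
        if (!normalized.isEmpty()) {
            counts.merge(normalized, 1, Integer::sum);
        }
    }

    // Returns up to 'limit' recorded terms starting with 'prefix',
    // most frequently searched first, ties broken alphabetically.
    public List<String> suggest(String prefix, int limit) {
        List<String> matches = new ArrayList<>();
        if (prefix == null || limit <= 0) {
            return matches;
        }
        String p = prefix.toLowerCase(Locale.ROOT);
        for (String term : counts.keySet()) {
            if (term.startsWith(p)) {
                matches.add(term);
            }
        }
        matches.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return matches.size() > limit ? new ArrayList<>(matches.subList(0, limit)) : matches;
    }
}
```

A background job reading the search logs could call `record` in batches instead of the request handler calling it directly; the store itself would not need to change.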
Here is how I thought this would work:
- We already have an index on the server of all words found in all files.
- The suggester would simply return entries found in the index matching the requested prefix.
- To make the suggestions more appropriate and to avoid leaking private data, we should scope the search to the individual user, like we do in other searches (see SearchServlet.java line 85).
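As a rough sketch of that idea, independent of Solr: the hypothetical class below keeps a sorted set of terms per user and answers a prefix request by scanning from the first term at or after the prefix. The class and method names are made up for illustration, and the in-memory map only stands in for the real server index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch: prefix suggestions scoped to a single user.
public class PrefixSuggester {
    // user name -> sorted terms found in that user's files
    private final Map<String, TreeSet<String>> termsByUser = new HashMap<>();

    public void addTerm(String user, String term) {
        termsByUser.computeIfAbsent(user, u -> new TreeSet<>())
                   .add(term.toLowerCase(Locale.ROOT));
    }

    // Returns up to 'limit' of this user's terms that start with 'prefix'.
    public List<String> suggest(String user, String prefix, int limit) {
        List<String> result = new ArrayList<>();
        TreeSet<String> terms = termsByUser.get(user);
        if (terms == null || prefix == null || prefix.isEmpty() || limit <= 0) {
            return result;
        }
        String p = prefix.toLowerCase(Locale.ROOT);
        // Terms sharing a prefix are contiguous in sorted order, so start
        // at the prefix and stop at the first term that no longer matches.
        for (String term : terms.tailSet(p, true)) {
            if (!term.startsWith(p) || result.size() >= limit) {
                break;
            }
            result.add(term);
        }
        return result;
    }
}
```

Because each user has a separate term set, one user's terms can never appear in another user's suggestions, at the cost of storing shared terms more than once.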
(In reply to comment #7)
> Here is how I thought this would work:
>
> - We already have an index on the server of all words found in all files.
> - The suggester would simply return entries found in the index matching the
> requested prefix.
I never thought about that, but I like the idea. When I first saw your comment, I thought it would slow things down. But Solr seems to scale well, and it works when I map the suggester to the (indexed) file content field. The only problem seems to be with braces and similar punctuation. For some reason, function(test) becomes "functioneventname" when suggested, which is useless. I will continue to explore, but the basics appear to be working.
> - To make the suggestions more appropriate and to avoid leaking private data we
> should scope the search to the individual user, like we do in other searches
> (see SearchServlet.java line 85)
If we must protect privacy, then I agree with that.
(In reply to comment #7)
> - To make the suggestions more appropriate and to avoid leaking private data we
> should scope the search to the individual user, like we do in other searches
> (see SearchServlet.java line 85)
It looks like there is no direct way of filtering the suggestions by user. Apparently the suggester (and spellcheck) component works by going over the main index and building a separate index for suggestions, using only the specified field. Once that spellcheck index is created, there is no way for us to filter it further based on other fields. I will continue to investigate.
After spending a lot of time going through forums and asking questions, I have gathered 3 possible alternative ways we could achieve this for multiple users.
1. Use multiple cores, one per user, so that search (and thus suggest) automatically fetches only the records that belong to that user. The drawbacks are data duplication, since more than one user could access the same set of files, and maintenance, which becomes a problem if the number of users is high. Also, I can't imagine an admin having to create one core for every single user. I guess this simply won't work in practice.
2. Use multiple dynamic fields, one for each user. Just like the first option, there would be a lot of duplication, but administration would be a lot easier.
3. Use faceting instead of the suggester component. This seems to allow passing user names as part of the query and adding more filters if we want. But performance and scalability seem to be lower compared with the suggester.
I don't like the first one at all. The latter two are better, but both have drawbacks. It would be nice to get input from others as well. John, what do you think?
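To make option 3 more concrete, here is a minimal sketch of the request parameters a facet-based suggestion query could use. The parameters `fq`, `rows`, `facet`, `facet.field`, `facet.prefix`, `facet.limit` and `facet.mincount` are standard Solr query parameters; the field names `UserName` and `Text` and the class `FacetSuggestQuery` are assumptions made for illustration and may not match the actual Orion schema. Lowercasing the prefix assumes the text field is lowercased at index time, since `facet.prefix` is matched against indexed terms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Locale;

// Hypothetical sketch: builds the query string for a facet-based suggestion request.
public class FacetSuggestQuery {
    private static final String USER_FIELD = "UserName"; // assumed field name
    private static final String TEXT_FIELD = "Text";     // assumed field name

    public static String build(String user, String prefix, int limit) {
        // Quote the user name so it is treated as a single phrase in the filter.
        String quotedUser = "\"" + user.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        return "q=" + enc("*:*")
            + "&fq=" + enc(USER_FIELD + ":" + quotedUser)   // restrict to this user's documents
            + "&rows=0"                                     // only facet counts are needed
            + "&facet=true"
            + "&facet.field=" + enc(TEXT_FIELD)
            + "&facet.prefix=" + enc(prefix.toLowerCase(Locale.ROOT)) // assumes lowercased terms
            + "&facet.limit=" + limit
            + "&facet.mincount=1";                          // skip terms with no matching documents
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }
}
```

The returned facet values for the text field would then serve as the suggestions. Because the user filter is applied per request, nothing is duplicated in the index, but each request computes facet counts over that user's documents, which matches the performance concern noted above.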
We are no longer using Apache Solr on the Orion server for search.