Bug 130051 - [Help] Search dynamic content properly
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: User Assistance
Version: 3.2
Hardware: All All
Importance: P1 major
Target Milestone: 3.2 M6
Assignee: Curtis d'Entremont CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-03-01 19:05 EST by Curtis d'Entremont CLA
Modified: 2021-09-22 05:46 EDT (History)
0 users

See Also:



Description Curtis d'Entremont CLA 2006-03-01 19:05:37 EST
Now that we have dynamic content, searching needs to be updated. We cannot index filtered content, because we may miss hits later when the filter properties change, and because the environment where the docs were indexed can differ from the one they are viewed in. For those interested, and for future reference, here is the approach taken:

For indexing:

- Index the documents unfiltered, but with all content extensions, topic replacements, and includes resolved. This immediately rules out false negatives (hits that we should get but don't) because all the content is there, and potentially more. However, we can get false positives (e.g. a hit in a Linux-only section of a doc that I'm viewing on Windows).
- While indexing, keep track of all the filters a document is sensitive to. For single-value filters like os and ws, we simply store the name. For multi-value filters like plugin, where there can be many values, we store the full name and value (to denote: this doc is sensitive to whether plugin X is present or not).
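As an illustration only, the filter tracking described above could look like the following sketch. The document model (a list of text sections, each with the filters it requires), the function name, and the plug-in id are all hypothetical and are not taken from the Help system code.

```python
# Hypothetical model: a document is a list of (text, filters) sections,
# where filters maps a filter name to the value that section requires.
SINGLE_VALUE_FILTERS = {"os", "ws"}  # assumed: one value per environment


def collect_sensitivities(sections):
    """Return the set of filter tags a document is sensitive to."""
    tags = set()
    for _text, filters in sections:
        for name, value in filters.items():
            if name in SINGLE_VALUE_FILTERS:
                tags.add(name)                # the name alone is enough
            else:
                tags.add(f"{name}={value}")   # e.g. plugin=org.example.foo
    return tags


doc = [
    ("Common intro", {}),
    ("Linux-only steps", {"os": "linux"}),
    ("Needs the foo plug-in", {"plugin": "org.example.foo"}),
]
print(sorted(collect_sensitivities(doc)))  # ['os', 'plugin=org.example.foo']
```

A doc with an empty tag set can never produce a false positive, which is what lets the search skip the extra passes for it.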

For searching:

- Do an initial search pass on the master index. For each hit, check what filters the doc is sensitive to. If it has any filters, it is potentially a false hit and we flag it as such. Hits that don't have filters are definite hits and we're done with these.
- If we found potential false hits, we perform a second search pass for the potential false hits only, but on the cache index. Unlike the master, this index has documents indexed with all content resolved and filters turned on. Initially it is empty; it is built up as needed. For each hit we check what the filterable property values were at the time we indexed the doc. If they match the current values, the cache entry is up to date and it's a real hit. If they don't match, or if one of the potential false hit docs wasn't indexed at all, the doc is flagged as needing reindexing. So the whole purpose of the second pass is to determine whether the cache is up to date, and if not, which parts are outdated.
- For each doc needing to be reindexed, reindex it in the cache, filtered this time. In addition, for the filters this document is sensitive to, store the actual values of the filter properties. For example, if the document is sensitive to os and we're running on Linux, store os=linux. This will be used in the future to see whether the doc is out of sync or not.
- Perform a third and final pass on the updated cache. This time anything we find is a definite hit and we add it to the definite hits bucket.
- If we had to do a second and third pass, sort the final results by score.
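The search steps above can be sketched roughly as follows. This is a toy in-memory model, not the actual implementation: the dicts stand in for real indexes, the score is a plain term count, and all names are hypothetical. It also simplifies the second pass by checking the stored filter values of every potential hit directly instead of running a query against the cache index.

```python
SINGLE_VALUE = {"os", "ws"}  # assumed single-value filters; others are multi-value


def tags_for(sections):
    """Filter tags a doc is sensitive to: 'os', or 'plugin=<id>' for multi-value."""
    return {n if n in SINGLE_VALUE else f"{n}={v}"
            for _text, filters in sections for n, v in filters.items()}


def snapshot(tags, env):
    """Current value behind each tag: the os/ws value, or plug-in presence."""
    snap = {}
    for tag in tags:
        name, sep, value = tag.partition("=")
        snap[tag] = (value in env.get(name, set())) if sep else env.get(name)
    return snap


def visible(filters, env):
    """True when a section survives filtering in this environment."""
    for name, value in filters.items():
        current = env.get(name)
        ok = value in current if isinstance(current, set) else current == value
        if not ok:
            return False
    return True


def words(sections, env=None):
    """Tokenize a doc; with env given, drop sections that are filtered out."""
    return [w for text, filters in sections
            if env is None or visible(filters, env)
            for w in text.lower().split()]


def build_master(docs):
    """Master index: unfiltered words plus the doc's filter tags."""
    return {d: {"words": words(s), "tags": tags_for(s)} for d, s in docs.items()}


def search(word, docs, master, cache, env):
    definite, potential = {}, []
    # Pass 1: master index. Docs without filter tags are definite hits.
    for doc_id, entry in master.items():
        score = entry["words"].count(word)
        if score:
            if entry["tags"]:
                potential.append(doc_id)
            else:
                definite[doc_id] = score
    # Pass 2: find cache entries that are missing or stored under other values.
    stale = [d for d in potential
             if d not in cache
             or cache[d]["values"] != snapshot(master[d]["tags"], env)]
    # Reindex stale docs with filters on, recording the values used.
    for d in stale:
        cache[d] = {"words": words(docs[d], env),
                    "values": snapshot(master[d]["tags"], env)}
    # Pass 3: anything that matches in the updated cache is a definite hit.
    for d in potential:
        score = cache[d]["words"].count(word)
        if score:
            definite[d] = score
    return sorted(definite, key=definite.get, reverse=True)  # best score first


docs = {
    "a": [("install the widget", {})],
    "b": [("intro", {}), ("widget widget on linux", {"os": "linux"})],
    "c": [("widget basics", {}), ("advanced", {"os": "win32"})],
}
master = build_master(docs)
cache = {}
print(search("widget", docs, master, cache, {"os": "linux"}))  # ['b', 'a', 'c']
print(search("widget", docs, master, cache, {"os": "win32"}))  # ['a', 'c']
```

In the second call the cached entry for doc b was stored under os=linux, so it is reindexed under os=win32, loses its Linux-only section, and drops out of the results.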


The end-user implication is that the first searches may be slow when docs with dynamic content contain the search word(s), because these docs have to be reindexed, which means resolving all extensions and filtering the DOM. However, subsequent searches should be fast, as they will use the cache. If you change your activity enablement and you have docs that filter by activity, those docs may need to be reindexed as well.

This doesn't affect infocenter because it doesn't filter, so the second and third passes are omitted in this case and there is no need for a cache index.
Comment 1 Curtis d'Entremont CLA 2006-03-01 19:09:54 EST
Fixed.
Comment 2 Curtis d'Entremont CLA 2006-03-02 11:17:41 EST
As per discussion with Dejan, we need to make a few modifications:

The master index should also be tagged with current filter values as opposed to just the filters the document is sensitive to. This way we can avoid in some cases reindexing altogether, even if we have filters. At build time, we need to find the values of the target platform.
Comment 3 Curtis d'Entremont CLA 2006-03-07 16:37:04 EST
Also, we need to give the user the option to show or filter potential hits. Potential hits may arise when search finds a match in a section of a document that may be filtered out. Default to true (show potential hits) for performance reasons, but allow the user to change it from the UI and allow products to change the default via an API preference.
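A minimal sketch of how such an option might gate the expensive work, assuming that true means potential hits are shown as-is. The Hit class, the show_potential flag, and the confirm callback are hypothetical names for illustration; confirm stands in for the costlier cache-index passes.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    score: float
    potential: bool  # True when the doc is sensitive to at least one filter


def present_hits(hits, show_potential=True, confirm=None):
    """Return the hits to display, sorted by descending score.

    With show_potential=True, potential hits are kept without further work.
    Otherwise confirm(hit) decides whether a potential hit survives filtering;
    without a confirm callback, potential hits are dropped.
    """
    if show_potential:
        kept = list(hits)
    else:
        kept = [h for h in hits
                if not h.potential or (confirm is not None and confirm(h))]
    return sorted(kept, key=lambda h: h.score, reverse=True)


hits = [Hit("a", 1.0, False), Hit("b", 2.0, True)]
print([h.doc_id for h in present_hits(hits)])                          # ['b', 'a']
print([h.doc_id for h in present_hits(hits, False, lambda h: False)])  # ['a']
```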
Comment 4 Curtis d'Entremont CLA 2006-03-07 17:29:03 EST
Fixed.