Summary: | Aggregated plugin search indexes | ||
---|---|---|---|
Product: | [Eclipse Project] Platform | Reporter: | Paul Hardiman <paulhard> |
Component: | User Assistance | Assignee: | Dejan Glozic <dejan> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | enhancement | ||
Priority: | P1 | CC: | dejan, mbeltzner, mfaraj, wassim.melhem |
Version: | 3.0 | Keywords: | performance |
Target Milestone: | 3.1 M7 | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: |
Description
Paul Hardiman
2004-04-05 10:28:32 EDT
Paul, Do you mean pre-built text search index should be contributed in multiple pieces instead of one per product, or are you concerned about keyword index or alphabetical index?

Reply from Paul: Hi, Thanks for responding. I mean that each doc plugin should have the option to include a pre-built, Eclipse Help compatible search index. When Eclipse Help builds the master index, for each doc plugin it will either use the pre-existing indexes in that plugin or generate the indexes on the fly. The builder of the doc plugin takes responsibility for the currentness of the index if it exists. This may mean that Eclipse provides a utility to build search indexes for a doc plugin. Perhaps that entity could be referenced in the plugin.xml file.

Yes, we considered this alternative when developing support for pre-built index, but we did not have time to develop both. Right now the index can be contributed per product; what you suggest has certain advantages (if a product is built incrementally on top of other products, or plug-ins are updated frequently).

This would adequately resolve the problem stated in bug 77328, as well. That is to say that the plugin providers could ship with a pre-built index, which would drastically reduce the initial indexing time upon first search. Please note that in terms of user expectations, this first indexing process should take no longer than 15-20s (assuming all the plugins have pre-built indexes).

The first part of the implementation is finished as follows:
1) An element 'index' is added to the org.eclipse.help.toc extension:
   <index path="relative_path_to_index_directory"/>
   This element is an indication for help that the plug-in has a prebuilt index, and also a pointer to the location of the index starting from the plug-in/fragment directory.
2) An API class 'HelpIndexBuilder' is added to org.eclipse.help.base. This class parses the plug-in manifest provided as the input and creates the index in the destination directory.
3) For fragments, one index is created for each locale directory found under the nl/ directory. Since fragments typically only serve files for documentation plug-ins, HelpIndexBuilder must be given the manifest of the fragment's plug-in (the documentation plug-in the fragment serves files for), not of the fragment itself.
4) PDE UI has an action under 'PDE Tools', registered for plugin.xml and fragment.xml, called 'Create Help Index'. It will produce the index as defined above. The help documents must not be zipped.
5) A custom Ant task has been defined in org.eclipse.help.base called 'help.createIndex' that accepts 'manifest' and 'destination' attributes.
6) SDK documentation plug-in build.xml files have been modified to hook the Ant task and build the search index as part of the regular build.

Moving to Konrad for adding code in Help that takes advantage of prebuilt indexes if found.

(In reply to comment #5)
> 5) A custom Ant task has been defined in org.eclipse.help.base
> called 'help.createIndex' that accepts 'manifest' and 'destination'
> attributes.

Correction: the name of the Ant task is 'help.buildHelpIndex'.

I released the first cut implementation of help indexing using prebuilt indexes. Coincidentally :-) the index produced by the 'Create Help Index' action is compatible with what we are expecting. Thanks Dejan. Details like error handling, consistency, and progress reporting during index merging are still on the table being looked at, but most functionality is there, so doc plugins can start providing their own index any time.

If a plugin declares a prebuilt index, help checks for prebuilt indexes in the root, nl, os, and ws directories of the plug-in and its fragments. Plug-ins are allowed to be jarred; the index is then extracted before being merged. All these indexes are merged into the master index, and then duplicates are removed. This ensures that documents in indexes in nl directories take precedence over documents prebuilt into the root index, while allowing the nl index to contain a subset of all documents.
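To make the precedence rule above concrete (merge all prebuilt indexes, then drop duplicates so that nl documents win over root ones), here is a minimal Java sketch. It is not the Help system's code: the class `IndexMergeSketch`, the method `merge`, and the modelling of an index as a map from document href to a source label are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of index merging; not the actual Help implementation. */
class IndexMergeSketch {

    /**
     * Merges prebuilt indexes given in ascending precedence (root first,
     * nl last). Each index maps a document href to a label naming where
     * the indexed copy came from. A later entry for the same href replaces
     * an earlier one, which models "merge, then remove duplicates".
     */
    static Map<String, String> merge(List<Map<String, String>> indexesByPrecedence) {
        Map<String, String> master = new LinkedHashMap<>();
        for (Map<String, String> index : indexesByPrecedence) {
            master.putAll(index);
        }
        return master;
    }

    public static void main(String[] args) {
        Map<String, String> root = new LinkedHashMap<>();
        root.put("a.html", "root");
        root.put("b.html", "root");

        // The nl index may contain only a subset of the documents.
        Map<String, String> nl = new LinkedHashMap<>();
        nl.put("a.html", "nl/fr");

        // a.html comes from nl/fr, b.html falls back to the root index.
        System.out.println(merge(List.of(root, nl)));
    }
}
```

In this model the nl index only needs to hold the translated documents; anything it lacks is still served from the root index.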
When a plug-in with a prebuilt index, or its fragments, contains translations, it must provide a prebuilt index of the translated files in the appropriate nl subdirectories in order to search non-default-language files. I am just reiterating the need to run the tool, as the tool already creates indexes in all appropriate directories. It is highly recommended that all plug-ins in a product provide a prebuilt index. After merging, the index is adjusted to contain exactly those documents that are in the built set of TOCs. Any topics in a TOC that did not make it into the master index from prebuilt indexes will be parsed and added at runtime as before.

Konrad, that's great news! Did you get a chance to time the current implementation:
1) Total elapsed time without prebuilt indexes
2) Time with prebuilt indexes
3) Time with prebuilt indexes in JARd plug-ins

On a Thinkpad T40 running from the battery, using 5 eclipse doc plug-ins:
1. 46s (unjarred plug-ins with doc.zip, without index)
2. 1.4s (unjarred plug-ins with doc.zip, with index) 33 times speed-up
3. 2.2s (jarred plug-ins without doc.zip, with index) 21 times speed-up

Do I have a bug, or what? :-) I am thinking of injecting a Thread.sleep(10000) so users will still notice how cool we are and that we create some kind of index at runtime, and we can improve to Thread.sleep(5000) in the next release. The search results page in the help browser refreshes every 2 seconds if the index is updating, so the user sees the first results in about 46.3s, 2.3s, 4.3s (in these scenarios).

I have not measured, but adding fragments with a large translation delta, let's say with all documents translatable, will double the time for 2. and 3. What the code does in this case is merge the English and NL indexes (double the work), and then remove English duplicates (which could mean all English documents).
So the speed-up for an English-only product is 33 times, but with translations in a non-English language the speed-up will be only about 10-15 times if all or most documents are translated. The following idea allows NL to be sped up to the same level as English:
- the action that creates the index in the NL directory can mark it as "complete" if the number of translated HTML files in the NL directory is 95%-100% of the number of English files.
- the help system would merge such an index and ignore other indexes, for example at the root of the plug-in; no additional index to merge, no duplicates to worry about.

These are all great numbers. I suggest we focus on bringing the function to the polished/GA level while I investigate detecting the 'complete' status for the index. BTW, where would I write it if true?

The existence of an empty file "indexed_complete" in the index directory would suffice. The complete index idea adds to the complexity of the build index action; the impact on the help system is small (if the file exists, do less work). If we implement this, we have two options:
a) When most documents are translated, mark the index as complete, and the remaining documents will be indexed at runtime. The magic number for when to mark the index as complete would be about 98% for maximized performance.
b) When creating the nl index, add all NL files from the nl directory, but also add English for the missing documents. No documents would be parsed at runtime, and the magic number for the tool could be much lower, for example 80%. The only cost would be a little bit of wasted installation size, as the two indexes, English and nl, would contain some of the same English files.

I released the final implementation of index merging, good to go for 3.1.

Konrad, do you still want me to produce the 'index_complete' file in the index?

Yes. It is not critical, but if we use the new performance results as our new base, doing this will give a 2 times speedup for NLed plugins. I think it is worth it.
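Option a) above can be sketched as follows. This is only an illustration under stated assumptions: the class `IndexCompleteSketch`, its method names, and the integer percentage threshold are hypothetical, and the marker name uses the "indexed_complete" spelling proposed above rather than anything confirmed in the released code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch of the "complete index" marker idea (option a). */
class IndexCompleteSketch {

    // Marker file name as proposed in this thread (assumption, not verified API).
    static final String MARKER = "indexed_complete";

    /** True if translated docs cover at least thresholdPercent of the English docs. */
    static boolean isComplete(int translatedDocs, int englishDocs, int thresholdPercent) {
        if (englishDocs <= 0) {
            return false;
        }
        // Integer arithmetic avoids floating-point rounding at the boundary.
        return (long) translatedDocs * 100 >= (long) englishDocs * thresholdPercent;
    }

    /** Build side: writes the empty marker file when the index is complete enough. */
    static boolean markIfComplete(Path indexDir, int translatedDocs, int englishDocs,
                                  int thresholdPercent) throws IOException {
        if (!isComplete(translatedDocs, englishDocs, thresholdPercent)) {
            return false;
        }
        Path marker = indexDir.resolve(MARKER);
        if (Files.notExists(marker)) {
            Files.createFile(marker);
        }
        return true;
    }

    /** Runtime side: if the marker exists, other indexes of the plug-in can be skipped. */
    static boolean hasCompleteMarker(Path indexDir) {
        return Files.exists(indexDir.resolve(MARKER));
    }

    public static void main(String[] args) throws IOException {
        Path indexDir = Files.createTempDirectory("nl-index");
        markIfComplete(indexDir, 99, 100, 98);
        System.out.println("skip other indexes: " + hasCompleteMarker(indexDir));
    }
}
```

The runtime check is deliberately just a file-existence test, matching the remark that the impact on the help system is small.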
I released the code that checks for the 'index_complete' file and does not merge more indexes from this plugin. I suggest option a) from comment 11 when building the plug-in index. Compare the number of documents in the prebuilt index and in the TOCs of the given plug-in. Write a file if the numbers are close. Option b) is more complicated and is only guaranteed to give consistent results if the fragment specifies an exact match for the plug-in version.

Dejan, this bug should be resolved. The feature is in M7. If the detail with the 'index_complete' marker file is not added by M7, it can be tracked in a separate report.

Fixed. I opened bug 93729 to track the remaining 'index_complete' item.

This is very cool - but what would be involved in making this pre-index operation work with doc.zip archives? I want to use this as part of my build process, but I do not ship the raw html source. I actually create a doc.zip from a bunch of smaller plugins, and then I would like to pre-index the whole zipfile at the end - but I can't. It looks like what I will have to do is unzip it again just to pre-index it, then delete the html folder. Since the Help system already knows how to index doc.zip archives, this capability can't be that hard to add - can it?

Eclipse SDK also zips the docs, but we run the indexer just before the zip operation. It is strange to have the zip created from HTMLs that are not in the same plug-in. I can see how that would be a problem for you. Konrad?

Agree with Dejan, doc.zip is the equivalent of a binary for documentation. Indexing from doc.zip could be implemented, but I don't think it is necessary. It is much easier to unzip, and later delete the folder. With doc.zip, we would need to worry about enumerating files inside the zip, and picking between documents that exist both in doc.zip and flat. doc.zip should be built after the index. If it exists already, it might be from the previous run of the build, and we should not use it.

No time to look at these before 3.1. We can consider patches post 3.1 |
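The "unzip, index, then delete the folder" workaround suggested above could look roughly like this in a build script written in Java. Everything here is an assumption for illustration: the class `DocZipIndexSketch`, its methods, and the `indexer` callback are hypothetical placeholders for whatever actually builds the index (for example the Ant task or HelpIndexBuilder mentioned earlier), which is not invoked in this sketch.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical build helper: unzip doc.zip, run an indexer, delete the folder. */
class DocZipIndexSketch {

    /** Extracts the archive into targetDir, rejecting entries that escape it. */
    static void unzip(Path zip, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream in = new ZipInputStream(Files.newInputStream(zip))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
    }

    /** Deletes a directory tree, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    /**
     * Unzips docZip into a temporary folder, hands that folder to the
     * indexer callback (a stand-in for the real index builder), and
     * removes the folder afterwards even if indexing fails.
     */
    static void indexFromZip(Path docZip, Consumer<Path> indexer) throws IOException {
        Path tmp = Files.createTempDirectory("doc-unzipped");
        try {
            unzip(docZip, tmp);
            indexer.accept(tmp);
        } finally {
            deleteRecursively(tmp);
        }
    }
}
```

A build could call `DocZipIndexSketch.indexFromZip(pathToDocZip, dir -> runIndexer(dir))`, where `runIndexer` is a placeholder for the real indexing step. As noted above, this only makes sense when the doc.zip is known to be current rather than left over from a previous build run.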