[platform-help-dev] Lucene & Search


Problem
==========
        The Eclipse platform currently provides no built-in mechanism for searching the online documentation.

Proposal
==========

The Eclipse PMC and the help component lead have recommended pursuing the Lucene open source search framework as the basis for help search in the V2 release. For more information on Lucene, visit http://jakarta.apache.org/lucene/docs/index.html


Prototype
===========

We have taken an initial look at Lucene and built a quick proof-of-concept integration to assess the feasibility and suitability of the Lucene option. We are also aware of other places within Eclipse where the search framework could be used; for the moment, however, we will limit ourselves to help search. If that proceeds well, additional uses may be suggested.

Brief observations on Lucene:
=====================================
       

*        Lucene is 100% Java and will work on all platforms supported by
        Eclipse (GTK, SUSE, Photon, HP, Solaris, AIX, Windows etc.).

*        Small code base & well documented

*        Index/search speeds are reasonable

*        Index files can be persisted/pre-built

*        Incremental indexing as new files become available from plug-ins

*        Supports content tagging (e.g. author)

*        Supports heading filtering

*        Good range of query facilities (and, or, not etc.)

*        Extensible to allow support for other file formats.
        -        Easily made to work over zip files; our prototype did this

*        Ranks results

*        Designed as a toolkit - intended to be grown & added to

        Lucene is a search framework.
        This means it does not include explicit knowledge about searching specific
        domains (e.g. HTML) or support for specific languages (e.g. French). However:

        -        Language-oriented searching can be added as extensions. The Lucene build
                includes a German extension; it is unclear how good it is.

        -        HTML domain searching can be added. The demo accompanying Lucene provides
                a sample HTML search engine which we can use and extend as needed;
                even the basic demo is useful as it stands.

        *        This framework/extensibility approach means that various locale groups
                or search/domain experts can contribute their skills to the Lucene open source
                effort and also have it benefit Eclipse.

        *        Open question: DBCS and BiDi language support. The framework clearly supports
                Latin-1 type languages through the ability to add analyser modules,
                but since we are not locale language experts it is hard for us to comment
                on the issues. Again, this is an opportunity for contribution from others.

*        Lucene is not externalized; that is, its strings have not been moved out of the code.
        However, most of the strings are not likely to be exposed to the user: they primarily
        cover programming errors or very unlikely cases.


*        Our intent is not to create a modified version of Lucene. If we see possibilities for
        improving or enhancing Lucene, we will work with its open source community.
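
To make a few of the observations above more concrete (incremental indexing, and/or/not queries, ranked results, pluggable analysers), here is a toy sketch in plain Java. It is not Lucene's API and it is not our prototype; TinyIndex, simpleAnalyser and all method names are made up for illustration, and ranking by raw occurrence count is an assumption chosen for brevity rather than what Lucene actually does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Function;

/** Toy inverted index for illustration only; every name here is hypothetical. */
class TinyIndex {
    // term -> (document name -> number of occurrences in that document)
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    // Pluggable tokenizer, standing in for a language-specific analyser module.
    private final Function<String, List<String>> analyser;

    public TinyIndex(Function<String, List<String>> analyser) {
        this.analyser = analyser;
    }

    /** Default analyser: lower-case, then split on anything that is not a letter or digit. */
    public static List<String> simpleAnalyser(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    /** Incremental indexing: documents can be added at any time. */
    public void add(String docName, String text) {
        for (String term : analyser.apply(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>())
                    .merge(docName, 1, Integer::sum);
        }
    }

    // Run the query term through the same analyser used at indexing time.
    private Map<String, Integer> hits(String term) {
        List<String> terms = analyser.apply(term);
        if (terms.isEmpty()) {
            return Collections.emptyMap();
        }
        return postings.getOrDefault(terms.get(0), Collections.emptyMap());
    }

    private Set<String> docs(String term) {
        return new TreeSet<>(hits(term).keySet());
    }

    public Set<String> and(String a, String b) {
        Set<String> result = docs(a);
        result.retainAll(docs(b));
        return result;
    }

    public Set<String> or(String a, String b) {
        Set<String> result = docs(a);
        result.addAll(docs(b));
        return result;
    }

    public Set<String> andNot(String a, String b) {
        Set<String> result = docs(a);
        result.removeAll(docs(b));
        return result;
    }

    /** Documents containing the term, most occurrences first, ties broken by name. */
    public List<String> ranked(String term) {
        Map<String, Integer> h = hits(term);
        List<String> result = new ArrayList<>(h.keySet());
        result.sort((x, y) -> {
            int byCount = Integer.compare(h.get(y), h.get(x));
            return byCount != 0 ? byCount : x.compareTo(y);
        });
        return result;
    }
}
```

A caller would create the index with new TinyIndex(TinyIndex::simpleAnalyser), call add() as each plug-in's documents become available, and then query it. Swapping in a different analyser function is the toy equivalent of adding a language extension.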


Your Turn
=============

*        If you have comments on this proposal, please let us know via the mailing list.

*        If you have in-depth technical knowledge of Lucene, we would be interested in
        any additional pros/cons/risks/limitations that you are aware of.
        In addition, let us know if you'd be willing to help if needed. We don't need help
        at the moment, but it's useful to know who is available.


