Re: [platform-help-dev] indexing keywords in M5

Search result ranking is done mostly by the Lucene search engine, which
uses a fairly complex formula that takes into account the size of the
document, the length of the query, etc.
The following is taken from their FAQ section on ranking:

 31. How does Lucene assign scores to hits?

 Here is a quote from Doug himself (posted in July 2001 to the
 Lucene users mailing list):

 For the record, Lucene's scoring algorithm is, roughly:

   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t)

 where:
   score_d   : score for document d
   sum_t     : sum for all terms t
   tf_q      : the square root of the frequency of t in the query
   tf_d      : the square root of the frequency of t in d
   idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
   numDocs   : number of documents in index
   docFreq_t : number of documents containing t
   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
   norm_d_t  : square root of number of tokens in d in the same field as t

 (I hope that's right!)

 [Doug later added...]

 Make that:

   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d

 where:
   boost_t   : the user-specified boost for term t
   coord_q_d : number of terms in both query and document / number of terms in query

 The coordination factor gives an AND-like boost to documents that
 contain, e.g., all three terms in a three-word query over those that
 contain just two of the words.
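To make the quoted formula concrete, here is a minimal Python sketch of it. This is not Lucene code: the function names (idf, score), the single-field document model, and the plain dictionary of document frequencies are all assumptions made for illustration. It also reads the idf term as log(numDocs / (docFreq_t + 1)) + 1.0, which is an assumption about the intended grouping.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # idf_t, read as log(numDocs / (docFreq_t + 1)) + 1.0 (assumed grouping).
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_tokens, doc_freqs, num_docs, boosts=None):
    """Score one single-field document against a query, following the
    quoted formula. All names here are hypothetical, not Lucene's API."""
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_tokens)
    if not q_counts or not d_counts:
        return 0.0

    # norm_q = sqrt(sum_t((tf_q * idf_t)^2))
    norm_q = math.sqrt(sum(
        (math.sqrt(n) * idf(num_docs, doc_freqs.get(t, 0))) ** 2
        for t, n in q_counts.items()
    ))
    # norm_d_t = sqrt(number of tokens in the field); one field assumed.
    norm_d = math.sqrt(len(doc_tokens))

    total = 0.0
    matched = 0
    for t, n in q_counts.items():
        if d_counts[t] == 0:
            continue  # term absent from the document contributes nothing
        matched += 1
        tf_q = math.sqrt(n)
        tf_d = math.sqrt(d_counts[t])
        idf_t = idf(num_docs, doc_freqs.get(t, 0))
        total += tf_q * idf_t / norm_q * tf_d * idf_t / norm_d * boosts.get(t, 1.0)

    # coord_q_d = terms in both query and document / terms in query
    coord = matched / len(q_counts)
    return total * coord
```

With this sketch, a document containing every query term outscores an equally long document containing only some of them (the coordination factor), and padding a document with unrelated tokens lowers its score (the norm_d_t factor).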





From:              Robert Turek/Santa Teresa/IBM@IBMUS
Sent by:           platform-help-dev-admin@eclipse.org
Date:              02/21/2003 12:01 PM
Please respond to: platform-help-dev
To:                platform-help-dev@xxxxxxxxxxx
Subject:           Re: [platform-help-dev] indexing keywords in M5

I worked up a little test plugin with some simple files.  On these,
it seems to rank multiple hits in the text first (100%), then hits in the
title or main heading (63%), then keyword-only hits (61%), then a single
hit in the text or in a lower-level heading (60%).  This is slightly
different from what I noticed on some other files.

____________________________________  The Bobster




From:              konradk@xxxxxxxxxx
Sent by:           platform-help-dev-admin@eclipse.org
Date:              02/20/2003 03:14 PM
Please respond to: platform-help-dev
To:                platform-help-dev@xxxxxxxxxxx
Subject:           Re: [platform-help-dev] indexing keywords in M5

Do you have a specific example that shows higher ranking of keywords or
heading than body?  Because, knowing the code, I do not think any of this
is true, but I must say I am very glad that it feels that way.

Konrad Kolosowski
Eclipse Help System




From:              Robert Turek/Santa Teresa/IBM@IBMUS
Sent by:           platform-help-dev-admin@eclipse.org
Date:              02/20/2003 03:46 PM
Please respond to: platform-help-dev
To:                platform-help-dev@xxxxxxxxxxx
Subject:           Re: [platform-help-dev] indexing keywords in M5

I finally got a chance to try this . . . it's a nice feature.

In your note it sounds like the ranking of the keywords
is the same as if the word actually appeared in some text.
It appears that the keyword hits are ranked higher than
hits in the body of a document, but a little lower than hits
in the title or main heading.  I assume that having the
keyword in the meta tag and in the heading or title would
get the highest ranking.  Is this true?

____________________________________  The Bobster
Notes:   Robert Turek/Santa Teresa/IBM@IBMUS
The net: turekr@xxxxxxxxxx
Fone:    (408)463-3602




From:              konradk@xxxxxxxxxx
Sent by:           platform-help-dev-admin@eclipse.org
Date:              02/04/2003 09:07 PM
Please respond to: platform-help-dev
To:                platform-help-dev@xxxxxxxxxxx
Subject:           [platform-help-dev] indexing keywords in M5

For 2.1 M5, I have added support for indexing Meta Keywords to Search in
Help System.  The corresponding Meta tag that can be placed in the head of
HTML documents looks like:
<meta name="keywords" content="term 1, term 2, ...">
The separator used in the content attribute does not matter, since search
treats commas, semicolons, and spaces as word separators and does not index
them.  It is wise to use commas, in case the text analyzers plugged into
the search engine become more picky in the future.
The keywords are indexed together with the text extracted from the
document, hence the ranking of a search hit will not depend on whether the
searched word appears in the meta tag or in the body of the document.
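
As a rough illustration of the behavior described above (not the actual Help System code, which is not shown in this thread), the following Python sketch pulls the keywords out of the meta tag, merges them with the document text, and splits everything on commas, semicolons, and whitespace. The class name HelpDocText and the function index_tokens are made up for this example, and the lowercasing step is an added assumption.

```python
import re
from html.parser import HTMLParser


class HelpDocText(HTMLParser):
    """Collects document text plus the content of <meta name="keywords">.

    Hypothetical helper for illustration. It is a simplification: it keeps
    all character data, including the title and any script or style text.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "keywords":
                # Keywords go into the same bucket as the body text.
                self.chunks.append(a.get("content") or "")

    def handle_data(self, data):
        self.chunks.append(data)


def index_tokens(html):
    """Return the tokens that would be indexed for one HTML document."""
    parser = HelpDocText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Commas, semicolons, and whitespace all act as separators and are
    # dropped, so the choice of keyword separator makes no difference.
    return [t.lower() for t in re.split(r"[,;\s]+", text) if t]
```

Because keywords and body text end up in one token stream, a keyword counts exactly like one more occurrence of the word in the text, which matches the statement that ranking does not depend on where the word appears.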

Konrad Kolosowski
Eclipse Help System

_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev

