Re: [platform-help-dev] indexing keywords in M5



Thanks . . . that helps explain why I sometimes see changes in the
ranking after adding or removing files that I assumed wouldn't
affect the ranking. It doesn't seem to explain why one doc
with the word in the title is ranked higher than another with the
word in the text . . . that must be the result of norm_d_t.

On a philosophical level . . . since some highly trained, professional
information specialist has intentionally selected the keywords, should
they be given a higher priority?

____________________________________  The Bobster



                                                                                                                      
birsan@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
To: platform-help-dev@xxxxxxxxxxx
Subject: Re: [platform-help-dev] indexing keywords in M5
02/21/2003 09:26 AM
Please respond to platform-help-dev




Search results ranking is done mostly by the Lucene search engine, which
uses a fairly complex formula that takes into account the size of the
document, the length of the query, etc.
The following is taken from the Lucene FAQ section on ranking:





 31. How does Lucene assign scores to hits?


 Here is a quote from Doug himself (posted on July 2001 to the
 Lucene users mailing list):
 For the record, Lucene's scoring algorithm is, roughly:

   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t)

 where:
   score_d   : score for document d
   sum_t     : sum for all terms t
   tf_q      : the square root of the frequency of t in the query
   tf_d      : the square root of the frequency of t in d
   idf_t     : log(numDocs/(docFreq_t+1)) + 1.0
   numDocs   : number of documents in index
   docFreq_t : number of documents containing t
   norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
   norm_d_t  : square root of number of tokens in d in the same field as t

 (I hope that's right!)

 [Doug later added...]

 Make that:

   score_d = sum_t(tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t) * coord_q_d

 where

   boost_t    : the user-specified boost for term t
   coord_q_d  : number of terms in both query and document / number of terms in query

 The coordination factor gives an AND-like boost to documents that
 contain, e.g., all three terms in a three-word query over those that
 contain just two of the words.
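
To make the quoted formula concrete, here is a minimal Python sketch of it. This is not Lucene's code and not what the Help System runs; it is only an illustration under a few assumptions: every document has a single field (so norm_d_t is the same for all terms), idf_t is read as log(numDocs / (docFreq_t + 1)) + 1.0, and the names `idf`, `score`, `doc_freqs` and `boosts` are made up for this example.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # Assumption: the FAQ's idf_t is read as log(numDocs / (docFreq_t + 1)) + 1.0
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_tokens, num_docs, doc_freqs, boosts=None):
    """Rough score_d for one single-field document (illustrative only).

    doc_freqs maps each term to the number of documents containing it;
    boosts optionally maps a term to its user-specified boost (default 1.0).
    """
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_tokens)
    if not q_counts:
        return 0.0

    # tf_q * idf_t for every distinct query term
    q_weights = {
        t: math.sqrt(n) * idf(num_docs, doc_freqs[t])
        for t, n in q_counts.items()
    }
    norm_q = math.sqrt(sum(w * w for w in q_weights.values()))
    # norm_d_t: square root of the number of tokens in the (single) field
    norm_d = math.sqrt(len(doc_tokens))

    total = 0.0
    matched = 0
    for t, w_q in q_weights.items():
        if d_counts[t] == 0:
            continue  # term absent from the document contributes nothing
        matched += 1
        tf_d = math.sqrt(d_counts[t])
        total += (w_q / norm_q) * tf_d * idf(num_docs, doc_freqs[t]) \
            / norm_d * boosts.get(t, 1.0)

    # coord_q_d: fraction of the query's terms that occur in the document
    coord = matched / len(q_weights)
    return total * coord
```

With this sketch, a one-term query scores a two-token document higher than a hundred-token document containing the term once, purely because of norm_d_t. If the title were indexed as its own short field, the same effect would favor title hits, but whether the Help System does that is not established in this thread.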







Robert Turek/Santa Teresa/IBM@IBMUS
Sent by: platform-help-dev-admin@eclipse.org
To: platform-help-dev@xxxxxxxxxxx
Subject: Re: [platform-help-dev] indexing keywords in M5
02/21/2003 12:01 PM
Please respond to platform-help-dev








I worked up a little test plugin with some simple files.  On these,
it seems to rank multiple hits in the text (100%), then hits in the title
or main heading (63%), then keyword only (61%), then in the text once or
in a lower-level heading (60%).  Slightly different from what I noticed
on some other files.

____________________________________  The Bobster




konradk@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
To: platform-help-dev@xxxxxxxxxxx
Subject: Re: [platform-help-dev] indexing keywords in M5
02/20/2003 03:14 PM
Please respond to platform-help-dev







Do you have a specific example that shows higher ranking of keywords or
heading than body?  Because, knowing the code, I do not think any of this
is true, but I must say I am very glad that it feels that way.

Konrad Kolosowski
Eclipse Help System




Robert Turek/Santa Teresa/IBM@IBMUS
Sent by: platform-help-dev-admin@eclipse.org
To: platform-help-dev@xxxxxxxxxxx
Subject: Re: [platform-help-dev] indexing keywords in M5
02/20/2003 03:46 PM
Please respond to platform-help-dev










I finally got a chance to try this . . . it's a nice feature.

In your note it sounds like the ranking of the keywords
is the same as if the word actually appeared in some text.
It appears that the keyword hits are ranked higher than
hits in the body of a document, but a little lower than hits
in the title or main heading.  I assume that having the
keyword in the meta tag and in the heading or title would
get the highest ranking.  Is this true?

____________________________________  The Bobster
Notes: Robert Turek/Santa Teresa/IBM@IBMUS
The net: turekr@xxxxxxxxxx
Fone: (408)463-3602




konradk@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
To: platform-help-dev@xxxxxxxxxxx
Subject: [platform-help-dev] indexing keywords in M5
02/04/2003 09:07 PM
Please respond to platform-help-dev







For 2.1 M5, I have added support for indexing Meta Keywords to Search in
the Help System.  The corresponding Meta tag that can be placed in the head
of HTML documents looks like:
<meta name="keywords" content="term 1, term 2, ...">
The separator used in the content attribute does not matter, since search
treats commas, semicolons, and spaces as word separators and does not index
them.  It is wise to use commas, in case the text analyzers plugged into
the search engine become more picky in the future.
The keywords are indexed together with the text extracted from the
document, hence the ranking of a search hit will not depend on whether the
searched word appears in the meta tag or in the body of the document.
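
As an illustration of "indexed together", the sketch below pulls the keywords out of the meta tag and mixes them into the same token stream as the document text. It is not the Help System's implementation (which is Java on top of Lucene); the class `KeywordAndTextExtractor` and the function `index_tokens` are hypothetical names, and the tokenizing is reduced to splitting on whitespace, commas and semicolons.

```python
import re
from html.parser import HTMLParser


class KeywordAndTextExtractor(HTMLParser):
    """Collects meta-keyword content and text into one list of chunks."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Pick up <meta name="keywords" content="..."> wherever it appears
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("name") or "").lower() == "keywords":
                self.chunks.append(attr_map.get("content") or "")

    def handle_data(self, data):
        # All text nodes (title, headings, body) go into the same stream
        self.chunks.append(data)


def index_tokens(html):
    """Return lowercased tokens; commas, semicolons and spaces only separate."""
    parser = KeywordAndTextExtractor()
    parser.feed(html)
    parser.close()
    joined = " ".join(parser.chunks)
    return [tok.lower() for tok in re.split(r"[\s,;]+", joined) if tok]
```

For example, feeding it a page whose head contains `<meta name="keywords" content="debugger; breakpoint, watch">` yields `debugger`, `breakpoint` and `watch` alongside the body words, with no trace of which separator was used or where each word came from.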

Konrad Kolosowski
Eclipse Help System

_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev






