Re: [platform-help-dev] indexing keywords in M5
Thanks . . . that helps explain why I sometimes see changes in the
ranking after adding or removing files that I assumed wouldn't
affect it. It doesn't seem to explain why one doc
with the word in the title is ranked higher than another with the
word in the text . . . that must be the result of norm_d_t.
On a philosophical level . . . since some highly trained, professional
information specialist has intentionally selected the keywords, should
they be given a higher priority?
____________________________________ The Bobster
birsan@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
02/21/2003 09:26 AM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] indexing keywords in M5
Search results ranking is done mostly by the Lucene search engine, which
uses a fairly complex formula that takes into account the size of the
document, the length of the query, etc.
The following is taken from their FAQ section on ranking:
31. How does Lucene assign scores to hits?
Here is a quote from Doug himself (posted in July 2001 to the
Lucene users mailing list):
For the record, Lucene's scoring algorithm is, roughly:

  score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t )

where:
  score_d   : score for document d
  sum_t     : sum for all terms t
  tf_q      : the square root of the frequency of t in the query
  tf_d      : the square root of the frequency of t in d
  idf_t     : log(numDocs/docFreq_t+1) + 1.0
  numDocs   : number of documents in index
  docFreq_t : number of documents containing t
  norm_q    : sqrt(sum_t((tf_q*idf_t)^2))
  norm_d_t  : square root of number of tokens in d in the same field as t

(I hope that's right!)
[Doug later added...]
Make that:

  score_d = sum_t( tf_q * idf_t / norm_q * tf_d * idf_t / norm_d_t * boost_t ) * coord_q_d

where:
  boost_t   : the user-specified boost for term t
  coord_q_d : number of terms in both query and document / number of terms in query

The coordination factor gives an AND-like boost to documents that
contain, e.g., all three terms in a three-word query over those that
contain just two of the words.
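
To make the quoted formula concrete, here is a rough Python sketch of it. It is not Lucene's code: the function names (idf, score) and the single-field document model are assumptions made for illustration, and "numDocs/docFreq_t+1" is read as numDocs / (docFreq_t + 1), which is also an assumption about the intended parenthesization.

```python
import math
from collections import Counter


def idf(num_docs, doc_freq):
    # Assumption: the FAQ's "numDocs/docFreq_t+1" means numDocs / (docFreq_t + 1).
    return math.log(num_docs / (doc_freq + 1)) + 1.0


def score(query_terms, doc_tokens, num_docs, doc_freqs, boosts=None):
    """Score one single-field document against a query, per the quoted formula.

    doc_freqs maps a term to the number of documents containing it;
    boosts maps a term to its user-specified boost (default 1.0).
    """
    boosts = boosts or {}
    q_counts = Counter(query_terms)
    d_counts = Counter(doc_tokens)
    if not q_counts:
        return 0.0

    # norm_q = sqrt(sum_t((tf_q * idf_t)^2))
    norm_q = math.sqrt(sum(
        (math.sqrt(n) * idf(num_docs, doc_freqs.get(t, 0))) ** 2
        for t, n in q_counts.items()
    ))
    # norm_d_t: with a single field, this is the same for every term.
    norm_d = math.sqrt(len(doc_tokens))

    total = 0.0
    matched = 0
    for t, n in q_counts.items():
        if d_counts[t] == 0:
            continue  # term absent from the document contributes nothing
        matched += 1
        tf_q = math.sqrt(n)
        tf_d = math.sqrt(d_counts[t])
        i = idf(num_docs, doc_freqs.get(t, 0))
        total += tf_q * i / norm_q * tf_d * i / norm_d * boosts.get(t, 1.0)

    # coord_q_d = terms in both query and document / terms in query
    coord = matched / len(q_counts)
    return total * coord
```

With this sketch, a single hit among 4 tokens scores five times higher than a single hit among 100 tokens, because norm_d_t is 2 in one case and 10 in the other. That is the kind of effect that would favor a hit in a short field such as a title, if the title were indexed as its own field (something this thread does not confirm).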
Robert Turek/Santa Teresa/IBM@IBMUS
Sent by: platform-help-dev-admin@eclipse.org
02/21/2003 12:01 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] indexing keywords in M5
I worked up a little test plugin with some simple files. On these,
it seems to rank multiple hits in the text first (100%), then hits in the
title or main heading (63%), then keyword only (61%), then a single hit in
the text or in a lower-level heading (60%). That is slightly different from
what I noticed on some other files.
____________________________________ The Bobster
konradk@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
02/20/2003 03:14 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] indexing keywords in M5
Do you have a specific example that shows higher ranking of keywords or
heading than body? Because, knowing the code, I do not think any of this
is true, but I must say I am very glad that it feels that way.
Konrad Kolosowski
Eclipse Help System
Robert Turek/Santa Teresa/IBM@IBMUS
Sent by: platform-help-dev-admin@eclipse.org
02/20/2003 03:46 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: Re: [platform-help-dev] indexing keywords in M5
I finally got a chance to try this . . . it's a nice feature.
In your note it sounds like the ranking of the keywords
is the same as if the word actually appeared in some text.
It appears that the keyword hits are ranked higher than
hits in the body of a document, but a little lower than hits
in the title or main heading. I assume that having the
keyword in the meta tag and in the heading or title would
get the highest ranking. Is this true?
____________________________________ The Bobster
Notes: Robert Turek/Santa Teresa/IBM@IBMUS
The net: turekr@xxxxxxxxxx
Fone:
(408)463-3602
konradk@xxxxxxxxxx
Sent by: platform-help-dev-admin@eclipse.org
02/04/2003 09:07 PM
Please respond to platform-help-dev
To: platform-help-dev@xxxxxxxxxxx
cc:
Subject: [platform-help-dev] indexing keywords in M5
For 2.1 M5, I have added support for indexing meta keywords to Search in
the Help System. The corresponding meta tag, which can be placed in the
head of HTML documents, looks like:
<meta name="keywords" content="term 1, term 2, ...">
The separator used in the content attribute does not matter, since search
treats commas, semicolons, and spaces as word separators and does not index
them. It is wise to use commas, in case the text analyzers plugged into
the search engine become pickier in the future.
The keywords are indexed together with the text extracted from the
document, hence the ranking of a search hit will not depend on whether the
searched word appears in the meta tag or in the body of the document.
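
As an illustration of the separator behavior described above, here is a small Python sketch that pulls the keywords out of a document head and splits them on commas, semicolons, and whitespace. It is not the Help System's indexing code; the names MetaKeywordsParser and extract_keywords are hypothetical, and the real text analyzers may tokenize differently.

```python
import re
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collects the words found in <meta name="keywords" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "keywords":
            return
        content = attr.get("content") or ""
        # Commas, semicolons, and whitespace all act as word separators
        # and are not kept as part of any word.
        self.keywords.extend(w for w in re.split(r"[,;\s]+", content) if w)


def extract_keywords(html):
    parser = MetaKeywordsParser()
    parser.feed(html)
    return parser.keywords
```

Under this sketch, content="term 1, term 2" yields the words term, 1, term, 2, so a multi-word keyword is not kept together as a phrase; it contributes its individual words, the same as if they appeared in the body text.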
Konrad Kolosowski
Eclipse Help System
_______________________________________________
platform-help-dev mailing list
platform-help-dev@xxxxxxxxxxx
http://dev.eclipse.org/mailman/listinfo/platform-help-dev