Re: [platform-ua-dev] searching dynamic content: two approaches
It all depends on the performance hit that the user will incur. This is
what we do:
1) We pre-index documents without applying any filtering. For documents
with filtering expressions, we store the result of the expression at the
time of indexing in the index itself.
2) We search as usual. For each document in the search results that
contains filtered content:
2a) Compare the current filtering expression value with the stored value.
If they match, the result is valid.
2b) If there is no match, add the document to the temporary index,
this time with the filtering applied.
3) We search using the original search expression, but against the
temporary index instead of the main index.
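
Steps 2 and 3 above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual help system code: the names `evaluate`, `render`, `matches` and `search` are hypothetical, a filter expression is reduced to a key looked up in an `env` dictionary, the "index" is a plain dictionary of text, and the main index from step 1 is taken as given rather than built here.

```python
def evaluate(expr, env):
    """Hypothetical filter evaluation: an expression is just a key in env."""
    return bool(env.get(expr, False))


def render(sections, env):
    """Join the sections whose filter expression (or None) currently passes."""
    return " ".join(text for expr, text in sections
                    if expr is None or evaluate(expr, env))


def matches(query, text):
    """Toy search: every query word must occur in the text."""
    words = set(text.lower().split())
    return all(word in words for word in query.lower().split())


def search(query, main_index, sources, env):
    """Two-pass search over a pre-built main index.

    main_index: doc_id -> (indexed_text, stored_values), where stored_values
                maps each filter expression to its value at indexing time
    sources:    doc_id -> list of (expr or None, text) sections
    env:        current values used to evaluate filter expressions
    """
    accepted = []
    temp_index = {}
    # Step 2: search the main index as usual.
    for doc_id, (indexed_text, stored) in main_index.items():
        if not matches(query, indexed_text):
            continue
        current = {expr: evaluate(expr, env) for expr in stored}
        if current == stored:
            # 2a: stored values still hold (or the doc has no filters).
            accepted.append(doc_id)
        else:
            # 2b: reindex this document with the filtering applied.
            temp_index[doc_id] = render(sources[doc_id], env)
    # Step 3: rerun the same query against the temporary index only.
    accepted += [d for d, text in temp_index.items() if matches(query, text)]
    return sorted(accepted), temp_index
```

In this sketch a document whose only match sits in a section that is now filtered out lands in the temporary index in 2b and then fails the second search, so it is dropped as a false positive.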
Performance factors:
Let the initial search result list contain N hits.
In it there will be M documents that use filtering where 0 <= M <= N.
These M documents must be reindexed using an on-the-fly index with the
filtering applied.
The result of the search will contain K documents, where 0 <= K <= M. We
will only accept results from the K set and discard the rest of the M
set.
In the worst case scenario, M==N. This means that we will have to
reindex all N documents (say, 300) in order to weed out false positives.
The more filtering is used in the docs, the slower the weeding out will
be.
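
To make the N / M / K bookkeeping concrete, here is a small sketch. The function name `weeding_stats` and its arguments are hypothetical, and the "final" count assumes that the N - M hits without filtering are accepted as they are, which the description above implies but does not state outright.

```python
def weeding_stats(hits, uses_filtering, survives_recheck):
    """Count the quantities from the description above.

    hits:             doc ids returned by the initial search (N of them)
    uses_filtering:   set of doc ids whose content uses filtering
    survives_recheck: set of doc ids that still match after reindexing
    """
    n = len(hits)
    filtered_hits = [d for d in hits if d in uses_filtering]
    m = len(filtered_hits)
    k = sum(1 for d in filtered_hits if d in survives_recheck)
    return {
        "N": n,
        "M": m,
        "K": k,
        "reindexed": m,          # every filtered hit is reindexed
        "discarded": m - k,      # false positives weeded out
        "final": (n - m) + k,    # assumption: unfiltered hits are kept
    }
```

For the worst case mentioned above, 300 hits that all use filtering give `reindexed == 300` regardless of how many survive.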
We will keep this temporary index around so that we can reuse it for
subsequent searches when filtering expressions don't change.
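
One possible shape for that reuse, as a sketch only: the class `TemporaryIndex`, its `entry` method and the `reindex` callback are hypothetical names, and the filter values are assumed to be hashable so that a snapshot of them can serve as the validity check.

```python
class TemporaryIndex:
    """Keeps reindexed documents until the filter values change."""

    def __init__(self):
        self._snapshot = None   # filter values the cached entries belong to
        self._entries = {}      # doc_id -> reindexed content

    def entry(self, doc_id, env, reindex):
        """Return the reindexed content for doc_id, rebuilding only if needed.

        env:     current filter expression values (hashable values assumed)
        reindex: callback (doc_id, env) -> content with filtering applied
        """
        snapshot = frozenset(env.items())
        if snapshot != self._snapshot:
            # Filtering expressions changed: cached entries are stale.
            self._entries.clear()
            self._snapshot = snapshot
        if doc_id not in self._entries:
            self._entries[doc_id] = reindex(doc_id, env)
        return self._entries[doc_id]
```

With this shape, repeated searches under the same filter values pay the reindexing cost for a document only once; any change in the values throws the whole temporary index away, which is the simplest safe policy rather than the most economical one.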
Regards,
Dejan Glozic, Ph.D.
Manager, Eclipse Development 1A
D1/R0Q/8200/MKM
IBM Canada Ltd.
Tel. 905 413-2745 T/L 969-2745
Fax. 905 413-4850
From: Joseph F Pesot <pesot@xxxxxxxxxx>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg
Date: 03/03/2006 10:00 AM
To: "Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
Subject: Re: [platform-ua-dev] searching dynamic content: two approaches
Please respond to: "Eclipse Platform User Assistance component developers list."
Hey Curtis,
I suspect that in this release we could just pick one, and then learn from
it ... and so in this case, it might make sense to pick the "easy one" and
run with that this time out ... part of my thinking here is that I agree
with Dorian ... it depends.
Generally speaking, I think #1, accurate results, is what you want ... the
reason I say this is that if we return hits that are driven by content that
has been filtered out ... that content won't appear when a user looks at it
... and so the content will appear to be "irrelevant" from a user
perspective. Having said this, I'm not sure how frequently that filtered
content will be the driving factor in "pushing" a topic up the search stack
... in other words, topics with filtered content might make the results
list, but I'm not so sure that they will frequently "jump" to the top.
So, while I think #1 is probably what we really want, I think #2 is
probably ok initially... AND if we ever get around to "boosting" the search
hit value of things like Titles, and Keywords, this probably becomes even
less of an issue.
Joe
_______________________________________________
Joseph Pesot
Rational User Technologies
IBM Software Group, RTP, NC
Phone: (919) 254-7431 (T/L 444)
email: pesot@xxxxxxxxxx
From: Dorian Birsan <birsan@xxxxxxxxxm>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg
Date: 03/02/2006 08:15 PM
To: "Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
Subject: Re: [platform-ua-dev] searching dynamic content: two approaches
Please respond to: "Eclipse Platform User Assistance component developers list."
platform-ua-dev-bounces@xxxxxxxxxxx wrote on 03/02/2006 06:44:57 PM:
>
> I'd like to get some opinions from the community about how to proceed on the
> following issue:
>
> Now that we've added dynamic content capabilities to user assistance, we are
> faced with having to support searching this content. There are two basic
> approaches we can take, with advantages and disadvantages to each. I would
> like to know which one will be most acceptable to users.
>
> 1. Show the results accurately, but at a potentially significant performance
> hit (depending on how many documents you have with dynamic content).
Any numbers here? Can you quantify the performance loss?
> 2. Show all potential hits. This means it will find things in sections that
> would be filtered if you were to open the document. There is no
> performance loss here.
>
> It would be possible to do both approaches, and allow the user to switch, in
> which case we would pick a default.
>
> So the basic question is: Which one is most important for searching,
> accuracy or speed?
Both :-)
I think your question is which of the two alternatives should be the
default behavior. In that case, I would go with #2. Even if sections of
documents are hidden, if they contain information relevant to what one
searches for, they may be good candidates.
Can you briefly describe the algorithms in both scenarios?
>
> Thanks,
> Curtis d'Entremont
> Eclipse User Assistance
> IBM Toronto Lab
>
> Phone: (905) 413-5754
> E-Mail: curtispd@xxxxxxxxxx
> _______________________________________________
> platform-ua-dev mailing list
> platform-ua-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/platform-ua-dev
-Dorian
_______________________________________________
platform-ua-dev mailing list
platform-ua-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/platform-ua-dev