Re: [platform-ua-dev] searching dynamic content: two approaches

It all depends on the performance hit that the user will incur. This is
what we do:

1) We pre-index documents without applying any filtering. For documents
with filtering expressions, we store the result of the expression at the
time of indexing in the index itself.
2) We search as usual. For each document in the search results that
contains filtered content:
      2a) Compare the current value of the filtering expression with the
stored value. If they match, the result is valid.
      2b) If they do not match, add the document to a temporary index,
this time with the filtering applied.
3) We search again using the original search expression, but against the
temporary index instead of the main index.
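
The three steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual help system code: the "key=value" filter syntax, the env dictionary, the word-set "index", and all function names are hypothetical. It takes step 2a at face value, so a hit is accepted whenever the stored and current expression values agree.

```python
def tokens(text):
    """Split text into a set of lowercase words (stand-in for a real analyzer)."""
    return set(text.lower().split())

def evaluate(expr, env):
    """Hypothetical filter syntax: 'key=value' is true when env[key] == value."""
    key, _, value = expr.partition("=")
    return env.get(key) == value

def build_main_index(docs, index_env):
    """Step 1: index all text unfiltered and store each expression's
    result as evaluated at indexing time."""
    index = {}
    for doc_id, sections in docs.items():
        words, stored = set(), {}
        for expr, text in sections:
            words |= tokens(text)  # no filtering applied here
            if expr is not None:
                stored[expr] = evaluate(expr, index_env)
        index[doc_id] = (words, stored)
    return index

def filtered_words(sections, env):
    """Words of a document with the filtering applied for the current env."""
    words = set()
    for expr, text in sections:
        if expr is None or evaluate(expr, env):
            words |= tokens(text)
    return words

def search(query, docs, main_index, env):
    """Steps 2 and 3. Returns (sorted valid doc ids, number of docs reindexed)."""
    q = tokens(query)
    valid, temp_index = [], {}
    for doc_id, (words, stored) in main_index.items():
        if not q <= words:
            continue  # not a hit in the main index
        if all(evaluate(e, env) == v for e, v in stored.items()):
            valid.append(doc_id)  # 2a: stored values still match
        else:
            # 2b: reindex this document with the filtering applied
            temp_index[doc_id] = filtered_words(docs[doc_id], env)
    # Step 3: same query, but against the temporary index
    valid += [d for d, words in temp_index.items() if q <= words]
    return sorted(valid), len(temp_index)

# Each document is a list of (filter expression or None, text) sections.
docs = {
    "a": [(None, "install the plugin"), ("os=linux", "use the gtk toolkit")],
    "b": [(None, "gtk overview")],
    "d": [(None, "release notes")],
}
main_index = build_main_index(docs, {"os": "linux"})
```

Searching for "gtk" with the same environment as at indexing time needs no reindexing. With a different environment, document "a" is reindexed and dropped, because its only match sits in a section that is now filtered out.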

Performance factors:

   Let the initial search result list contain N hits.
   In it there will be M documents that use filtering, where 0 <= M <= N.
   These M documents must be reindexed using an on-the-fly index with the
   filtering applied.
   The search against that index will return K documents, where
   0 <= K <= M. We will only accept results from the K set and discard the
   rest of the M set.
   In the worst case scenario, M==N. This means that we will have to
   reindex all N documents (say, 300) in order to weed out false positives.
   The more filtering is used in the docs, the slower the weeding out will
   be.
   We will keep this temporary index around so that we can reuse it for
   subsequent searches when filtering expressions don't change.
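
   As a rough illustration of that reuse, a temporary index could be kept
   per set of filter values and thrown away when those values change. The
   class below is a hypothetical sketch, not the actual implementation: the
   reindex callback, the filter_values dictionary, and the reindex_count
   counter are all assumptions made for the example.

```python
class TempIndexCache:
    """Keeps re-filtered documents around until the filter values change."""

    def __init__(self, reindex):
        self._reindex = reindex   # hypothetical callable: doc_id -> filtered index entry
        self._key = None          # snapshot of the filter values the cache is valid for
        self._entries = {}
        self.reindex_count = 0    # how many documents were actually reindexed

    def get(self, doc_id, filter_values):
        key = tuple(sorted(filter_values.items()))
        if key != self._key:
            # Filter expressions now evaluate differently: discard everything.
            self._entries.clear()
            self._key = key
        if doc_id not in self._entries:
            self._entries[doc_id] = self._reindex(doc_id)
            self.reindex_count += 1
        return self._entries[doc_id]


# Worst case M == N: every hit is reindexed once, then served from the cache.
reindexed = []

def fake_reindex(doc_id):
    reindexed.append(doc_id)
    return {"doc", doc_id}

cache = TempIndexCache(fake_reindex)
hits = ["a", "b", "c"]
for _ in range(2):                      # two searches, same filter values
    for doc_id in hits:
        cache.get(doc_id, {"os": "win32"})
first_two_searches = cache.reindex_count

cache.get("a", {"os": "linux"})         # filter values changed: cache is rebuilt
after_change = cache.reindex_count
```

   In this sketch the second search over the same hits costs no reindexing
   at all, and the cost only comes back once a filter value changes.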

Regards,

Dejan Glozic, Ph.D.
Manager, Eclipse Development 1A
D1/R0Q/8200/MKM
IBM Canada Ltd.
Tel. 905 413-2745  T/L 969-2745
Fax. 905 413-4850



                                                                           
From: Joseph F Pesot <pesot@xxxxxxxxxx>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg
Date: 03/03/2006 10:00 AM
To: "Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
Subject: Re: [platform-ua-dev] searching dynamic content: two approaches
Please respond to: "Eclipse Platform User Assistance component developers list."




Hey Curtis,
I suspect that in this release we could just pick one, and then learn from
it ... and so in this case, it might make sense to pick the "easy one" and
run with that this time out ... part of my thinking here is that I agree
with Dorian ... it depends.

Generally speaking, I think #1, accurate results, is what you want ... the
reason I say this is that if we return hits that are driven by content that
has been filtered out ... that content won't appear when a user looks at it
... and so the content will appear to be "irrelevant" from a user
perspective.  Having said this, I'm not sure how frequently that filtered
content will be the driving factor in "pushing" a topic up the search stack
... in other words, topics with filtered content might make the results
list, but I'm not so sure that they will frequently "jump" to the top.
So, while I think #1 is probably what we really want, I think #2 is
probably ok initially... AND if we ever get around to "boosting" the search
hit value of things like Titles, and Keywords, this probably becomes even
less of an issue.

Joe
_______________________________________________

Joseph Pesot
Rational User Technologies
IBM Software Group, RTP, NC
Phone: (919) 254-7431 (T/L 444)
email:   pesot@xxxxxxxxxx




From: Dorian Birsan <birsan@xxxxxxxxxm>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg
Date: 03/02/2006 08:15 PM
To: "Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
Subject: Re: [platform-ua-dev] searching dynamic content: two approaches
Please respond to: "Eclipse Platform User Assistance component developers list."







platform-ua-dev-bounces@xxxxxxxxxxx wrote on 03/02/2006 06:44:57 PM:

>
> I'd like to get some opinions from the community about how to proceed on the
> following issue:
>
> Now that we've added dynamic content capabilities to user assistance, we are
> faced with having to support searching this content. There are two basic
> approaches we can take, with advantages and disadvantages to each. I would
> like to know which one will be most acceptable to users.
>
> 1. Show the results accurately, but at a potentially significant performance
> hit (depending on how many documents you have with dynamic content).

Any numbers here ? Can you quantify the performance loss?

> 2. Show all potential hits. This means it will find things in sections that
> would be filtered if you were to open the document. There is no
> performance loss here.
>
> It would be possible to do both approaches, and allow the user to switch, in
> which case we would pick a default.
>
> So the basic question is: Which one is most important for searching,
> accuracy or speed?

Both :-)

I think your question is which of the two alternatives should be the
default behavior. In that case, I would go with #2. Even if sections of
documents are hidden, if they contain information relevant to what one
searches for, they may be good candidates.

Can you briefly describe the algorithms in both scenarios?

>
> Thanks,
> Curtis d'Entremont
> Eclipse User Assistance
> IBM Toronto Lab
>
> Phone: (905) 413-5754
> E-Mail: curtispd@xxxxxxxxxx
> _______________________________________________
> platform-ua-dev mailing list
> platform-ua-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/platform-ua-dev

-Dorian
_______________________________________________
platform-ua-dev mailing list
platform-ua-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/platform-ua-dev





