Re: [platform-ua-dev] searching dynamic content: two approaches


That sounds like a good, straightforward algorithm.
Looking at the filtering proposal (I assume this is what is implemented) at http://www.eclipse.org/eclipse/platform-ua/proposals/xhtml/HelpDynamicContent.html,  I think there is room for optimization, to allow for both quick and (almost) accurate results.

For example, filter criteria seem to fall into 3 categories:
 1)  stable, unlikely to change (os, ws)
 2)  relatively stable, rare changes, primarily during Eclipse restarts and changes of install configuration: product, plugin existence, user-defined properties
 3)  subject to change during an eclipse session: activity filters

In theory, you probably don't need to worry about having docs with filters from #1 in the temp index (unless multiple filters are used).
Docs with filters from #2 may need to be re-indexed when the change occurs (this would be ok, since you probably installed new plugins, etc.).
The temp index will be used primarily for docs falling into category #3.

I think most of the docs will fall into category #1, so no temp indexing is needed for them (this will address Mazen's concerns).

Perhaps you can keep a persistent list of what kind of filters apply to what documents (you may already have this or derive it from the main index), and do the right processing when needed.
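
To make the idea of a per-document filter list concrete, here is a minimal sketch in Python. The filter names ("os", "ws", "product", "plugin", "property", "activity"), the action strings, and the rule that a document takes the category of its most volatile filter are all assumptions made for illustration; none of them come from the help system's code.

```python
from enum import IntEnum


class Volatility(IntEnum):
    STABLE = 1    # category 1: os, ws
    RESTART = 2   # category 2: product, plugin existence, user-defined properties
    SESSION = 3   # category 3: activity filters


# Hypothetical filter names mapped to the three categories from the message.
FILTER_VOLATILITY = {
    "os": Volatility.STABLE,
    "ws": Volatility.STABLE,
    "product": Volatility.RESTART,
    "plugin": Volatility.RESTART,
    "property": Volatility.RESTART,
    "activity": Volatility.SESSION,
}

ACTIONS = {
    None: "main index only",
    Volatility.STABLE: "main index only",
    Volatility.RESTART: "re-index on configuration change",
    Volatility.SESSION: "temp index",
}


def doc_volatility(filter_names):
    """Return the most volatile category among a document's filters, or None."""
    return max((FILTER_VOLATILITY[n] for n in filter_names), default=None)


def plan(doc_filters):
    """Map each document name to the processing its filters call for."""
    return {doc: ACTIONS[doc_volatility(names)] for doc, names in doc_filters.items()}


# Illustrative persistent list: document -> names of the filters it uses.
doc_filters = {
    "intro.html": [],
    "gtk.html": ["os", "ws"],
    "pde.html": ["plugin"],
    "team.html": ["os", "activity"],
}
processing = plan(doc_filters)
```

With this list, only team.html would go through the temp index; the other three documents are served from the main index, and pde.html is re-indexed only when the install configuration changes.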

BTW, one also needs to balance the workspace search support vs. infocenter search support, so both are accurate and fast.


Anyway, just thoughts. It's been a while since I touched help code so not sure if I make much sense :-)

-Dorian



Dejan Glozic/Toronto/IBM@IBMCA
Sent by: platform-ua-dev-bounces@xxxxxxxxxxx

03/03/2006 10:34 AM

Please respond to
"Eclipse Platform User Assistance component developers list."

To
"Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
cc
"Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>, platform-ua-dev-bounces@xxxxxxxxxxx
Subject
Re: [platform-ua-dev] searching dynamic content: two approaches





It all depends on the performance hit that the user will incur. This is
what we do:

1) We pre-index documents without applying any filtering. For documents
with filtering expressions, we store the result of the expression at the
time of indexing in the index itself.
2) We search as usual. For each document in the search results that
contains filtered content:
     2a) Compare the filtering expression value with the stored value. If
they match, the result is valid.
     2b) If there is no match, add the document to the temporary index,
this time with the filtering applied.
3) We search using the original expression but using the temporary index
instead of the main index.
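
The steps above can be sketched roughly as follows. This is a toy model in Python that follows the steps as written, not the actual help search code: the "key=value" filter syntax, the word-set "index", and every function name are assumptions made for illustration.

```python
def evaluate(expr, env):
    """Evaluate a hypothetical 'key=value' filter, e.g. 'os=linux'."""
    key, value = expr.split("=", 1)
    return env.get(key) == value


def words(sections, env=None):
    """Collect a document's words; with env given, skip filtered-out sections."""
    out = set()
    for expr, text in sections:
        if env is None or expr is None or evaluate(expr, env):
            out.update(text.lower().split())
    return out


def build_main_index(docs, env):
    """Step 1: index unfiltered text and store each filter's value at index time."""
    index = {}
    for name, sections in docs.items():
        stored = {e: evaluate(e, env) for e, _ in sections if e is not None}
        index[name] = (words(sections), stored)
    return index


def search(docs, main_index, env, term):
    term = term.lower()
    valid, temp_index = [], {}
    for name, (tokens, stored) in main_index.items():
        if term not in tokens:
            continue
        current = {e: evaluate(e, env) for e in stored}
        if current == stored:
            valid.append(name)                         # step 2a: values match
        else:
            temp_index[name] = words(docs[name], env)  # step 2b: re-index filtered
    # Step 3: run the same query against the temporary index only.
    valid += [n for n, tokens in temp_index.items() if term in tokens]
    return sorted(valid)


# Each document is a list of (filter expression or None, text) sections.
docs = {
    "a": [(None, "install guide")],
    "b": [("os=linux", "install on linux"), (None, "overview")],
    "c": [("os=win32", "install on windows"), (None, "overview install")],
}
index = build_main_index(docs, {"os": "linux"})
```

In this toy data the index is built on Linux. Searching for "linux" with the environment switched to win32 sends document "b" through the temporary index, where the hit disappears because the only matching section is filtered out.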

Performance factors:

  Let the initial search result list contain N hits.
  In it there will be M documents that use filtering where 0 <= M <= N.
  These M documents must be reindexed using an on the fly index with the
  filtering applied.
  The result of the search will contain K documents, where 0 <= K <= M. We
  will only accept results from the K set and discard the rest of the M
  set.
  In the worst case scenario, M==N. This means that we will have to
  reindex all N documents (say, 300) in order to weed out false positives.
  The more filtering is used in the docs, the slower the weeding out will
  be.
  We will keep this temporary index around so that we can reuse it for
  subsequent searches when filtering expressions don't change.
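
As a rough illustration of the last point, the temporary index could be kept in a small cache that is thrown away only when the filter state changes. This is a Python sketch under assumptions: the class name, the `reindex` callback, and the representation of the filter state as a dict are all hypothetical.

```python
class TempIndexCache:
    """Keeps re-indexed documents until the filter state changes."""

    def __init__(self, reindex):
        self._reindex = reindex  # hypothetical callback: (doc, filter_state) -> entry
        self._key = None
        self._entries = {}

    def get(self, doc, filter_state):
        key = frozenset(filter_state.items())
        if key != self._key:
            # Filter values changed, so earlier filtered entries are stale.
            self._entries.clear()
            self._key = key
        if doc not in self._entries:
            self._entries[doc] = self._reindex(doc, filter_state)
        return self._entries[doc]


reindexed = []


def fake_reindex(doc, filter_state):
    """Stand-in for real re-indexing; records which documents were processed."""
    reindexed.append(doc)
    return {doc}


cache = TempIndexCache(fake_reindex)
cache.get("a.html", {"activity.java": True})
cache.get("a.html", {"activity.java": True})   # reused, no second re-index
cache.get("a.html", {"activity.java": False})  # state changed, re-indexed again
```

Under this scheme the worst case of re-indexing all M documents is paid once per filter state rather than once per search.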

Regards,

Dejan Glozic, Ph.D.
Manager, Eclipse Development 1A
D1/R0Q/8200/MKM
IBM Canada Ltd.
Tel. 905 413-2745  T/L 969-2745
Fax. 905 413-4850



                                                                         
Joseph F Pesot <pesot@xxxxxxxxxx>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg

03/03/2006 10:00 AM

Please respond to
"Eclipse Platform User Assistance component developers list."

To
"Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
cc

Subject
Re: [platform-ua-dev] searching dynamic content: two approaches
                                                                         
                                                                         




Hey Curtis,
I suspect that in this release we could just pick one, and then learn from
it ... and so in this case, it might make sense to pick the "easy one" and
run with that this time out ... part of my thinking here is that I agree
with Dorian ... it depends.

Generally speaking, I think #1, accurate results is what you want ... the
reason I say this is that if we return hits that are driven by content that
has been filtered out ... that content won't appear when a user looks at it
... and so the content will appear to be "irrelevant" from a user
perspective.  Having said this, I'm not sure how frequently that filtered
content will be the driving factor in "pushing" a topic up the search stack
... in other words, topics with filtered content might make the results
list, but I'm not so sure that they will frequently "jump" to the top.
So, while I think #1 is probably what we really want, I think #2 is
probably ok initially... AND if we ever get around to "boosting" the search
hit value of things like Titles, and Keywords, this probably becomes even
less of an issue.

Joe
_______________________________________________

Joseph Pesot
Rational User Technologies
IBM Software Group, RTP, NC
Phone: (919) 254-7431 (T/L 444)
email:   pesot@xxxxxxxxxx




Dorian Birsan <birsan@xxxxxxxxxm>
Sent by: platform-ua-dev-bounces@xxxxxxxxxxg

03/02/2006 08:15 PM

Please respond to
"Eclipse Platform User Assistance component developers list."

To
"Eclipse Platform User Assistance component developers list." <platform-ua-dev@xxxxxxxxxxx>
cc

Subject
Re: [platform-ua-dev] searching dynamic content: two approaches







platform-ua-dev-bounces@xxxxxxxxxxx wrote on 03/02/2006 06:44:57 PM:

>
> I'd like to get some opinions from the community about how to proceed on the
> following issue:
>
> Now that we've added dynamic content capabilities to user assistance, we are
> faced with having to support searching this content. There are two basic
> approaches we can take, with advantages and disadvantages to each. I would
> like to know which one will be most acceptable to users.
>
> 1. Show the results accurately, but at a potentially significant performance
> hit (depending on how many documents you have with dynamic content).

Any numbers here? Can you quantify the performance loss?

> 2. Show all potential hits. This means it will find things in sections that
> would be filtered if you were to open the document. There is no
> performance loss here.
>
> It would be possible to do both approaches, and allow the user to switch, in
> which case we would pick a default.
>
> So the basic question is: Which one is most important for searching,
> accuracy or speed?

Both :-)

I think your question is which of the two alternatives should be the
default behavior. In that case, I would go with #2. Even if sections of
documents are hidden, if they contain information relevant to what one
searches for, they may be good candidates.

Can you briefly describe the algorithms in both scenarios?

>
> Thanks,
> Curtis d'Entremont
> Eclipse User Assistance
> IBM Toronto Lab
>
> Phone: (905) 413-5754
> E-Mail: curtispd@xxxxxxxxxx
> _______________________________________________
> platform-ua-dev mailing list
> platform-ua-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/platform-ua-dev

-Dorian
_______________________________________________
platform-ua-dev mailing list
platform-ua-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/platform-ua-dev

