Bug 191522 - [api] provide full text search functionality over task comments
Summary: [api] provide full text search functionality over task comments
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P1 enhancement (vote)
Target Milestone: 3.7   Edit
Assignee: David Green CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed, noteworthy, plan
: 270901 (view as bug list)
Depends on: 359518 365224 369414 371017 371029 372725 372838 373178 373279 373557
Blocks: 205627
  Show dependency tree
 
Reported: 2007-06-07 13:11 EDT by Andreas Goetz CLA
Modified: 2012-03-21 10:54 EDT (History)
8 users (show)

See Also:


Attachments
mylyn/context/zip (39.81 KB, application/octet-stream)
2011-10-25 20:36 EDT, David Green CLA
no flags Details
mylyn/context/zip (147.91 KB, application/octet-stream)
2011-11-09 19:05 EST, David Green CLA
no flags Details
mylyn/context/zip (68.08 KB, application/octet-stream)
2012-02-22 16:40 EST, David Green CLA
no flags Details
screenshot (13.41 KB, image/png)
2012-03-03 07:26 EST, Steffen Pingel CLA
no flags Details
screenshot (16.90 KB, image/png)
2012-03-03 07:29 EST, Steffen Pingel CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Andreas Goetz CLA 2007-06-07 13:11:15 EDT
I'd like to see full text search functionality inside task comments across all tasks/ all tasks of a repository. The standard "Find" box seems to filter based on task title only.
Use case: I remember some resource mentioned in one of my many tasks/bugs, but can't remember which one.
Comment 1 Eugene Kuleshov CLA 2007-06-07 13:36:32 EDT
We've been waiting for such request! 
The current workaround is to use Search (Ctrl-H) to search in the repository. Though it will send search request to server.
Comment 2 Joe Thomas-Kerr CLA 2008-09-17 20:29:11 EDT
I'd also really appreciate such a feature.

The workaround suggested doesn't work for local tasks.

Unfortunately given the age of this bug it doesn't look like a priority.

Joe.
Comment 3 Robert Elves CLA 2008-09-24 01:24:51 EDT
Search within a single editor will likely be our first step in this direction. We'll then look at repository/local task wide search for 3.2.
Comment 4 Steffen Pingel CLA 2009-04-02 06:10:58 EDT
*** Bug 270901 has been marked as a duplicate of this bug. ***
Comment 5 David Green CLA 2011-10-17 12:06:03 EDT
I've pushed some new o.e.m.tasks.index.* plug-ins to the incubator on a topic branch "task-359518_task_list_index_search".  These implement search over comments and other attributes within the task list (for both local and remote comments).  See bug 359518 for details.
Comment 6 David Green CLA 2011-10-17 12:07:41 EDT
(In reply to bug 359518 comment #13)
> Shouldn't we include all attributes that are defined in the common task schema,
> i.e. priority, severity etc.?

That makes sense to me.  Lucene provides some useful functionality related to date attributes.  I'll take a look at this.
Comment 7 David Green CLA 2011-10-18 02:50:06 EDT
I've removed topic branch "task-359518_task_list_index_search" and created a new topic branch named "task-191522" (named after this task).

I've also pushed a couple more unit tests that verify proper index behaviour.
Comment 8 Steffen Pingel CLA 2011-10-18 03:21:13 EDT
Excellent, I'll try to get the tasks.ui changes in soon so we can move this into master. I strongly recommend more descriptive names for branches to ease identification, e.g. task#191522-full-text-search
Comment 9 David Green CLA 2011-10-18 13:31:05 EDT
(In reply to comment #8)
> Excellent, I'll try to get the tasks.ui changes in soon so we can move this into
> master.

great, thanks

> I strongly recommend more descriptive names for branches to ease
> identification, e.g. task#191522-full-text-search

done: I've moved the changeset to topic branch task-191522-full-text-search and deleted the old topic branch
Comment 10 Steffen Pingel CLA 2011-10-20 09:23:44 EDT
The build.properties and project settings of the new bundles are not correct. When creating new projects please follow the guideline described in http://wiki.eclipse.org/Mylyn/Contributor_Reference#Creating_new_plug-ins.

I'm getting these NPEs after activating the bundles:

!ENTRY org.eclipse.core.jobs 4 2 2011-10-20 15:03:18.182
!MESSAGE An internal error occurred during: "Task List Indexer".
!STACK 0
java.lang.NullPointerException
	at org.eclipse.mylyn.internal.tasks.index.core.TaskListIndex.addIndexedAttributes(TaskListIndex.java:493)
	at org.eclipse.mylyn.internal.tasks.index.core.TaskListIndex.access$8(TaskListIndex.java:477)
	at org.eclipse.mylyn.internal.tasks.index.core.TaskListIndex$MaintainIndexJob.add(TaskListIndex.java:693)
	at org.eclipse.mylyn.internal.tasks.index.core.TaskListIndex$MaintainIndexJob.run(TaskListIndex.java:593)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:54)
	
All runtime in the job should be caught, aggregated and logged to avoid disruption.
Comment 11 David Green CLA 2011-10-20 13:54:14 EDT
(In reply to comment #10)

Thanks for your feedback Steffen, I'll look into these issues.

> All runtime in the job should be caught, aggregated and logged to avoid
> disruption.

Do you mean catching runtime exceptions in the outer try/catch block, or is there more to it?  What do you mean by aggregated?
Comment 12 Steffen Pingel CLA 2011-10-20 18:19:41 EDT
(In reply to comment #11)
> > All runtime in the job should be caught, aggregated and logged to avoid
> > disruption.
> 
> Do you mean catching runtime exceptions in the outer try/catch block, or is
> there more to it?  

To avoid popup dialogs MaintainIndexJob should catch all exceptions and always return OK_STATUS or CANCEL_STATUS regardless of the result.

> What do you mean by aggregated?

Maybe error aggregation is a non issue. I was thinking that if multiple task data objects are processed at most one error log entry should be created using a MultiStatus.
Comment 13 David Green CLA 2011-10-20 18:38:27 EDT
(In reply to comment #12)
> To avoid popup dialogs MaintainIndexJob should catch all exceptions and always
> return OK_STATUS or CANCEL_STATUS regardless of the result.

That seems wrong to me - shouldn't the job return the actual result?  I realize that there's a usability issue here - would setting the job as a system job make a difference?
Comment 14 Steffen Pingel CLA 2011-10-21 05:01:48 EDT
You may be able to avoid the popup dialog by registering a custom status handler. I have not investigated if that works though. Returning a different status is the pattern that other components in tasks follow. I would suggest that we keep this consistent. Since the job is not even API I don't see how this the status is relevant.
Comment 15 David Green CLA 2011-10-22 01:10:50 EDT
I've added most of the common task attributes now, including date fields.  These can be searched on with ranges, for example:

modification_date:[20111015 TO 20111022]

I've also addressed the exception handling problem in the update job, and the NPE.

As for project settings, I've updated the .project files with the API analysis nature and builder.  I originally copied the .settings folders from tasks.core - so those should be right.  Let me know if I've missed anything.
Comment 16 David Green CLA 2011-10-25 20:30:05 EDT
I've now merged the topic branch into master.  The core, ui and tests plug-ins are building nicely, but I'm having troubles with the feature plug-in.  I'm not sure why, but Tycho isn't resolving dependencies as I would expect.  Here's the error message I'm getting:

pre. 
[ERROR] Failed to execute goal on project org.eclipse.mylyn.tasks.index: Could not resolve dependencies for project org.eclipse.mylyn.incubator:org.eclipse.mylyn.tasks.index:eclipse-feature:3.7.0-SNAPSHOT: The following artifacts could notbe resolved: org.eclipse.mylyn.tasks:org.eclipse.mylyn.tasks.index.core:jar:3.7.0-SNAPSHOT, org.eclipse.mylyn.tasks:org.eclipse.mylyn.tasks.index.ui:jar:3.7.0-SNAPSHOT: The repository system is offline but the artifact org.eclipse.mylyn.tasks:org.eclipse.mylyn.tasks.index.core:jar:3.7.0-SNAPSHOT is not available in the local repository.

Any ideas Steffen?  I'd like to add this feature so that users can try it out.

For now I've commented out the tasks.index-feature module in the incubator's root pom.xml
Comment 17 David Green CLA 2011-10-25 20:36:03 EDT
Created attachment 205959 [details]
mylyn/context/zip
Comment 18 Steffen Pingel CLA 2011-10-26 10:16:38 EDT
I'm still getting the following error for org.eclipse.mylyn.tasks.index.core: Unbound classpath container: 'JRE System Library [JavaSE-1.7]' in project 'org.eclipse.mylyn.tasks.index.core'. Please resolve that. Also, the build.properties still do not specify the required includes for license files. Please resolve that before publishing the feature. 

To debug the build problem please push a review to stage a build with your proposed changes.
Comment 19 David Green CLA 2011-10-26 15:33:43 EDT
(In reply to comment #18)
> I'm still getting the following error for org.eclipse.mylyn.tasks.index.core:
> Unbound classpath container: 'JRE System Library [JavaSE-1.7]' in project
> 'org.eclipse.mylyn.tasks.index.core'. Please resolve that. Also, the
> build.properties still do not specify the required includes for license files.
> Please resolve that before publishing the feature.

done

> To debug the build problem please push a review to stage a build with your
> proposed changes.

I've created the following review: http://review.mylyn.org/#change,93
Comment 20 Steffen Pingel CLA 2011-10-26 15:41:10 EDT
Thanks. The build seems to work now. Feel free to push to master and I'll check the build results in case hudson.eclipse.org comes back with a build failure.
Comment 21 David Green CLA 2011-10-26 15:57:49 EDT
Done, pushed to master.

What's left to do here before we can look at a move review?
* UI review
* do we need to file a CQ for use of Lucene?
* ...anything else?
Comment 22 Steffen Pingel CLA 2011-10-26 16:52:40 EDT
We'll need to do some testing and a code review before moving the code. We can file the CQ for Lucene once the code is moved and before it is released.
Comment 23 David Green CLA 2011-11-02 17:46:44 EDT
Steffen do you foresee any problems if we set @org.eclipse.mylyn.tasks.index.core@ bundle dependency versions to Mylyn 3.6?  We're not consuming any 3.7 APIs, and we'd like to consume the bundle with Mylyn 3.6.  Note that the @org.eclipse.mylyn.tasks.index.ui@ bundle would have to continue depending on Mylyn 3.7.
Comment 24 Steffen Pingel CLA 2011-11-02 19:02:09 EDT
I don't see any problem with that. We rarely specify version constraints on a bundle level.
Comment 25 David Green CLA 2011-11-02 19:51:41 EDT
Thanks Steffen.  Pushed 060575c318e69a2e523e43d7631e74f33b98e8b2
Comment 26 Steffen Pingel CLA 2011-11-08 12:28:57 EST
I replaced the org.junit dependency in the test bundle with a dependency on org.junit4 and added org.apache.lucene.core to the target platform. David, have you had a chance to test this on Eclipse 3.5? Eclipse 3.5 and 3.6 include Lucene 1.9.1 so we should verify that shipping Lucene 2.9.1 won't cause any conflicts.

Currently, I am getting compile errors on 3.5. Can you take a look (you can use the mylyn-dev-base target)?
Comment 27 David Green CLA 2011-11-08 16:41:11 EST
(In reply to comment #26)

I haven't tested on 3.5 or 3.6.  I'll take a look at it with the mylyn-dev-base target platform.  

> ...Eclipse 3.5 and 3.6 include Lucene 1.9.1 so we should verify that shipping Lucene 2.9.1 won't cause any conflicts.

I'm not sure what other Eclipse components use Lucene - do we need to test other things, or just verify that Lucene 2.9.1 can install/run properly?
Alternatively we could look at lowering the Lucene version constraint for the index bundle.
Comment 28 David Green CLA 2011-11-09 00:49:25 EST
I've got it building on Eclipse 3.5 and after some testing it appears to be working correctly.  I haven't verified that there are no conflicts with Lucene versions.

I've changed the location of the task list index on the filesystem to be located as a sub-folder in the same directory as the task list itself (".taskListIndex").  This change was made to avoid using API in the @Bundle@ class that's not available on Eclipse 3.5.
Comment 29 Lucas Panjer CLA 2011-11-09 18:13:12 EST
David, 

Can you add the task repository to the set of fields that are indexed?
Comment 30 David Green CLA 2011-11-09 19:05:39 EST
Done.  Take a look at @org.eclipse.mylyn.internal.tasks.index.tests.TaskListIndexTest.testMatchesOnRepositoryUrl()@ for an example of how to query on repository URL.
Comment 31 David Green CLA 2011-11-09 19:05:44 EST
Created attachment 206745 [details]
mylyn/context/zip
Comment 32 Steffen Pingel CLA 2011-11-18 09:42:12 EST
I'm seeing this error logged in dev workspace. Any thoughts what might be causing that?

[SCR] Component definition XMLs not found in bundle org.eclipse.mylyn.tasks.index.core. The component header value is OSGI-INF/task_list_index.xml
Comment 33 David Green CLA 2011-11-18 14:47:44 EST
(In reply to comment #32)

It's related to an entry in the manifest:

bq. 
Service-Component:  OSGI-INF/task_list_index.xml

I've removed the entry.  It was originally put there with the intention of making the task list index an OSGI service -- however I abandoned that track in favor of following the existing pattern of having Task List UI manage the lifecycle and initialization of the index.
Comment 34 Steffen Pingel CLA 2012-01-23 14:52:35 EST
Thanks for your efforts, David. This looks pretty good! It's going to be great to have this in the Task List. 

I have posted a few suggestions and comments to this review: http://review.mylyn.org/#change,217,patchset=1 . Additionally, it'd be great if you could comment on these high-level questions:

* What happens if a task has changed and the index has not been updated?
* How are refactorings of repository URLs are handled?
* How is the case handled when the task data directory changes (TasksUiPlugin.reloadDataDirectory())?
* What is the concurrency model in TaskListIndex? It'd be great to have some documentation in the class how the locks are intended to be used.
Comment 35 David Green CLA 2012-01-23 19:49:59 EST
(In reply to comment #34)
> Thanks for your efforts, David. This looks pretty good! It's going to be great
> to have this in the Task List.

No problem... I can't immagine working without it now!

> I have posted a few suggestions and comments to this review:
> http://review.mylyn.org/#change,217,patchset=1 .

I'll take a look

> * What happens if a task has changed and the index has not been updated?

The task list has a configurable delay before it updates, meaning that there is a period of time where the index will be out of date with respect to task changes.  The idea is that updates to the index can be "batched" for greater efficiency.
Additionally, it's possible for a task to be updated either before or after the index is added to the task list as a listener, thus opening the possibility of changes without updates to the index.

In either of these cases, the index can be out of date with respect to the current state of tasks.  If the index is used in such a state, the result could be either false matches, no match where there should be a match, or incorrect prioritization of index "hits".  This is similar to the behaviour of Internet search engines, in that they don't always reflect the exact content of the sites they index.

The index has the option of reindexing all tasks via API.  This will bring the index up to date and is useful for cases where it's known that the index may not be up to date.  In its current form this reindex operation can be triggered by the user by including "index:reset" in the search string.  Reindexing is potentially an expensive, IO intensive long-running operation.  With about 20,000 tasks in my task list and an SSD, reindexing takes about 90 seconds.

> * How are refactorings of repository URLs are handled?

Index entries are identified by the task handle identifier, which is built from the task repository URL.  Presumably if repository URLs are changed, events will be fired for each task affected.  These tasks will be added to the index however the old version of the task with the old URL will remain in the index.  

In this case the index should behave in the same was as described above (an out of date index) with possible false matches.  The solution would be to reindex the task list.  Is there an easy way to hook into the repository URL refactoring process?

> * How is the case handled when the task data directory changes
> (TasksUiPlugin.reloadDataDirectory())?

I'm not sure.  We may need some changes to handle this scenario.

> * What is the concurrency model in TaskListIndex? It'd be great to have some
> documentation in the class how the locks are intended to be used.

Agreed, the concurrency model is not well documented.  I'll have a look at fixing this.
Comment 36 Steffen Pingel CLA 2012-01-24 03:14:00 EST
(In reply to comment #35)
> > * What happens if a task has changed and the index has not been updated?
> 
> The task list has a configurable delay before it updates, meaning that there is
> a period of time where the index will be out of date with respect to task
> changes.  The idea is that updates to the index can be "batched" for greater
> efficiency.
> Additionally, it's possible for a task to be updated either before or after the
> index is added to the task list as a listener, thus opening the possibility of
> changes without updates to the index.
> 
> In either of these cases, the index can be out of date with respect to the
> current state of tasks.  If the index is used in such a state, the result could
> be either false matches, no match where there should be a match, or incorrect
> prioritization of index "hits".  

The paragraphs above are a great description of the TaskListIndex. Would you mind pasting that in the class header?

> This is similar to the behaviour of Internet
> search engines, in that they don't always reflect the exact content of the sites
> they index.

I don't quite agree with analogy since the task list search works over a local data structure and acts more as a filter than a search engine but that seems like a reasonable trade off. 

> The index has the option of reindexing all tasks via API.  This will bring the
> index up to date and is useful for cases where it's known that the index may not
> be up to date.  In its current form this reindex operation can be triggered by
> the user by including "index:reset" in the search string.  Reindexing is
> potentially an expensive, IO intensive long-running operation.  

Would it make sense to add this command to the content assist?

> With about
> 20,000 tasks in my task list and an SSD, reindexing takes about 90 seconds.

I saw similar performance on my system. We'll have to watch how indexing impacts performance on systems that have slower disks.

> > * How are refactorings of repository URLs are handled?
> 
> Index entries are identified by the task handle identifier, which is built from
> the task repository URL.  Presumably if repository URLs are changed, events will
> be fired for each task affected.  These tasks will be added to the index however
> the old version of the task with the old URL will remain in the index.
>
> In this case the index should behave in the same was as described above (an out
> of date index) with possible false matches.  The solution would be to reindex
> the task list.  Is there an easy way to hook into the repository URL refactoring
> process?

You would need to register an IRepositoryListener listener handle the repositoryUrlChanged() event. We should also add a test case since that's always a tricky edge case.
Comment 37 David Green CLA 2012-01-24 16:32:38 EST
I've posted a new changeset to the review http://review.mylyn.org/#change,217

(In reply to comment #36)
> ...
> The paragraphs above are a great description of the TaskListIndex. Would you
> mind pasting that in the class header?

Done
 
> > ...In its current form this reindex operation can be triggered by
> > the user by including "index:reset" in the search string.  Reindexing is
> > potentially an expensive, IO intensive long-running operation.
> 
> Would it make sense to add this command to the content assist?

Could be.  I just wasn't sure about this one - from a UI perspective it seems a little odd to have reindexing via content assist as the only option.  It would make more sense to me to provide some kind of button in the Mylyn Tasks advanced preferences area.  What do you think?

> > With about
> > 20,000 tasks in my task list and an SSD, reindexing takes about 90 seconds.
> 
> I saw similar performance on my system. We'll have to watch how indexing impacts
> performance on systems that have slower disks.

Note that this is a rare or one-time penalty.  Generally as tasks are updated, index updating is very fast and happens in the background.

> > > * How are refactorings of repository URLs are handled?
> > Is there an easy way to hook into the repository URL refactoring
> > process?
> 
> You would need to register an IRepositoryListener listener handle the
> repositoryUrlChanged() event. We should also add a test case since that's always
> a tricky edge case.

Good idea.  I haven't addressed this in the latest changeset.
Comment 38 Steffen Pingel CLA 2012-01-25 05:38:59 EST
(In reply to comment #37)
> > > ...In its current form this reindex operation can be triggered by
> > > the user by including "index:reset" in the search string.  Reindexing is
> > > potentially an expensive, IO intensive long-running operation.
> >
> > Would it make sense to add this command to the content assist?
> 
> Could be.  I just wasn't sure about this one - from a UI perspective it seems a
> little odd to have reindexing via content assist as the only option.  It would
> make more sense to me to provide some kind of button in the Mylyn Tasks advanced
> preferences area.  What do you think?

Great idea. We can file a new bug to provide that, e.g. via the task list view menu until we have other controls that would warrant a preference page.

Thanks for the updated patch set. I have added some comments on the review.
Comment 39 David Green CLA 2012-01-25 20:28:31 EST
I've provided a new changeset that addresses most of the issues, including a unit test for refactoring a repository url.

I've added some discussion inline in patch set 3, related to the enum.  

things that remain to be done:
* address the IndexField enum issue
* create a new task for the task list reindex menu item
Comment 40 Lucas Panjer CLA 2012-01-26 12:21:58 EST
David, can you provide access to the dataManager instance variable for use in subclasses?
Comment 41 Steffen Pingel CLA 2012-01-26 16:45:39 EST
(In reply to comment #39)
> I've added some discussion inline in patch set 3, related to the enum.

Those are very good points. As we discussed a possible solution would be to add an (non localized) indexKey to Field which to match the fieldName() of the existing enum.

I propose that we move the original code that is currently in the Incubator over to Tasks once the move review is approved and you can then commit the delta against the latest patch set.
Comment 42 David Green CLA 2012-01-26 18:43:02 EST
(In reply to comment #41)
> I propose that we move the original code that is currently in the Incubator over
> to Tasks once the move review is approved and you can then commit the delta
> against the latest patch set.

That sounds good.  I'll start a new Gerrit review that depends on the current one so that we can continue uninterrupted.
Comment 43 Steffen Pingel CLA 2012-02-01 20:46:34 EST
The review was declared successful! We can now move the code to Mylyn Tasks and delete it from Mylyn Incubator.
Comment 44 David Green CLA 2012-02-08 17:45:34 EST
I've merged the task list index bundles into master, and applied the following change: http://review.mylyn.org/#change,217

Lucas with these latest changes the TaskListIndex constructor signature has changed.

Steffen, from what I can see the remaining tasks are as follows:
# review and iterate on http://review.mylyn.org/#change,227 to address the enum issue
# add a menu item to the task list that will cause reindexing
# verify that the incubator build is working with bundles removed
# verify that the tasks build is working with bundles added, and that tests are run as part of the build
Did I miss anything?
Comment 45 Steffen Pingel CLA 2012-02-08 18:30:47 EST
That sounds good. I have opened bug 371017 to track the releng changes.
Comment 46 Steffen Pingel CLA 2012-02-11 09:44:53 EST
I committed a small change that causes the index to use the summary from the ITask rather than obtaining it from TaskData. This is to support filtering by hash id for the Gerrit connector. David, let me know if that change has any unintended side effects.
Comment 47 David Green CLA 2012-02-20 20:02:19 EST
Steffen, any chance of getting your eyes on http://review.mylyn.org/#change,227 ?
Comment 48 Steffen Pingel CLA 2012-02-22 05:53:32 EST
The proposed changes look good to me!
Comment 49 David Green CLA 2012-02-22 16:39:42 EST
I've merged/rebased/pushed this latest change.

Outstanding: 
* add a menu item to the task list that will cause reindexing (bug 371029)

Anyone using the @TaskListIndex@  and associated classes programatically should be aware that this latest change removed the enum @TaskListIndex.IndexField@ , and may need to change their code accordingly.
Comment 50 David Green CLA 2012-02-22 16:40:19 EST
Created attachment 211458 [details]
mylyn/context/zip
Comment 51 Steffen Pingel CLA 2012-03-02 18:07:11 EST
All outstanding tasks have been resolved. Thank you very much for implementing the index functionality. I'll add an entry to the New & Noteworthy.
Comment 52 Steffen Pingel CLA 2012-03-03 07:26:51 EST
Created attachment 212021 [details]
screenshot
Comment 53 Steffen Pingel CLA 2012-03-03 07:29:47 EST
Created attachment 212022 [details]
screenshot