Bug 208440 - add a custom search field for duplicate detection, based on summary or description
Status: CLOSED MOVED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: unspecified
Hardware: Power PC Mac OS X - Carbon (unsup.)
Importance: P4 enhancement
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2007-11-01 11:24 EDT by Yvan BARTHÉLEMY CLA
Modified: 2013-10-24 14:48 EDT
CC List: 3 users

See Also:



Description Yvan BARTHÉLEMY CLA 2007-11-01 11:24:50 EDT
Currently, duplicate detection (at least on bugs.eclipse.org) only supports stack trace matching. It would be useful to also run a custom query, or a query based on the Title or Description entered by the user, to find related or duplicate reports.
Comment 1 Mik Kersten CLA 2007-11-01 16:23:04 EDT
Yup, this would indeed be a nice feature to have. Marking as helpwanted for now in case anyone is interested in contributing it.
Comment 2 Eugene Kuleshov CLA 2007-11-01 21:55:18 EDT
I'd suggest adding another hyperlink detector type that would open a standard search dialog and populate certain fields.

It is unclear, though, whether such a search would be helpful, because the summary and description of the same issue can vary a lot. Automatically provided text won't help much without some kind of fuzzy search (which would need to be supported by the issue tracking system). Currently, when searching for duplicates I use the Task List view (assuming there is a query that contains all tasks in a given scope), and in some cases I have to try 5 to 12 variants in the quick find.
Comment 3 Yvan BARTHÉLEMY CLA 2007-11-02 13:27:25 EDT
I agree that a fully automated duplicate search based on title and description might not be useful.

However, it might be useful sometimes (where messages are similar but no stack trace is generated).

That said, assisted detection might be really useful, and a fully manual search (in fact exactly the same as Task Search, but integrated into the form) would certainly be useful, at least to me.
Comment 4 Eugene Kuleshov CLA 2007-11-02 20:28:54 EDT
Yvan, can you please clarify which connector you want this feature for?
Comment 5 Yvan BARTHÉLEMY CLA 2007-11-03 05:41:44 EDT
In fact I am currently using the Trac connector.
Comment 6 Eugene Kuleshov CLA 2007-11-03 16:58:06 EDT
Thanks Yvan.

Steffen, copying you, since this is related to Trac and you did some work at some point to generalize the search framework; this issue seems like another driver for that. I wonder if we should look at options for mapping repository task data (from the editor) to search criteria, so that the mapping can be reused by different connector UIs.
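The mapping from editor task data to reusable search criteria could look roughly like the following. This is a minimal sketch under stated assumptions, not Mylyn code: `TaskData`, `SearchCriteria`, `to_search_criteria`, and the stop-word list are all hypothetical. The summary is split into words for an "any of these words" query, and product/component restriction is opt-in, since limiting by product can hide cross-project issues.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, connector-neutral types for illustration; NOT Mylyn API classes.
@dataclass
class TaskData:
    summary: str
    description: str = ""
    product: Optional[str] = None
    component: Optional[str] = None

@dataclass
class SearchCriteria:
    words: List[str] = field(default_factory=list)
    match_any: bool = True  # "any of the words" rather than "all of the words"
    product: Optional[str] = None
    component: Optional[str] = None

# Assumed minimal stop-word list; a real connector would need to tune this.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "or", "the", "to", "when", "with"}

def to_search_criteria(task: TaskData, restrict_scope: bool = False) -> SearchCriteria:
    """Map editor fields to search criteria that any connector UI could render."""
    words: List[str] = []
    for token in re.findall(r"[a-z0-9]+", task.summary.lower()):
        # Drop stop words and repeats, keeping first-seen order.
        if token not in STOP_WORDS and token not in words:
            words.append(token)
    return SearchCriteria(
        words=words,
        match_any=True,
        # Scope restriction is opt-in so cross-project duplicates are not hidden.
        product=task.product if restrict_scope else None,
        component=task.component if restrict_scope else None,
    )

if __name__ == "__main__":
    criteria = to_search_criteria(TaskData(summary="NPE when opening the Task List", product="Mylyn"))
    print(criteria.words)    # ['npe', 'opening', 'task', 'list']
    print(criteria.product)  # None
```

Keeping the criteria object free of any connector-specific query syntax is what would let Trac, Bugzilla, or other connector UIs each render it in their own search dialog.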
Comment 7 Mik Kersten CLA 2007-11-06 02:27:43 EST
(In reply to comment #2)
> I'd suggest to add another hyperlink detector type that would open a standard
> search dialog and populate certain fields.

I have a different take on how heuristic detectors should work in Mylyn. This would vary depending on the duplicate detection approach used, but I think the following pattern could be useful to several NLP-type approaches:
1) Formulate the loosest query possible, e.g. "any" of the words in the summary.
2) Retrieve the first n results for such a query
3) Run an NLP tool or some other matcher on the contents of those bugs

The pattern here is that we can't extend the repository's search, so we need to pre-fetch the task data that's relevant to duplicate detection. There are other ways to narrow that down to a workable set (e.g. one downloadable in a few seconds), such as limiting by product if applicable.
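The three steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not Mylyn's implementation: the in-memory `repository` list stands in for a real repository query, tasks are plain dicts with hypothetical `id`, `summary`, and `description` keys, and token-set Jaccard similarity is swapped in as the matcher where the comment envisions an NLP tool. The helper names `loosest_query` and `rank_candidates` are hypothetical.

```python
import re
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical task representation: {"id": ..., "summary": ..., "description": ...}
Task = Dict[str, object]

def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def task_text(task: Task) -> str:
    return f"{task['summary']} {task['description']}"

def loosest_query(summary: str, repository: List[Task], n: int) -> List[Task]:
    """Steps 1 and 2: match tasks containing ANY summary word, keep the first n.

    `repository` stands in for a repository search; a real connector would send
    the "any of these words" query to the server instead of filtering locally.
    """
    words = tokenize(summary)
    hits = [t for t in repository if words & tokenize(task_text(t))]
    return hits[:n]

def jaccard(a: str, b: str) -> float:
    """Token-set overlap; a simple stand-in for an NLP matcher."""
    ta, tb = tokenize(a), tokenize(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def rank_candidates(
    summary: str,
    description: str,
    candidates: List[Task],
    matcher: Callable[[str, str], float] = jaccard,
    threshold: float = 0.2,
) -> List[Tuple[float, Task]]:
    """Step 3: score the pre-fetched tasks locally, best match first."""
    query = f"{summary} {description}"
    scored = [(matcher(query, task_text(t)), t) for t in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # sort on score only
    return [(score, t) for score, t in scored if score >= threshold]

if __name__ == "__main__":
    repo = [
        {"id": 1, "summary": "Task List view flickers on refresh", "description": "Flicker when refreshing queries"},
        {"id": 2, "summary": "Trac connector fails to submit attachment", "description": "Error on submit"},
        {"id": 3, "summary": "Duplicate detection ignores summary", "description": "Only stack traces are searched"},
    ]
    summary = "Duplicate detection based on summary"
    candidates = loosest_query(summary, repo, n=10)  # all three match via some word
    for score, task in rank_candidates(summary, "", candidates):
        print(task["id"], round(score, 2))  # 3 0.27
```

In the usage example the loose query pulls in unrelated tasks through the single word "on", which is exactly why the local matching step is needed. Because the matcher is a parameter, an NLP scorer could replace `jaccard` without changing the fetch step.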
Comment 8 Eugene Kuleshov CLA 2007-11-06 03:19:00 EST
(In reply to comment #7)
> ... limiting by product if applicable.

BTW, "limiting by product" in the current stack trace detector makes it impossible to find any cross-project issues.
Comment 9 Gail Murphy CLA 2007-11-08 18:43:12 EST
We are experimenting with some NLP approaches. The basic answer is that by taking text from the summary and description and massaging it, an NLP-based model can give you 65% recall on finding duplicates (well, I should explain that more precisely but won't unless asked to). Doing this in a performant way likely requires hosting a service.
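Recall, in its conventional information-retrieval sense, is the share of actual duplicates that a model manages to flag. The comment does not say how the 65% figure was measured, so the following only restates the standard definition, with D_true (reports that really are duplicates) and D_found (reports the model flags) introduced here for illustration:

```latex
\mathrm{recall} = \frac{\lvert D_{\text{found}} \cap D_{\text{true}} \rvert}{\lvert D_{\text{true}} \rvert}
```

At 65% recall, roughly 65 of every 100 true duplicates would be flagged. Recall alone says nothing about how many false matches are shown alongside them; that is measured by precision.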
Comment 10 Palmer Eldritch CLA 2013-10-24 14:48:56 EDT
I was intrigued by the possibility of posting bugs without leaving Eclipse, but the lack of this feature makes that impossible. Had I submitted this via Eclipse > Bugzilla > New Bug, I would have created a duplicate. Moreover, when I read about "duplicate detection" I spent (quite) some time trying to find out where I could search for duplicates. The "Duplicate search" item ONLY searches based on the stack trace, right? Trust me when I say that this is not the most intuitive thing about it (imagine that a typical user is not initially clear on what a connector is, for instance).
Please rename it to "stack trace search", as that's what it is; it needs much more work to become a "Duplicate search".

And please assign this one. When I file a bug here in Bugzilla, as soon as I enter some words in the summary I get a list of possible duplicates. This should be implemented in the New Task editor; the "submit bug" feature is useless without it and only useful for error log reports (but see https://bugs.eclipse.org/bugs/show_bug.cgi?id=420108).

Should I file a new bug (as I intended: Duplicate detection based on summary)?
Comment 11 Eclipse Webmaster CLA 2022-11-15 11:45:08 EST
Mylyn has been restructured, and our issue tracking has moved to GitHub [1].

We are closing ~14K Bugzilla issues to give the new team a fresh start. If you feel that this issue is still relevant, please create a new one on GitHub.

[1] https://github.com/orgs/eclipse-mylyn