Bug 415241 - [Markers] The Tasks View displays the same item multiple times if there are symbolic link duplicates
Summary: [Markers] The Tasks View displays the same item multiple times if there are s...
Status: NEW
Alias: None
Product: Platform
Classification: Eclipse Project
Component: IDE (show other bugs)
Version: 4.3   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Platform UI Triaged CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 209190
Blocks:
  Show dependency tree
 
Reported: 2013-08-16 11:58 EDT by Martin Oberhuber CLA
Modified: 2022-04-06 17:12 EDT (History)
16 users (show)

See Also:


Attachments
sample project (2.47 KB, application/octet-stream)
2013-08-19 02:56 EDT, Martin Oberhuber CLA
no flags Details
The Dialog that diplays new checkbox added. (87.06 KB, image/png)
2013-09-03 13:57 EDT, Lidia Gutu CLA
no flags Details
Updated the old patch for review and merge. (119.49 KB, image/png)
2019-12-01 13:34 EST, Lidia Popescu CLA
no flags Details
Sample for new options in Filters dialog of Tasks view (99.83 KB, image/png)
2019-12-01 13:40 EST, Lidia Popescu CLA
no flags Details
The current patch (738.22 KB, image/png)
2022-04-06 17:12 EDT, Karsten Thoms CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2013-08-16 11:58:23 EDT
Build ID: Eclipse 4.3 on Linux

CQ:WIND00419219

When a source file has symbolic links that make this file visible multiple times, the tasks view shows multiple identical entries for task tags like TODO or FIXME.

This makes it hard to properly use this view; in the C/C++ space, this is a real problem, sine many of our users use symbolic links to satisfy needs of their make system.

Expected behavior: 
The IDE should know which resources are actually duplicates due to symbolic links and show respective task tags only once. Note that since task tags are exclusively generated from source code, the context of the source is not relevant.

Design proposal:
For any resources that have the "symbolic link" flag in the resource system, determine whether there are another resources visible with the same name. If yes, determine the canonical path of all potential duplicates and show the respective task tags only once.
Comment 1 Martin Oberhuber CLA 2013-08-19 02:56:48 EDT
Created attachment 234516 [details]
sample project

Attached is a sample project to illustrate the problem. Extract this anywhere on a Linux system, then import the project from symlink_inc_test/proj .

Alternative Design Proposal:
============================

A simpler design would implement a ViewerFilter that computes the canonical path of _all_ items to be displayed and ensure that each item is displayed only once. Performance of this simple implementation should be OK since the list of items to be displayed is limited (100 items by default).

In fact an implementation could iteratively be improved in 3 steps:


1. Suppress duplicates by looking at all canonical paths
--------------------------------------------------------
Implement a ViewerFilter to compute the canonical path of each item to be displayed, put it into a HashSet and ensure that only unique paths are displayed. In case there are duplicates, the choice of item from those duplicates would be random (first one as per the ViewerSorter in place).

The feature could be enabled unconditionally by default on Linux, since Performance impact should be small and the benefit universal. Alternatively, it could be enabled / disabled with a global Preference.


2. [optional] Improve Performance of the algorithm
---------------------------------------
Considering only identical file names for computing the canonical path would be a Performance improvement that could be done in a second, optional step. The global Preference could choose between strategies for dealing with symbolic links (not consider, filter-identical-filenames, filter-all-duplicates). 

Doing this in a 2nd step only makes sense, because this performance improvement isn't necessary in case performance turns out to be good enough with step 1.


3. [optional] Choose the "best" to display from a list of duplicates
---------------------------------------------------------
Improving the filter to show only the "best" of all duplicates to be displayed would be harder, since the filtering likely couldn't be done on-the-fly any more. Criteria for finding the "best" could either look at the is-symbolic-link flag. Or, looking at the canonical path of an item that has multiple resource paths associated, they could pick the one with the longest matching suffix between canonical path and resource path.
Comment 2 Martin Oberhuber CLA 2013-08-19 02:58:56 EDT
PS:
Simply displaying the "is-symbolic-link" resource property in an additional column in the tasks view might be another way addressing the problem. It would at least _show_ what is a symbolic link, and allow filtering to show original files only. 

Note, though, that this approach wouldn't take duplicates into account which are introduced by a symbolic link onto a parent folder of resources. So I personally don't think that this approach would give much value.
Comment 3 Martin Oberhuber CLA 2013-08-19 03:19:32 EDT
It looks like implementation needs to consider the org.eclipse.ui.ide plugin:
http://git.eclipse.org/c/platform/eclipse.platform.ui.git

namely the TasksView:
http://git.eclipse.org/c/platform/eclipse.platform.ui.git/tree/bundles/org.eclipse.ui.ide/src/org/eclipse/ui/internal/views/markers/TasksView.java

and the TaskFilter:
http://git.eclipse.org/c/platform/eclipse.platform.ui.git/tree/bundles/org.eclipse.ui.ide/src/org/eclipse/ui/views/markers/internal/TaskFilter.java

potentially along with its UI:
http://git.eclipse.org/c/platform/eclipse.platform.ui.git/tree/bundles/org.eclipse.ui.ide/src/org/eclipse/ui/views/markers/internal/DialogTaskFilter.java

One question is, whether the feature should also apply to the Problems view; I personally think rather not, because for a build problem, the context is relevant (ie whether the same symbolic link duplicate is build from the context of folder A or folder B).
Comment 4 Lidia Gutu CLA 2013-09-03 13:50:41 EDT
I have posted my solution for this issue on master branch.

The patch 1:  https://git.eclipse.org/r/#/c/16068/
It creates a new checkbox on 'Configure Contents' Dialog, that is visible only when it is opened from Tasks View. (The Bookmarks view & Problems View share the same implementation for this).
The checkbox is checked by default.
If it is checked, the tasks view should display the tasks only ones, the duplicates from symbolic links are ignored. If a task is edited from its original source file or from one of its symbolic links, the Tasks View displays both, the new/updated task, and the old task that is referenced from one of other files (either the original source file, either it's symbolic link), till all files are refreshed.
Basically, when the whole project is refreshed, the duplicates are suppressed.

The patch 2: https://git.eclipse.org/r/#/c/16070/
This is a supplement for the solution. It creates a new column on TasksView, unvisible by default (So you should manually add if from 'Configure Columns' dialog) that displays from where the current task is displayed, from original source file of from one of its symbolic links. May be used as a guideline how to add a column.
Comment 5 Lidia Gutu CLA 2013-09-03 13:57:06 EDT
Created attachment 235115 [details]
The Dialog that diplays new checkbox added.

I have posted the print screen from the 'Configure Contents' dialog that displays the new checkbox.

Please review, test and add your comments regarding my solution. Thank you.
Comment 6 Martin Oberhuber CLA 2015-04-05 20:06:23 EDT
Ping, our patch for this has been sitting in Gerrit for 1.5 years without comment. Could this perhaps be considered now ?

Note that the implementation is not only relevant for duplicates due to symbolic links, but also for duplicates due to physically nested projects as per bug 245412 .
Comment 7 Lars Vogel CLA 2015-04-07 13:07:06 EDT
I'm not sure if we should have this additional option "Suppress duplicates due to file system links". Is there a scenario in which you want to have duplicates?

Sergey, Andrey what do you think?
Comment 8 Andrey Loskutov CLA 2015-04-07 13:37:24 EDT
(In reply to Lars Vogel from comment #7)
> I'm not sure if we should have this additional option "Suppress duplicates
> due to file system links". Is there a scenario in which you want to have
> duplicates?
> 
> Sergey, Andrey what do you think?

Few unsorted notes after checking the first patch. 

1) I have personal problem with Problems view - it is dog slow IF it shows more then 100 markers (bug 349869, on my plan after bug 460383). I'm pretty sure that using file.getCanonicalPath() as it is done in the patch will render performance to even more unusable state. I've heard that in 4.x resources API got some "native" link support, might be this approach will be faster/better.

2) So I have breefly checked the patch, it can't be rebased - it would be nice to rebase it on master and check the performance of that extra filter live, without "100 markers" rule.

3) I think the problem is Linux/CDT specific, since on Windows most people can't create links or must have admin rights to do so, and most Java projects don't use links. So I would avoid to make solution for 5% users default which might impact performance for the rest of 95% users without giving them any benefits, but will most likely have performance impact. Therefore I would not enable it per default on Windows, or not enable it per default at all, due the performance impact I expect. But I would better wait for the numbers first. My use case is problems view showing 10000 markers and more. (my workspace in the office contains > 15000 markers, platform UI repo is also not better). So if we implement it, it should be disabled by default IMHO.

4) The patch must also consider "virtual" links/folders from Eclipse resources API, not just file system links.

5) Patch does not have proper error handling, but I guess this was just the proof of concept.
Comment 9 Sergey Prigogin CLA 2015-04-07 16:38:12 EDT
(In reply to Lars Vogel from comment #7)
> I'm not sure if we should have this additional option "Suppress duplicates
> due to file system links". Is there a scenario in which you want to have
> duplicates?
> 
> Sergey, Andrey what do you think?

Suppressing markers on multiple resources mapping to the same physical file should be optional since I can easily imagine a scenario where such suppression would be confusing. The deduping should apply to all aliasing scenarios, not just to filesystem symlinks. Performance is paramount. No change should be accepted if it noticeably reduces performance.

The proposed patch has a major drawback that it doesn't apply to to all aliasing scenarios.
Comment 10 Martin Oberhuber CLA 2015-04-08 12:17:51 EDT
Hi Andrey and Sergey, 

thanks for your consideration - some comments from my side:

> resources API got some "native" link support, might be this approach will be
> faster/better.

Agreed, there might be ways improving performance by trying to avoid getCanonicalPath() where possible. For example, one could put file size and modtime into a HashMap and only use getCanonicalPath() for comparison when 2 items have the same hashcode. But in my experience, getCanonicalPath() has become pretty fast already with modern JREs so I'd propose measuring the performance hit before thinking about optimizations.

Big advantage of getCanonicalPath() is that it would work for all aliasing scenarios (physically nested project; virtual folder ; eclipse resource link). Whenever you input 2 different resource paths, and they end up with the same getLocation().getCanonicalPath() you have an alias so I'm not sure if Sergey's assesment that the current patch doesn't address this is accurate.

> 3) I think the problem is Linux/CDT specific, since on Windows most people
> can't create links or must have admin rights to do so, and most Java

As Sergey pointed out, the solution should not be specific to file system links but any other aliasing scenario (eg physically nested subproject, which is very common for Java / Maven from what I've heard).

> for the numbers first. My use case is problems view showing 10000 markers

Hmm, but what's the value for a user dealing with a view with 10k entries ? I personally like the current approach to simply stop adding entries when a certain threshold is reached.

> should be optional since I can easily imagine a scenario where such
> suppression would be confusing.

Yes, I agree and that's why the proposed patch adds a UI filter as a simple approach to that problem.

IMO in the ideal case, the view would be more tree-like (2 levels): the 1st level flat list would be restricted to unique occurrences (no aliases), and if an occurrence has aliases one could expand that node to get a list of all aliases. I'm not sure if there's an easy way making this happen in the current code though ... perhaps a "has aliases" decorator with a right-click action to list/navigate to the aliases ?
Comment 11 Sergey Prigogin CLA 2015-04-08 12:58:06 EDT
(In reply to Martin Oberhuber from comment #10)
> But in my experience, getCanonicalPath() has
> become pretty fast already with modern JREs so I'd propose measuring the
> performance hit before thinking about optimizations.

The performance of getCanonicalPath() is highly dependent of file system performance and on the length of the path since the method involves multiple file system accesses. For all practical purposes this method should be considered slow.
Comment 12 Andrey Loskutov CLA 2015-04-08 18:06:33 EDT
(In reply to Sergey Prigogin from comment #11)
> (In reply to Martin Oberhuber from comment #10)
> > But in my experience, getCanonicalPath() has
> > become pretty fast already with modern JREs so I'd propose measuring the
> > performance hit before thinking about optimizations.
> 
> The performance of getCanonicalPath() is highly dependent of file system
> performance and on the length of the path since the method involves multiple
> file system accesses. For all practical purposes this method should be
> considered slow.

<offtopic>
Just an anecdote: once one customer was complaining they couldn't use Eclipse anymore since they imported some project. They (CDT) project contained a single link to some network mounted folder (perfectly available via command line) and in they workflow UI needed to resolve file paths. Guess what... the code was using getCanonicalPath() and for whatever reason while resolving that specific file path (network address) UI just hung (forever?). Sure they network config was might be not usual, but outside of Eclipse everything was just OK. Since that incident I'm *very* cautious if I see getCanonicalPath() in UI code.
</offtopic>
Comment 13 Lidia Popescu CLA 2019-12-01 13:34:32 EST
Created attachment 280827 [details]
Updated the old patch for review and merge.
Comment 14 Lidia Popescu CLA 2019-12-01 13:40:03 EST
Created attachment 280828 [details]
Sample for new options in Filters dialog of Tasks view

For tests: 
- Created a symbolic link for a source file that has some // TODO ( I did it from Eclipse:Select Project -> New File -> Advanced ...) It usually happens on C/CPP Projects.
- Opened Tasks view.
- Open Filters from Tasks view
https://bugs.eclipse.org/bugs/attachment.cgi?id=235115
It should have checked by default new option "Suppress duplicates due to file systems links"
- If checked only one item should be visible in the Tasks list.
https://bugs.eclipse.org/bugs/attachment.cgi?id=280828
- If unchecked, at least 2 same TDDO entries should be visible from same source file in Tasks view.
https://bugs.eclipse.org/bugs/attachment.cgi?id=280827
Comment 15 Karsten Thoms CLA 2022-04-06 17:12:49 EDT
Created attachment 288405 [details]
The current patch

With the current patch I only see a single TODO, although the 'src' folder has been symlinked on the filesystem and the deduplication feature is disabled.