Bug 253912 - Common IResource for all Aliased Workspace Resources
Summary: Common IResource for all Aliased Workspace Resources
Status: RESOLVED WORKSFORME
Alias: None
Product: e4
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: unspecified   Edit
Hardware: PC All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard: stalebug
Keywords:
Depends on: 198291
Blocks:
  Show dependency tree
 
Reported: 2008-11-05 09:06 EST by James Blackburn CLA
Modified: 2019-06-05 07:41 EDT (History)
5 users (show)

See Also:


Attachments
Test1 (2.33 KB, patch)
2009-03-05 10:26 EST, James Blackburn CLA
no flags Details | Diff
Test2 (2.68 KB, patch)
2009-03-06 05:11 EST, James Blackburn CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description James Blackburn CLA 2008-11-05 09:06:05 EST
Build ID: 3.4

Steps To Reproduce:
As was recently discussed on the e4 dev list, there are some circumstances where you want to attribute metadata to all IResources corresponding to a particular location (or more succinctly, treat a IFileStore location as a single IResource) e.g.
  - Breakpoing markers
  - Local File history
  - Opening editors
  - Team support
  - Resource deltas

The major problem I keep running into is the use of IWorkspaceRoot.find{Files,Containers}ForLocation(IPath)[0]...  
In many cases the developer doesn't care -- or doesn't want to care -- which IResource they're operating on.  They're often not interested in the workspace relative path.

In cases like this it would be good if the platform did the "right-thing" by default.

One solution might be to have all IResources (which exist in the resource tree) be lightweight proxys to a single master-IResource.  There would be one master-IResource per IFileStore location.

All metadata would be stored on the master-IResource, with an attribute indicating which IResource the marker was set on, or none if the metadata should be accessible from all IResources which map to that location.  

Having done this, discovering the workspace resources for a location is cheap.  It would then also be easy for clients to list all metadata set against a particular filesystem location.

With these changes the following, which required altering the client code, would work out of the box:  
  - Setting and unsetting breakpoint on differen IResources poiting at the same location, in CDT.
  - Problem markers from build errors being annotated on the wrong source file in the workspace. 
In both cases the default behaviour would be to have the marker visible on all IResources.  CDT could then refine this to select an appropriate IResource if it so wished, but the results of not doing so would be better than what we currently have.

In bug 245412, masking out resources from overlapping projects would help alleviate some of the issues seen, but the problem still persists with linked resources.
Comment 1 John Arthorne CLA 2008-11-05 15:37:40 EST
This sounds like ideal work to explore in the e4 incubator (and possibly backport to a 3.x release if successful).
Comment 2 Martin Oberhuber CLA 2008-11-10 05:17:43 EST
When I understand this right, it is an implementation suggestion for some behavior that we agree we want to have.

In terms of behavior, I think everybody agrees that we want the metadata (markers, encoding, ...) pertaining to a single physical resource be available regardless of the path that identifies the resource. As an added bonus, it may be interesting to be able and filter metadata based on the context on which it was set (i.e. show me only the markers set in the context of the C-project), but none the less all the metadata must be available. I'm using bug 233939 for improvements to the alias manager that allow making the link between a physical resource and the various paths identifying that resource under all circumstances.

In terms of implementation, there are basically two possibilities:
  (a) store the metadata separately according to the context on which it was
      created, but merge it based on alias information when retrieving it; or,
  (b) store the metadata in a single place right away (as suggested here).

One problem for (b) is how to compute the canonical resource for a path (and where to store it), but the biggest problem I see is the lifecycle. 

(1) Assume that /foo/file.txt is a symbolic link (on the file system) pointing to /bar/file.txt. Some metadata is stored on the resource in the context of /foo, but persisted in a "common location". Now, Eclipse is shut down and the symbolic link is removed (or replaced by an actual file) outside Eclipse. Restart Eclipse. The alias is gone, so the metadata associated with the common resource is no longer accessible.

(2) Note that similar issues can happen with approach (a) as well. Assume that /foo/file.txt and /bar/file.txt are both aliases to the same file. A breakpoint is set in the context of /foo, stored in the context of /foo but accessible in the context of /bar as well. Now project /foo is closed. The breakpoint must still be accessible in the context of /bar. Where is the data actually stored?

I'm not sure how to best address this. For (1), we'd get notified of the modified physical file structure during the Refresh when Eclipse is re-started. We could update the metadata associations as part of this refresh ("/foo/file.txt was linked to /bar/file.txt before, now it's no longer linked, so clone the metadata now"). For (2), we'd likely be cloning all metadata right away and store it in multiple places.

Suggestions and ideas are welcome. I like the idea of avoiding clones in memory by maintaining a single canonical resource in memory only, but there will likely be some roadblocks to overcome.
Comment 3 John Arthorne CLA 2008-11-10 09:34:58 EST
Before focusing too much on the implementation, what would the API look like for this? There is some metadata that would make sense sharing across multiple IResources at the same location, but other metadata that shouldn't be shared. For example build problem markers typically only make sense in the context of a single project's build path. If the same file was linked in another project with different build class path or make files, it would likely have different build problems. So, would clients be responsible for specifying when they create a marker or property whether it should be shared with aliases? What would be the default behaviour for clients of the existing API?

If other markers such as breakpoints and tasks were added to all aliases, would the user see all copies of the breakpoint/task in the Breakpoints/Tasks view, or only one? How would this filtering of duplicates be accomplished?
Comment 4 James Blackburn CLA 2009-03-05 06:41:52 EST
Another issue is that of scheduling rules. It looks possible, to me, to 'cheat' the Resource locking mechanism.  Resource#contains(...) & Resource#isConflicting(...) only check the workspace path.  This means that a resource which is aliased can be modified simultaneously by different jobs.

> In terms of behavior, I think everybody agrees that we want the metadata
> (markers, encoding, ...) pertaining to a single physical resource be available
> regardless of the path that identifies the resource. As an added bonus, it may
> be interesting to be able and filter metadata based on the context on which it
> was set (i.e. show me only the markers set in the context of the C-project),

That's all I'm really after. I doesn't much matter whether the metadata is stored together or apart, as long as the view is consistent from the API user's POV.

John has a very good point - it's probably best to try to work out what changes could be made to the API to fix this.
Comment 5 John Arthorne CLA 2009-03-05 09:52:17 EST
> Resource#contains(...) &
> Resource#isConflicting(...) only check the workspace path.  This means that a
> resource which is aliased can be modified simultaneously by different jobs.

I'm not sure if you're talking about the present code, or a hypothetical future where a single ResourceInfo is shared by multiple resource paths. Currently, each "aliased" resource has a distinct workspace path, so it is not currently possible for a resource to be modified simultaneously by multiple jobs.
Comment 6 James Blackburn CLA 2009-03-05 10:26:59 EST
Created attachment 127661 [details]
Test1

(In reply to comment #5)
> > Resource#contains(...) &
> > Resource#isConflicting(...) only check the workspace path.  This means that a
> > resource which is aliased can be modified simultaneously by different jobs.
> 
> I'm not sure if you're talking about the present code, or a hypothetical future
> where a single ResourceInfo is shared by multiple resource paths. Currently,
> each "aliased" resource has a distinct workspace path, so it is not currently
> possible for a resource to be modified simultaneously by multiple jobs.

I am talking about the present code.  As stated in the On The Job documentation, resource scheduling rules should prevent concurrent modification of the same resource from multiple threads. By resource I'm extrapolating from the lightweight IResource to the underlying EFS filestore.

Given that each resource has a distinct workspace path, it *is* possible (isn't it?) for the resource to be modified concurrently from two different threads...

Attached is a test which runs two jobs each with a different resource scheduling rule (which happens to alias to the same location). Both jobs run concurrently...
Comment 7 James Blackburn CLA 2009-03-05 10:32:28 EST
(In reply to comment #6)
> for the resource to be modified concurrently from two different threads...

ugh. 'resource' is overloaded.  

What I'm trying to point out is that the scheduling rule for an IResource does not 'conflict with' or 'contain' the scheduling rule for the IResource's aliases. This means, using the IResource API, you can modify the same filestore concurrently by using different aliases to the fs location.
Comment 8 John Arthorne CLA 2009-03-05 13:01:43 EST
Ah, I see. Absolutely, the IResource scheduling rules only apply to the metadata in the resource tree. Clients can always access the local file system using java.io.File, etc, so protecting the actual file system resource isn't possible.
Comment 9 James Blackburn CLA 2009-03-05 13:50:16 EST
(In reply to comment #8)
> Ah, I see. Absolutely, the IResource scheduling rules only apply to the
> metadata in the resource tree. Clients can always access the local file system
> using java.io.File, etc, so protecting the actual file system resource isn't
> possible.

But I view the described behaviour of the Eclipse IResource API as a bug.  Yes the user can go to java.io.* But in the general EFS case, the integrator _must_ use IResource.  Are you saying that you think the current behaviour is correct, or is it just that no one has committed to improving it?

In a unified world, it's clearly possible to have a scheduling rule contain and use the location URIs of its child IResources.
Comment 10 John Arthorne CLA 2009-03-05 14:57:22 EST
I don't see it as a bug. The workspace tree is a pure tree data structure where each node has only one parent. Some resources happen to share the same file system location, but the file system is not what's being locked here.

More importantly, I don't think you could implement those semantics even if you wanted to. This would mean the result of IResource#isConflicting would change according to the state of the resource at any given moment. Thus the rule is not consistent as required by the ISchedulingRule contract, which will cause all kinds of problems in the job scheduling mechanism because you could end up with multiple threads running at once with the same rule (because the semantics of the rule changed before the start of threads T1 and T2).
Comment 11 James Blackburn CLA 2009-03-05 15:33:03 EST
(In reply to comment #10)
> I don't see it as a bug. The workspace tree is a pure tree data structure where
> each node has only one parent. Some resources happen to share the same file
> system location, but the file system is not what's being locked here.

If that's true why do you lock the tree of resources below a locked IResource, thereby preventing modification of filesystem children?  If you only cared about the individual IResource data structure, you would simply lock that node in the tree...  

Fundamentally IResource's proxy real resources in a filesystem.  IResource is the lowest common denominator used by all integrators be it JDT or CDT or some esoteric plugin which uses the API to operate safely on external resource objects.

As I see it, you've said "this isn't IResource's job". But it's not anyone else's either.   With e4's support for flexible resources as a first class citizen, I think it's a big issue not being able to exclusively lock a fs resource to prevent concurrent modification from other (safe) users of the IResource API.

> More importantly, I don't think you could implement those semantics even if you
> wanted to. This would mean the result of IResource#isConflicting would change
> according to the state of the resource at any given moment. Thus the rule is
> not consistent as required by the ISchedulingRule contract, which will cause
> all kinds of problems in the job scheduling mechanism because you could end up
> with multiple threads running at once with the same rule (because the semantics
> of the rule changed before the start of threads T1 and T2).

Surely that's not true. We don't currently mandate that Workspace remains in sync with the EFS provided filesystem.  To be clear, you would only need to use the IResource's URI & the URI of logical (aliased) resources contained beneath the IResource (in the case of an IContainer) instead of the resource's workspace fullpath as used today.

You could achieve something similar today by creating a multi-rule  from the IResources returned by find{Files|Containers}ForLocation()... This wouldn't prevent users from adding a new logical resource pointing to the same location uri into the workspace tree, but it would prevent two jobs from concurrently operating on the same fs resource.

In the old world, locking an IResource tree locked the corresponding filesystem tree, because the workspace tree mirrored the filesystem tree.  With flexible resources it's important to get the semantics of scheduling rules right otherwise we take a step backwards in functionality / safety / integrator's-expectation of the API...


Serge, as an advocate of flexible resources, do you have any thoughts on this?
Comment 12 Serge Beauchamp CLA 2009-03-05 15:59:20 EST
(In reply to comment #11)
> (In reply to comment #10)
> 
> You could achieve something similar today by creating a multi-rule  from the
> IResources returned by find{Files|Containers}ForLocation()... This wouldn't
> prevent users from adding a new logical resource pointing to the same location
> uri into the workspace tree, but it would prevent two jobs from concurrently
> operating on the same fs resource.
> 
> In the old world, locking an IResource tree locked the corresponding filesystem
> tree, because the workspace tree mirrored the filesystem tree.  With flexible
> resources it's important to get the semantics of scheduling rules right
> otherwise we take a step backwards in functionality / safety /
> integrator's-expectation of the API...
> 
> 
> Serge, as an advocate of flexible resources, do you have any thoughts on this?
> 

I understand the requirement of providing a common object that has a single set of metadata but shares different IResource in the workspace.

I haven't though of how this feature can be delivered within the current resource API and design constraints, especially than at the moment, each linked resource pointing to the same original all have their own metadata information.

That sounds to me that something like a IResourceAggregator that would collect all the IResource common to a physical location, and would multiplex the metadata calls for all the underlying resources.  

Since such aggregator would need explicit API usage, code using IResources could not be transparently converted by using aggrerators, since their logical workspace location  is ambiguous (It doesn't make sense to call agrgregator.move() for instance).

I feel like someone should do a experiment showing how this could be done and used while maintaining compatibility.
Comment 13 John Arthorne CLA 2009-03-05 23:58:15 EST
>Surely that's not true. We don't currently mandate that Workspace remains in
>sync with the EFS provided filesystem.  To be clear, you would only need to use
>the IResource's URI & the URI of logical (aliased) resources contained beneath
>the IResource (in the case of an IContainer) instead of the resource's
>workspace fullpath as used today.

Exactly, the workspace and filesystem are only occasionally in sync. So, using locks on the resource model to protect the file system is futile. The moment the mapping between workspace and file system changes in any way, you either need to change the lock semantics (which breaks the contract of scheduling rules), or you are no longer protecting what you think you are.

>You could achieve something similar today by creating a multi-rule  from the
>IResources returned by find{Files|Containers}ForLocation()... This wouldn't
>prevent users from adding a new logical resource pointing to the same location
>uri into the workspace tree, but it would prevent two jobs from concurrently
>operating on the same fs resource.

No it wouldn't. As I've said the file system can be modified by java.io.File, it can be modified directly by a client using EFS (which anyone is free to use), and it can be modified concurrently by other processes. Also as you've said, a new resource could be created concurrently that links an overlapping region of the file system. You just can't lock the file system with locks in the resource tree.

>In the old world, locking an IResource tree locked the corresponding filesystem
>tree, because the workspace tree mirrored the filesystem tree.  With flexible
>resources it's important to get the semantics of scheduling rules right
>otherwise we take a step backwards in functionality / safety /
>integrator's-expectation of the API...

I'm not sure what old world you're referring to. Before Eclipse 2.1, this was true because we didn't have linked resources, but we also didn't have ISchedulingRule back then. There was a single global workspace lock and only one thread could modify the workspace at once. Once linked resources were introduced in Eclipse 2.1, the resource tree no longer mirrored the file system tree. There is nothing new in e4 in this respect.
Comment 14 James Blackburn CLA 2009-03-06 05:11:57 EST
Created attachment 127792 [details]
Test2

(In reply to comment #13)
I acknowledge what you've said in comment #13. Perhaps I was a bit overeager in my analogy: locking the IResource to lock the filesystem (though as you've said elsewhere, for most users you don't really believe external modifications happen & the sync issue isn't a problem).

But I think my point still stands w.r.t. operating on IResource trees in semantically safe manner. Consider this:

p1/some_linked_folder/<tree of resources>
p2/some_linked_folder/<tree of resources>

some_linked_folder is a linked resource to the same external URI.

Using IResource job1 can lock p1, job2 can lock p2, and both can modify the IResource tree under some_linked_folder. job1 might delete some files, job2 might create and modify some files. 

This is where the badness happens.  During each IResource operation: #delete, #move, #copy, #setContents does
 workspace.getAliasManager().updateAliases(...)
which will update the IResource aliases everywhere else in the workspace, potentially under the other job's feet!  

job1 and job2 both _believe_ they have locked their respective IResource trees preventing modification by other jobs until they perform #endRule.  The aliasManager breaks this assumption and the jobs can see IResources change between beginRule and endRule _without_ performing any IResource changing operations! (in the attached test, j1 creates a file which appears in the tree j2 holds the scheduling rule for).


Using linked (logical) resources extensively, the semantics of ISV operation interaction has changed significantly[1][2]. It's as if the resource scheduling rules are only locking the IResource itself and not the IResource's children.

Hence the idea of changing the way scheduling rules work: where we know about Aliases at start of an operation, the scheduling rule should contain those aliases.


[1]  http://www.eclipse.org/articles/Article-Concurrency/jobs-api.html:
"Resource locking to ensure multiple operations did not concurrently modify the same resource"
[2] http://help.eclipse.org/stable/topic/org.eclipse.platform.doc.isv/guide/resAdv_batching.htm
... "They also allow you to declare which part of the workspace is to be modified, so that other plug-ins can be locked out of changing the same part of the workspace." ...
... "This tells the workspace that all of the changes in the runnable are confined to myProject. Any requests by other threads to change myProject will be blocked until this runnable completes. Likewise, this call will block if some other thread is already modifying myProject. By specifying which part of the resource tree will be modified by the runnable, you are allowing other threads to continue modifying other portions of the workspace" ...

John you're one of the architects of this stuff which makes it very likely I'm wrong.  But I'm really flummoxed by this. Fundamentally Aliases allow circumvention of resource scheduling rules. I can modify parts of the workspace while holding un-related scheduling rules.  As I see it scheduling rules aren't used to keep the data structure safe (the WorkManager#lock() is used for this).  Rather the scheduling rules work to allow different operations to operate safely by defining in advance which IResources they will end up modifying.  Without aliases the IResource api provides ACID semantics.  With aliases it doesn't.
Comment 15 James Blackburn CLA 2009-03-06 05:35:34 EST
If you have a purely virtual workspace with lots of aliases, your Resource#isConflicting(ISchedulingRule) might as well be:

public boolean isConflicting(ISchedulingRule rule) {
	//must not schedule at same time as notification
	if (rule.getClass().equals(WorkManager.NotifyRule.class))
		return true;
	if (!(rule instanceof IResource))
		return false;
	IPath otherPath = ((IResource) rule).getFullPath();
	return path.equals(otherPath);
}


Comment 16 Eclipse Genie CLA 2019-04-06 13:54:53 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 17 Lars Vogel CLA 2019-06-05 07:41:40 EDT
This is a mass change to close all e4 bugs marked with "stalebug" whiteboard.

If this bug is still valid, please reopen and remove the "stalebug" keyword.