Bug 306214 - Unpredictable notification of resource events with multiple threads
Summary: Unpredictable notification of resource events with multiple threads
Status: CLOSED DUPLICATE of bug 249951
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.5.1
Hardware: PC Windows XP
Importance: P3 normal with 2 votes
Target Milestone: ---
Assignee: Platform-Resources-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 249951
Blocks:
 
Reported: 2010-03-17 11:28 EDT by Bernd Vogt CLA
Modified: 2010-03-19 06:58 EDT (History)
5 users

See Also:


Attachments
JUnit test case that demonstrates the deadlock (8.72 KB, application/octet-stream)
2010-03-17 11:29 EDT, Bernd Vogt CLA
Thread dump showing the deadlock (5.04 KB, text/plain)
2010-03-17 11:31 EDT, Bernd Vogt CLA

Description Bernd Vogt CLA 2010-03-17 11:28:30 EDT
Build Identifier: M20090917-0800

We are developing a graphical modeling tool for business logic (Visual Rules). For this we are using Eclipse, EMF, EMF Transaction and GEF.

Since some of our customers and in-house projects use Subversive to share their Visual Rules projects, we are frequently faced with deadlocks. Some investigation has shown that the IWorkspace.run(...) methods do not behave as my colleagues and I expected, especially with respect to the IWorkspace.AVOID_UPDATE flag.

We expected that if the IWorkspace.AVOID_UPDATE flag is specified, all resource change notifications are deferred until the end of the call (provided no other thread modifies the workspace). That is not the case. If another thread simultaneously executes another IWorkspace.run(...) call with null as scheduling rule (or a non-conflicting rule) and that call ends before ours, the other thread sends the resource notification although our first runnable is still running. Note: the second runnable did not make any resource modifications.
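The interleaving described above can be modelled with a small plain-Java sketch (this is NOT Eclipse code; the NotificationModel class and its method names are invented for illustration): pending changes are pooled workspace-wide, and any operation ending flushes the whole pool, even if the thread that made the changes is still inside its runnable.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a workspace-wide notification manager: changes are pooled
// globally, and ANY operation ending flushes the pool, including changes
// made by a thread that is still inside its own runnable.
class NotificationModel {
    private final List<String> pendingChanges = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    void recordChange(String change) { pendingChanges.add(change); }

    // Called when any IWorkspace.run(...) analogue finishes.
    void operationEnded() {
        delivered.addAll(pendingChanges);
        pendingChanges.clear();
    }
}

public class AvoidUpdateDemo {
    public static void main(String[] args) {
        NotificationModel workspace = new NotificationModel();

        // Operation A (our AVOID_UPDATE runnable) starts and modifies a resource...
        workspace.recordChange("A: modified file.txt");

        // ...meanwhile operation B (null scheduling rule, no changes) ends.
        workspace.operationEnded();

        // A's change was already delivered although A is still "running":
        System.out.println(workspace.delivered); // prints [A: modified file.txt]
    }
}
```

The point of the sketch is that delivery is keyed to *any* operation ending, not to the operation that produced the changes.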

What causes the deadlocks? If the first runnable acquires a lock object (e.g. an EMF read transaction) and there is a resource change listener that wants to acquire the same lock, the behaviour described above can lead to deadlocks. For detailed information take a look at the attached unit test.

Why do these deadlocks appear with Subversive? Subversive frequently runs workspace runnables with null as scheduling rule; see org.eclipse.team.svn.core.operation.local.RefreshResourcesOperation.doRefresh(IResource, int). There is much more code in Eclipse that runs such null-rule runnables: just press Ctrl+Shift+G on IWorkspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor).

I believe that this behaviour of the resource API is very critical for multithreaded applications, because the notification of resource events is unpredictable and unmanageable. For example, it is impossible to manage lock ordering if you are forced to use more than one lock (e.g. a resource lock and an EMF lock).

Reproducible: Always

Steps to Reproduce:
1. Import and run the attached test. (For debugging purposes, increase ResourceDeadlockTest#TIMEOUT_SECONDS and suspend the VM in the debugger.)
Comment 1 Bernd Vogt CLA 2010-03-17 11:29:42 EDT
Created attachment 162303 [details]
JUnit test case that demonstrates the deadlock
Comment 2 Bernd Vogt CLA 2010-03-17 11:31:33 EDT
Created attachment 162305 [details]
Thread dump showing the deadlock
Comment 3 Szymon Brandys CLA 2010-03-17 12:36:48 EDT
(In reply to comment #0)
> We expected that if the IWorkspace.AVOID_UPDATE flag is specified, all
> resource change notifications are deferred until the end of the call
> (provided no other thread modifies the workspace). That is not the case. If
> another thread simultaneously executes another IWorkspace.run(...) call with
> null as scheduling rule (or a non-conflicting rule) and that call ends
> before ours, the other thread sends the resource notification although our
> first runnable is still running. Note: the second runnable did not make any
> resource modifications.

This looks like a duplicate of bug 249951. See also bug 249951, comment 6 and 7.

This looks like a bug, since the other runnable is not modifying the workspace. However, as John wrote, "there are other ways intermediate notifications could be triggered", and you are not guaranteed that when AVOID_UPDATE is used no notification will be sent. So acquiring the same lock in an IWorkspaceRunnable and in a resource change listener looks like a bad idea.
Comment 4 John Arthorne CLA 2010-03-17 13:49:42 EDT
Yes, you can never reliably predict what thread a given resource change notification will occur in. Because it is a multi-threaded system, you can't control the possibility that another thread will modify the workspace concurrently during execution of your IWorkspaceRunnable. 

It may be true that the other thread hasn't modified the workspace in this case, but since changes have occurred by this point and we don't track which thread each change occurred in, we detect that there is a change that listeners haven't been notified about and send the resource change event anyway.

In the end there are two possible event strategies here:

 1) Events are sent in the same thread where the change occurred. However, since changes can be interleaved across threads, the listener may receive change events out of order

 2) Events are sent from any thread, but events are always sent in the order in which the change occurred.

Both strategies have pros and cons, but in the end we chose event order consistency over event thread consistency.
Comment 5 Patrick Huy CLA 2010-03-17 15:03:25 EDT
(In reply to comment #4)
> Yes, you can never reliably predict what thread a given resource change
> notification will occur in. Because it is a multi-threaded system, you can't
> control the possibility that another thread will modify the workspace
> concurrently during execution of your IWorkspaceRunnable. 

As far as I understand it, no other thread can modify the workspace while the workspace lock is held by the processing thread. If that thread/operation also requested AVOID_UPDATE, I believe it makes sense for the notifications generated by that thread to be sent after it finishes.
The current behaviour makes it impossible to safely acquire any lock within a resource change listener. While it can probably be argued that a resource change listener should never lock anything, this should be communicated much more aggressively.
Comment 6 James Blackburn CLA 2010-03-17 16:48:13 EDT
(In reply to comment #5)
> As far as I understand it no other Thread can modify the Workspace when a
> Workspace lock is acquired by the processing thread

That's not quite true. The Workspace lock is released before running external code.  Unless you're using a scheduling rule, other jobs can run in parallel.

> The current behaviour
> makes it impossible to safely acquire any lock within a
> ResourceChanged listener. While it can probably be argued that a
> ResourceChanged listener should never lock anything it should be communicated
> much more aggressively.

This is a common issue when programming for Eclipse. You most frequently see this with the UI lock - ISVs grabbing it from a resource change handler, and vice versa. And you're right: code hanging off a resource change handler should be very careful about what work it does, as it can potentially lock up Eclipse for user resource changes.

Integrators need to be careful that they always acquire locks in the same order to avoid any kind of deadlock, and this is no exception. If your code is called from a resource change handler, and acquires a lock, you need to make sure that you never lock resources while the lock is held, or you always hold a resource scheduling rule while you acquire your lock.

AVOID_UPDATE indicates that updates shouldn't be fired while the runnable is run, not what happens to events before or after the runnable.  
Isn't this just a bug, and NOT_ECLIPSE?
Comment 7 Bernd Vogt CLA 2010-03-18 05:26:19 EDT
(In reply to comment #3)
> This looks like a duplicate of bug 249951. See also bug 249951, comment 6 and
> 7.

Yes, looks very similar. Randall Theobald describes in the description of bug 249951 some possible solutions. I think solution (b) could work for us, too (see possible workaround in the attached unit test).

(In reply to comment #3)
> [...] So acquiring the same lock in IWorkspaceRunnable
> and a resource change listener looks like a bad idea.

I agree, acquiring locks within the resourceChanged method shouldn't be done thoughtlessly. But in my opinion integrators should have the chance to synchronize their own models with the Eclipse resource state directly within the resource change notification. If you say it's a bad idea to acquire a custom lock in the resourceChanged method, integrators are forced to spawn a new thread to acquire the lock and process the resource change events. But this means that for some time the integrator's model is out of sync with the resource state, although all resource events have been sent. We tried this approach, with the result that it is really painful and error-prone for other components that rely on our model to deal with this asynchronicity (additionally, writing unit tests was a real pain).
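The asynchronous workaround described above can be sketched in plain Java (the class and method names here are invented for illustration, not Eclipse API): the listener never takes the model lock itself, it only queues the delta, and a separate worker applies queued deltas to the model later. The price is exactly the window in which the model lags behind the resource state.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the "queue in the listener, process on a worker" workaround.
public class AsyncListenerSketch {
    private final Queue<String> pendingDeltas = new ConcurrentLinkedQueue<>();
    private final StringBuilder model = new StringBuilder(); // touched only by the worker

    // Called on the notification thread; must not block or take the model lock.
    public void resourceChanged(String delta) {
        pendingDeltas.add(delta);
    }

    // Called periodically on a dedicated worker thread (a Job, in Eclipse terms).
    public void drainOnce() {
        String delta;
        while ((delta = pendingDeltas.poll()) != null) {
            model.append(delta).append(';'); // the model catches up here, later
        }
    }

    public String modelState() { return model.toString(); }
}
```

Between resourceChanged() and the next drainOnce(), modelState() still reflects the old resource state, which is the asynchronicity the comment complains about.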

(In reply to comment #4)
> In the end there are two possible event strategies here:
> 
>  1) Events are sent in the same thread where the change occurred. However,
> since changes can be interleaved across threads, the listener may receive
> change events out of order
> 
>  2) Events are sent from any thread, but events are always sent in the order in
> which the change occurred.
> 
> Both strategies have pros and cons, but in the end we chose event order
> consistency over event thread consistency.

I really agree with you: event order consistency over event thread consistency! Maybe there is a misunderstanding with this issue. My problem is not the possibility that my resource events will be sent by another thread, but that there are resource event notifications while my resource runnable is running, although I have set the AVOID_UPDATE flag. This behavior means that it is impossible to safely acquire a lock within a resource change listener, which in turn means that I am not able to synchronize my model with the resource state within resource listeners. Finally, integrators have to decide whether to accept the danger of deadlocks or to deal with (in my opinion unnecessary) asynchronicity.

In my opinion a good solution could be strategy 2) plus a guarantee that no resource event notification occurs as long as a runnable with the AVOID_UPDATE flag is running, regardless of threads and of how many runnables run concurrently.
Comment 8 James Blackburn CLA 2010-03-18 05:36:41 EDT
(In reply to comment #7)
> My problem is not the
> possibility that my resource events will be sent by another thread, but that
> there are resource event notifications while my resource runnable is
> running, although I have set the AVOID_UPDATE flag.

The platform can't defer notifications indefinitely. Stopping all notification while any runnable is running with AVOID_UPDATE would slow down resource updates for all other change listeners.

> This behavior
> means to me that it is impossible to safely acquire a lock within a resource

That's not true. If you ensure you acquire locks in the same order you'll be fine. Concretely: if you perform a resource changing operation while you hold your data structure lock, make sure you acquire the resource scheduling rule before acquiring your data structure lock.
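The lock-ordering discipline suggested here can be sketched abstractly with two plain-Java locks standing in for the resource scheduling rule and the data structure lock (a model of the ordering rule only, not Eclipse code):

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockOrderDemo {
    // These plain locks merely model an ISchedulingRule and the EMF/data lock.
    static final ReentrantLock resourceRule = new ReentrantLock();
    static final ReentrantLock modelLock = new ReentrantLock();

    // The operation takes the resource rule FIRST, the model lock SECOND.
    static String runOperation() {
        resourceRule.lock();
        try {
            modelLock.lock();
            try {
                return "operation done";
            } finally { modelLock.unlock(); }
        } finally { resourceRule.unlock(); }
    }

    // Any other code path (e.g. a listener) uses the SAME order, so the
    // wait-for graph can never contain a cycle between these two locks.
    static String runListener() {
        resourceRule.lock();
        try {
            modelLock.lock();
            try {
                return "listener done";
            } finally { modelLock.unlock(); }
        } finally { resourceRule.unlock(); }
    }

    public static void main(String[] args) {
        System.out.println(runOperation()); // prints "operation done"
        System.out.println(runListener());  // prints "listener done"
    }
}
```

As comment 9 goes on to argue, the catch in the real platform is that the listener is invoked at a point where it cannot take the rule first.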
Comment 9 Bernd Vogt CLA 2010-03-18 06:35:53 EDT
(In reply to comment #8)
> That's not true. If you ensure you acquire locks in the same order you'll be
> fine. Concretely: if you perform a resource changing operation while you hold
> your data structure lock, make sure you acquire the resource scheduling rule
> before acquiring your data structure lock.

But exactly that is not possible. The problem is that while the operation making the resource changes holds the data structure lock, the listener obviously cannot acquire it again. Even when running with a scheduling rule for the affected resources, the listeners for those resources WILL be called while the operation holding the data structure lock is still running.

Example:
Thread-1
Operation[WorkspaceRoot scheduling rule, AVOID_UPDATE]->Acquire data structure lock, modify a lot of resources which takes some time

Thread-2 (not managed by us, e.g. Subversive)
Operation['null' scheduling rule, AVOID_UPDATE or not] -> does something -> ends operation -> schedules event notification (for changes created by Thread-1)

Thread-3 (NotificationManager$NotifyJob)
Notifies resource changed listeners while the Operation is running -> Try to acquire the data structure lock -> Listeners will block 

Thread-1
The running operation tries to modify further resources - This is not possible as the listeners are being invoked at the moment. -> Deadlock

This is exactly what the original unit test demonstrates. Lock orders in the test are as described by you.
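The cycle between Thread-1 and Thread-3 can be reproduced deterministically in plain Java; the two locks below merely stand in for the EMF/data lock and the notification phase (this is a model of the wait-for cycle, not the Eclipse implementation):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockModel {

    // Runs the two-thread cycle and returns how many threads ended up blocked
    // on the lock held by the other (2 means the classic deadlock cycle).
    static int simulateCycle() {
        ReentrantLock modelLock = new ReentrantLock();  // stands in for the EMF/data lock
        ReentrantLock notifySlot = new ReentrantLock(); // stands in for the notification phase
        CountDownLatch bothHold = new CountDownLatch(2);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Thread-1: the operation holds the model lock, then needs the
        // notification phase to be free before it can modify more resources.
        Thread operation = new Thread(() -> {
            modelLock.lock();
            try {
                bothHold.countDown();
                awaitQuietly(bothHold);
                if (!tryFor(notifySlot)) blocked.incrementAndGet();
                attemptsDone.countDown();
                awaitQuietly(attemptsDone); // keep holding until both attempts finished
            } finally { modelLock.unlock(); }
        });

        // Thread-3 (NotifyJob): runs listeners; a listener tries to take the
        // model lock that the operation still holds.
        Thread notifier = new Thread(() -> {
            notifySlot.lock();
            try {
                bothHold.countDown();
                awaitQuietly(bothHold);
                if (!tryFor(modelLock)) blocked.incrementAndGet();
                attemptsDone.countDown();
                awaitQuietly(attemptsDone);
            } finally { notifySlot.unlock(); }
        });

        operation.start();
        notifier.start();
        try {
            operation.join();
            notifier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return blocked.get();
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try { latch.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }

    // tryLock with a timeout lets the demo DETECT the cycle instead of hanging.
    private static boolean tryFor(ReentrantLock lock) {
        try { return lock.tryLock(100, TimeUnit.MILLISECONDS); }
        catch (InterruptedException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(simulateCycle() + " threads blocked -> deadlock cycle");
    }
}
```

In the real platform there is no timeout, so the corresponding threads block forever, as the attached thread dump shows.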
Comment 10 James Blackburn CLA 2010-03-18 07:16:06 EDT
(In reply to comment #9)
> But exactly that is not possible. The problem therein is that while the
> operation making the resource changes holds the data structure lock the
> listener obviously cannot acquire it again. Even when running with a scheduling
> rule for the affected resources the listeners for those resources WILL be
> called while the operation holding the data structure lock is still running. 

Oh dear, yes I see.  

As any future resource change is blocked on the resource delta notification, you almost want the notification delta to hold the workspace root sched. rule, or alternatively decouple the change notification from the workspace lock.
Comment 11 Bernd Vogt CLA 2010-03-18 09:49:56 EDT
(In reply to comment #10)
> As any future resource change is blocked on the resource delta notification,
> you almost want the notification delta to hold the workspace root sched. rule,

I'm not sure if I understood you correctly. As I understand it, currently any further resource change is blocked while another thread is sending resource events. This feels as if the sending thread runs with a workspace scheduling rule. I think this is a really good thing and shouldn't be changed.

> or alternatively decouple the change notification from the workspace lock.

What do you mean by decoupling change notification from the resource lock? As per my comment #7, our experience has shown that it is very impractical to decouple the event notification from the actual event processing.

I really prefer the solution that events can be sent from any thread while preserving event order (what the framework currently does), but with the guarantee that no resource event notification occurs as long as a runnable with the AVOID_UPDATE flag is running.
Comment 12 Pawel Pogorzelski CLA 2010-03-18 10:09:16 EDT
(In reply to comment #11)
> I really prefer the solution that events can be sent from any thread while
> preserving event order (what the framework currently does), but with the
> guarantee that no resource event notification occurs as long as a runnable
> with the AVOID_UPDATE flag is running.

That's one point of view. The other is components that rely on deltas coming more or less immediately after a change. The behavior you suggest makes it really easy for a badly behaved citizen to freeze the workbench for a long time.
Comment 13 Bernd Vogt CLA 2010-03-18 10:45:21 EDT
(In reply to comment #12)
Yes, I'm aware of this problem. But the behaviour as it is makes it easy to freeze the workbench forever (even for good citizens). Worse, it's very opaque and makes it impossible to prevent deadlocks in a pragmatic way.
Comment 14 Szymon Brandys CLA 2010-03-18 12:47:48 EDT
(In reply to comment #12)
> That's one point of view. The other is components that rely on deltas
> coming more or less immediately after a change.

Adding markers for instance. See bug 42080 and its duplicates, to see other examples.

(In reply to comment #13)
> (In reply to comment #12)
> Yes, I'm aware of this problem. But behaviour as it is makes it easy to freeze
> the workbench forever (even for good citizens). 

They are not good citizens then.

> Worse it's very opaque and makes
> it impossible to prevent deadlocks in a pragmatic way.

What would make it clearer? If you know that AVOID_UPDATE does not prevent change events, then acquiring locks as you described can cause a deadlock. I think it is obvious.
Comment 15 Patrick Huy CLA 2010-03-18 13:08:33 EDT
(In reply to comment #14)
> What would make it more clear? If you know that AVOID_UPDATE does not prevent
> change events, acquiring locks as you described can cause a deadlock. I think
> it is obvious.

If that is the case, the documentation of AVOID_UPDATE needs to be updated to actually reflect reality. It currently states that notifications will not be sent while the operation is running, provided no changes are made in a different thread. Additionally, the Javadoc of IResourceChangeListener should state that acquiring locks that are also acquired outside the listener WILL cause deadlocks (sooner or later).

Fixing AVOID_UPDATE to actually do what its documentation states would allow for much simpler code in most cases. The current behaviour requires externally queuing the resource changes and acting upon them at a later time. This also needs to be synchronized with the resource API at given checkpoints (any kind of user interaction). I don't see how it would hurt to behave as documented (and the documentation has been in place since at least Eclipse 3.2).
Comment 16 Bernd Vogt CLA 2010-03-18 14:05:51 EDT
(In reply to comment #14)
> What would make it more clear? If you know that AVOID_UPDATE does not prevent
> change events, acquiring locks as you described can cause a deadlock. I think
> it is obvious.

If I know these facts, it's quite clear that there is the possibility of deadlocks. But neither the doc of the runnable nor the doc of the resource change listener says anything about it. I will only know these facts after being faced with deadlocks and doing some time-consuming investigation.

But is the possibility of deadlocks, or the related impractical batching and processing of the events in a separate thread, really necessary?

In my opinion, no.

As per comment #15, I'm convinced that it would be an improvement for Eclipse if the AVOID_UPDATE flag behaved as described in the documentation.
Comment 17 Pawel Pogorzelski CLA 2010-03-19 06:43:12 EDT
(In reply to comment #15)
> If that is the case the documentation of AVOID_UPDATE needs to be updated to
> actually reflect the truth. It currently states that notifications will not be
> sent while the operation is running and no changes are done in a different
> thread.

Even if the other runnable doesn't modify the workspace, relying on the flag is error-prone. You can't make assumptions about other threads. In your case the deadlock would occur sooner or later even if we followed the spec exactly. This is not a critical bug then; lowering severity to normal.

> Fixing AVOID_UPDATE to actually do what it's documentation states would allow
> for much simpler code in most cases.

It's not an option as it breaks existing clients and makes the whole workspace less responsive and less robust, see bug 249951 comment 1.

Closing as a duplicate of bug 249951.

*** This bug has been marked as a duplicate of bug 249951 ***
Comment 18 Patrick Huy CLA 2010-03-19 06:56:04 EDT
(In reply to comment #17)
> Even if the other runnable doesn't modify the workspace, relying on the flag
> is error-prone. You can't make assumptions about other threads. In your case
> the deadlock would occur sooner or later even if we followed the spec
> exactly. This is not a critical bug then; lowering severity to normal.

How can resource events be triggered when the operation runs with the WorkspaceRoot scheduling rule?
Comment 19 Pawel Pogorzelski CLA 2010-03-19 06:58:43 EDT
(In reply to comment #18)
> How can resource events be triggered when the operation runs with the
> WorkspaceRoot scheduling rule?

If the operation triggering the delta has a null rule. Adding or modifying markers is one example.