Bug 226264 - race condition in Workspace.isTreeLocked()/setTreeLocked()
Summary: race condition in Workspace.isTreeLocked()/setTreeLocked()
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.3.2   Edit
Hardware: All All
: P3 major (vote)
Target Milestone: 3.4 RC1   Edit
Assignee: Szymon Brandys CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed, greatbug
Depends on:
Blocks:
 
Reported: 2008-04-09 07:12 EDT by Jan Sievers CLA
Modified: 2008-05-19 10:30 EDT (History)
4 users (show)

See Also:
john.arthorne: review+
Szymon.Brandys: review+


Attachments
Patch to org.eclipse.core.internal.resources.Workspace (3.85 KB, patch)
2008-04-11 11:50 EDT, Holger Oehm CLA
no flags Details | Diff
mylyn/context/zip (8.38 KB, application/octet-stream)
2008-04-11 11:50 EDT, Holger Oehm CLA
no flags Details
patch to move notification out of unprotectedDelete/Move (14.84 KB, patch)
2008-04-18 13:28 EDT, Holger Oehm CLA
no flags Details | Diff
mylyn/context/zip (20.44 KB, application/octet-stream)
2008-04-18 13:28 EDT, Holger Oehm CLA
no flags Details
Fix (9.86 KB, patch)
2008-05-06 10:59 EDT, Szymon Brandys CLA
no flags Details | Diff
Fix v02 (9.89 KB, patch)
2008-05-07 10:37 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Jan Sievers CLA 2008-04-09 07:12:34 EDT
Build ID: M20080221-1800

Steps To Reproduce:
Two WorkspaceJobs in two worker threads:

1. Job (Thread 1) has the org.eclipse.core.internal.resources.WorkManager.lock and does a Project.delete() which calls Resource.delete().In that method the WorkManager.lock is temporarily released and deleteUnprotected() is executed w/o that lock.

2. The second Job (Thread 2) was waiting to acquire the Workmanager.lock and can obtain it now. In its Workspace.endOperation() it starts to broadcast changes and NotificationManager.notify() calls Workspace.setTreeLocked() (so its field Workspace.treeLocked contains now thread 2).

3. In the meantime in Thread 1 runs and does also a broadcast of the Project deletion, this is also done in the NotificationManager.notify() and it also calls setTreeLocked(), effectively setting the field Workspace.treeLocked to thread 1.

4. Now in Thread 2 all calls to Workspace.isTreeLocked() give the _wrong_ result false.

The latter leads to Listeners trying to use Workspace.run():
-> see bug 212951.

I believe this bug in the isTreeLocked/setTreeLocked implementation is the real cause for the deadlock reported in bug 212951.


More information:

Related bug: 29624 this is where the current behavior of isTreeLocked() is introduced.
Comment 1 Holger Oehm CLA 2008-04-11 11:50:41 EDT
Created attachment 95700 [details]
Patch to org.eclipse.core.internal.resources.Workspace

Hi,

I made a patch that fixes the race on Workspace.treeLocked. It would allow only one thread at a time to own the treeLock. Sending notifications from the NotificationManager.notify() method will be serialized by the patch, if notify() is called with lockTree==true.
The patch has been created with "diff -U5 Workspace.java.orig Workspace.java" against eclipse 3.3.2, buildId M20080221-1800, plug-in org.eclipse.core.resources (3.3.1.R33x_v20080205).

Best Regards,
Holger.
Comment 2 Holger Oehm CLA 2008-04-11 11:50:44 EDT
Created attachment 95701 [details]
mylyn/context/zip
Comment 3 Szymon Brandys CLA 2008-04-11 14:04:07 EDT
Thanks Holger. I will look at the patch. Is it only 3.3 issue? I'm just curious what is the performance impact when the patch is applied.
Comment 4 John Arthorne CLA 2008-04-11 15:38:48 EDT
I think the bug is that we're broadcasting PRE_DELETE from the unprotected code. Szymon, I suggest pulling these events out of unprotectedDelete and unprotectedMove to run before we release the lock and call these methods.  The above patch introduces a new kind of lock which creates a deadlock risk (thread A holding workManager.lock and waits on treeLockMonitor, thread B holding treeLockManager and waits for workManager.lock).
Comment 5 Holger Oehm CLA 2008-04-14 07:22:24 EDT
Hi John, 
I think there is actually no deadlock risk because a thread that holds the Workspace.treeLocked wont wait on the WorkManager.lock (because isTreeLocked() will return true, and WorkManager.checkIn() will fail before trying to acquire the lock.)
But indeed I just wasnt sure if the Workspace.treeLocked is guarded by the WorkManager.lock. If that is the case, moving the notification out of the unprotected part should be the better solution, I agree.
BTW: Maybe we could comment somehow the fact that the Workspace.treeLocked is guarded by the WorkManager.lock? Or even better: Could we make fail Workspace.setTreeLocked() if the WorkManager.lock is not held?
Best, Holger.
Comment 6 Szymon Brandys CLA 2008-04-14 08:07:08 EDT
I noticed that this problem concerns also the move operation. So the fix should touch both unprotectedDelete and unprotectedMove.

(In reply to comment #5)
> BTW: Maybe we could comment somehow the fact that the Workspace.treeLocked is
> guarded by the WorkManager.lock? Or even better: Could we make fail
> Workspace.setTreeLocked() if the WorkManager.lock is not held?

Workspace.setTreeLocked() should not be called when the thread doesn't hold the WorkManager.lock. Such an exception would help us to find such problems in the future quicker.

Comment 7 Holger Oehm CLA 2008-04-18 09:20:31 EDT
Would it help if I tried to provide that fix as a patch?
Comment 8 Szymon Brandys CLA 2008-04-18 10:07:06 EDT
Sure. I will appreciate it.
Comment 9 Holger Oehm CLA 2008-04-18 13:28:11 EDT
Created attachment 96625 [details]
patch to move notification out of unprotectedDelete/Move

The patch moves the broadcastEvent out of the unprotectedDelete() / unprotectedMove() methods in Resource. Also Workspace.setTreeLocked complains if someone wants to lock the tree without owning the WorkManager.lock.
Initially I wanted to throw an exception here, but in Project.close() line 138 there is a call to broadcastEvents _before_ the prepareOperation (this seems to be intentionally, i.e. with a source comment). So I didnt change that one, but only logged the wrong access to setTreeLocked() instead of throwing an exception.
I still believe that Project.close() is wrong, because in IResourceChangeEvent it is stated that the Workspace is closed during PRE_CLOSE events. And the ILifecycleEvent.PROJECT_PRE_CLOSE triggers an IResorceChangeEvent with type PRE_CLOSE in the NotificationManager.
The patch itself is done in org\eclipse\core\internal with "diff -U5" and contains changes to the five files:
events/NotificationManager.java
resources/Resource.java
resources/Workspace.java
utils/Messages.java
utils/messages.properties

Best regards, Holger.
Comment 10 Holger Oehm CLA 2008-04-18 13:28:14 EDT
Created attachment 96626 [details]
mylyn/context/zip
Comment 11 kiril mitov CLA 2008-04-24 04:07:25 EDT
Hi Szymon,
I am raising the priority of the problem since it might be the root cause of bug 212951 and it is important for us to have the deadlock in bug 212951 solved.

Best Regards,
Kiril
Comment 12 Szymon Brandys CLA 2008-04-24 05:56:45 EDT
Ok. I'm on it.
Comment 13 Szymon Brandys CLA 2008-04-28 06:15:57 EDT
(In reply to comment #9)
> The patch itself is done in org\eclipse\core\internal with "diff -U5" and
> contains changes to the five files:
> events/NotificationManager.java
> resources/Resource.java
> resources/Workspace.java
> utils/Messages.java
> utils/messages.properties

Why didn't you create the patch using Eclipse? 

Comment 14 Holger Oehm CLA 2008-04-28 07:11:03 EDT
Because it cannot create patches (it is the team provider who could) the only thing available to me was "apply patch", so I had to do it this way. (It would be nice to have this, though... :-)
Comment 15 Szymon Brandys CLA 2008-04-28 07:21:44 EDT
(In reply to comment #14)
> Because it cannot create patches (it is the team provider who could) the only
> thing available to me was "apply patch", so I had to do it this way. (It would
> be nice to have this, though... :-)
> 

I'm confused. You can't see "Create patch..." item? Is your org.eclipse.core.resources project shared with the Eclipse cvs?

Comment 16 Holger Oehm CLA 2008-04-28 07:30:44 EDT
Nope, I didnt see it. I imported my project from the target platform (In the plug-ins view "Import As" -> "Source project").
Comment 17 Szymon Brandys CLA 2008-04-28 07:41:58 EDT
(In reply to comment #16)
> Nope, I didnt see it. I imported my project from the target platform (In the
> plug-ins view "Import As" -> "Source project").
> 

I see. You can create a patch using Eclipse. You should add Eclipse cvs to the repos view (:pserver:dev.eclipse.org:/cvsroot/eclipse anonymous), then check out org.eclipse.core.resources. Now when the project is shared and you do some changes in it, you can create a patch against the HEAD in the repository.

It would help me, if you did it. And would be useful in the future, when you have more patches :-)

Comment 18 Holger Oehm CLA 2008-04-28 08:04:24 EDT
I am sorry, but I cannot access cvs from behind the firewall here.... :-(
Comment 19 Szymon Brandys CLA 2008-05-06 10:59:57 EDT
Created attachment 98850 [details]
Fix

Holger's fix tidied up with tests. I think that a runtime exception (assertion) is better to warn about the workspace lock issue.
Comment 20 Szymon Brandys CLA 2008-05-06 11:25:56 EDT
Adding John as RC1 committer/reviewer.
Comment 21 John Arthorne CLA 2008-05-07 10:37:17 EDT
Created attachment 99083 [details]
Fix v02

Two minor updates:
 - Added a sleep in the test case so that it consistently fails without the fix
 - There was an unnecessary arraycopy in Resource.broadcastPreDeleteEvent
Comment 22 John Arthorne CLA 2008-05-07 10:42:06 EDT
+1 on updated patch. Also need to add Holger to the copyrights.
Comment 23 John Arthorne CLA 2008-05-07 10:43:07 EDT
Adding the "greatbug" keyword... thank you Jan and Holger for the excellent analysis of the bug and the fix!
Comment 24 Szymon Brandys CLA 2008-05-07 11:40:22 EDT
Holger, any special requirements about your copyrights entry?
Is "Holger Oehm (holger.oehm@sap.com) - Bug 226264" ok?
	

 
Comment 25 Szymon Brandys CLA 2008-05-08 06:24:08 EDT
Slightly modified Fix v02 (+ copyright, + comments tidied up) release to HEAD. Thanks all for working on this bug.
Comment 26 Holger Oehm CLA 2008-05-13 07:14:29 EDT
Hi Szymon, yes, I think thats ok. I dont think there are any special requirements. I am glad I could help. Thank you for mentioning it.
Comment 27 Szymon Brandys CLA 2008-05-13 07:20:06 EDT
I hope to see you contributing to other core.resources stuff soon :-)
Comment 28 Szymon Brandys CLA 2008-05-19 10:30:54 EDT
Verified by code inspection.