Community
Participate
Working Groups
Created attachment 114401 [details] Thread Dump Build ID: I20080617-2000 The deadlock occurs multiple times in the manual and automated tests.Please find the thread dump in attachment. The deadlock is between worker-58 and ModalContext due to reverse lock/monitor order (org.eclipse.jdt.internal.core.JavaModelManager$PerProjectInfo + WorkManager lock).
I believe this is a the same problem as reported in bug 246136, i.e. "ModalContext" should never have reached this point since it takes a "project modification" scheduling rule which conflicts with the notification of resource changes. So there must be a misbehaving scheduling rule in the stack of "ModalContext". I'm not sure how to identify such misbehaving scheduling rule. Moving to Platform/Resources for further investigation.
It looks like that the notification of resource is done asynchronously in a separate job with scheduling rule <null>: org.eclipse.core.internal.events.NotificationManager$NotifyJob.run(org.eclipse.core.runtime.IProgressMonitor) line: 39
(In reply to comment #2) > It looks like that the notification of resource is done asynchronously in a > separate job with scheduling rule <null>: It's because the job is used just to trigger the notification.
(In reply to comment #1) > ... "ModalContext" should never have reached this point since it takes a "project > modification" scheduling rule which conflicts with the notification of resource > changes. So there must be a misbehaving scheduling rule in the stack of > "ModalContext"... then (accordingly to comment #3) there is no "misbehaving scheduling rule", this means also that there is no scheduling rule conflict between ModalContext and worker-58. ModalContext can run with workspace root as rule. Now the question is where is the root cause of this deadlock, in platform or JDT? P.S. I don't think scheduling rule <null> is a good idea. It will change the lock order (resource lock --> workManager lock) potentially, e.g. if someone tries to resource modification in the notification thread. - Is this allowed? - Does Eclipse have a general recommendation/guideline to avoid deadlock, at least for the common locks/monitors?
(In reply to comment #4) > Now the question is where is the root cause of this deadlock, in platform or > JDT? I am investigating, however there are some other candidates (wtp, com.sap.caf.designer?) > P.S. I don't think scheduling rule <null> is a good idea. It will change the > lock order (resource lock --> workManager lock) potentially, e.g. if someone > tries to resource modification in the notification thread. > - Is this allowed? No. See WorkManager#checkIn, each attempt to modify the resource tree during notification will fail. > - Does Eclipse have a general recommendation/guideline to avoid deadlock, at > least for the common locks/monitors? I think that Jerome is right and this issue is causes by misbehaving scheduling rule. To see how to build a correct rule, see the javadoc of ISchedulingRule.
This isn't the same as bug 246136. The thread performing the notification doesn't have any scheduling rule so there is no rule conflict here. The problem here is very simple. The lock on class JavaModelManager$PerProjectInfo is acquired during a resource change event by a listener. Another thread that owns this lock wants to modify the workspace, but we do not allow other threads to modify the workspace during resource change events. The *only* way the platform could prevent deadlock here is to allow the other thread to modify the workspace during the resource change event. This is not allowed by the IResourceChangeEvent contract and would likely cause breakage for many listeners. re comment #4 : yes, we prevent resource changes from occurring during resource change events (an exception is thrown if a listener tries to modify the workspace during an event). This prevents the inversion of the lock order that you mention, which would as you say cause deadlocks.
Moving to JDT Core.
There is no way for a client to know that the workspace modification is not allowed in a thread other than the notification thread. Even if there were such API (e.g. isTreeLocked() would work in all threads), this would not solve the problem since the notification could still occur between the isTreeLocked() check and the workspace modification. Back to Platform/Resources.
(In reply to comment #8) > There is no way for a client to know that the workspace modification is not > allowed in a thread other than the notification thread. That's why we should avoid synchronized blocks in resource change listener, or use them only if we are sure that client's code will not access them during notification. This is how I understand John's comment.
Jerome, can you describe what you expect us to do differently here? One thread is inside a resource change event, and another thread is trying to modify the workspace. Are you suggesting we should let the other thread modify the workspace during the event?
You cannot tell clients to NOT use such an important feature of the Java language. Without synchronized blocks, clients have no way to ensure the consistency of their data.
The thread that modifies the workspace made sure to use a scheduling rule before taking the PerProjectInfo lock. I would expect that the resource notification to not run when this rule is used.
> You cannot tell clients to NOT use such an important feature of the Java language. Anyone using this language feature needs to understand the risk of deadlock from using it. If you call third party code that isn't under your control from such a synchronized block, and you also acquire the same lock in a callback where you don't know the state of the calling thread, you are inherently deadlock prone. It happens in this case that your lock is interacting with a lock in platform resources, but a similar deadlock could occur between your lock and some other plugin that uses the same pattern. > The thread that modifies the workspace made sure to use a scheduling rule > before taking the PerProjectInfo lock. I would expect that the resource > notification to not run when this rule is used. This means no resource change events would occur during long-running operations. For example if you did a CVS checkout or full build that took several minutes, then no resource change events would occur during that time for any change. This has very strange behaviour - for example you would double-click in an editor ruler to add a breakpoint, and the breakpoint icon would only appear several minutes later when the change event ran. Incidentally platform resources has been concurrent like this since Eclipse 3.0, there is nothing changed here recently. I think this particular deadlock might be caused by the extra synchronization added to address bug 235778.
(In reply to comment #13) > Anyone using this language feature needs to understand the risk of deadlock > from using it. I'm perfectly aware of this risk. This is why I tried to acquire the Platform/Resources lock before acquiring my lock (by running with a scheduling rule). It looks like this is not the correct Platform/Resources lock. > If you call third party code that isn't under your control from > such a synchronized block, and you also acquire the same lock in a callback > where you don't know the state of the calling thread, you are inherently > deadlock prone. Isn't it exactly what Platform/Resources is doing by calling a resource listener inside the resource notification lock and also acquiring the same lock in another thread when the client tries to modify a resource? > It happens in this case that your lock is interacting with a > lock in platform resources, but a similar deadlock could occur between your > lock and some other plugin that uses the same pattern. If this Platform/Resources lock was available to clients, I would be able to acquire it before acquiring my lock. > This means no resource change events would occur during long-running > operations. For example if you did a CVS checkout or full build that took > several minutes, then no resource change events would occur during that time > for any change. This has very strange behaviour - for example you would > double-click in an editor ruler to add a breakpoint, and the breakpoint icon > would only appear several minutes later when the change event ran. OK, so maybe the scheduling rule is not the correct lock to acquire. But the client needs the lock that the resource notification takes. Then it is the responsibility of the client if this lock is taken for too long. > Incidentally platform resources has been concurrent like this since Eclipse > 3.0, there is nothing changed here recently. Sure, but that doesn't mean that the design was correct in 3.0 and that it should not be changed now. > I think this particular deadlock > might be caused by the extra synchronization added to address bug 235778. If you have another solution to this problem, I'll take it. But this won't solve this particular Platform/Resources issue.
It looks like there is no easy solution on Platform/Resources side. I entered bug 250311 though. In the meantime, we will workaround the issue in JDT/Core.
Created attachment 114697 [details] Proposed workaround
Workaround released for 3.5M3 (Note there is no regression test since this is a concurrency issue. To verify that the fix is in, one can only check the code)
Verified for 3.5M3 using I20081026-2000
We need the workarond downported to 3.4.1 and available also in 3.4.2. What is the procedure for requesting a patch?
3.4.1 is already out. Best we can do if we decide to backport the fix for 3.4.2 is for you to consume an early 3.4.2 build (they are occurring weekly already).
Note that you can also construct the backport patch and attach it to this bug.
What does it mean "you can also construct"? Can you explain me in more details what are the exact steps that I should do in order to have the fix in 3.4.2? (Sorry, I am a novice in this process)
Given Jerome attached a patch for 3.5, you can grap it and try to apply it to 3.4, then correct issues, and once you tested it, you can repost it back on this bug so it may be contributed to 3.4.2. Also read http://wiki.eclipse.org/JDT_Core_Committer_FAQ
Also note that this fix as ripples on clients. For example, this caused bug 251629. So bug 251629 would also need to be backported.
*** Bug 263094 has been marked as a duplicate of this bug. ***
Created attachment 124948 [details] Fix backported to R3_4_maintenance branch Products are hitting this deadlock, with no workaround. So this needs to be backported to 3.4.x
Fix released in R3_4_maintenance.
(In reply to comment #27) > Fix released in R3_4_maintenance. Will this be contained in SR2? I'm puzzled because I see that recent versions in the maintenance branch already say "3.4.3" (MANIFEST.MF and buildnotes) and "post 3.4.2 release" (../batch/messages.properties). Has it been decided, which version will be JDT/Core's contribution to 3.4.2? Would I run into any known risks, when adopting this fix into our branch? How about the other recent backport in bug 261510 (it seems less critical?)?
*** Bug 265920 has been marked as a duplicate of this bug. ***
(In reply to comment #27) > Fix released in R3_4_maintenance. > Hi Jerome, If this was release on 2009-02-06 should it also be in the build from 2009-02-11? Because this fix is not currently in JDT 3.4.4.v_894_R34x which comes with eclipse 3.4.2 build.
Hi, This issue still occurs with elipse with version : "3.4.3.R34x_v20081215-1030". Please look into it. Added new thread dump. Regards, Ashwani K Sharma
Created attachment 128715 [details] 3.4.3Thread Dump Thread dump for 3.4.3 release.
(In reply to comment #30) > If this was release on 2009-02-06 should it also be in the build from > 2009-02-11? Because this fix is not currently in JDT 3.4.4.v_894_R34x which > comes with eclipse 3.4.2 build. The fix is indeed in the R3_4_maintenance branch but it was not included in 3.4.2. There is no 3.4.3 planned, but if there were one, the fix would be in a 3.4.3 maintenance release. (In reply to comment #31) > This issue still occurs with elipse with version : "3.4.3.R34x_v20081215-1030". I'm not sure where this version come from, but I suspect you're looking at a plugin version which is different from the Eclipse version. Since there is no build containing this fix, it is expected that you still see the problem.
*** Bug 293821 has been marked as a duplicate of this bug. ***