Bug 292827 - Scheduling rule of initialize after load job causes a deadlock
Summary: Scheduling rule of initialize after load job causes a deadlock
Status: VERIFIED DUPLICATE of bug 289560
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.4.2
Hardware: PC Windows XP
Importance: P3 major
Target Milestone: 3.6 M6
Assignee: Jay Arthanareeswaran CLA
 
Reported: 2009-10-20 17:38 EDT by Grant Taylor CLA
Modified: 2010-04-26 10:48 EDT
CC List: 6 users



Attachments
Java core file (1.71 MB, text/plain)
2009-10-22 16:33 EDT, Grant Taylor CLA
Another dump (26.60 KB, text/plain)
2010-01-21 07:11 EST, Markus Keller CLA

Description Grant Taylor CLA 2009-10-20 17:38:32 EDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3
Build Identifier: org.eclipse.jdt.ui_3.4.3.r342_v20090716.jar

I have a Java core file from an instance of our product that appeared hung to the user.  After analyzing the file, I believe this thread is responsible for the deadlock:

3XMTHREADINFO      "Worker-1" TID:0x492EC300, j9thread_t:0x48DAA460, state:CW, prio=5
3XMTHREADINFO1            (native thread ID:0x1C80, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE          at java/lang/Object.wait(Native Method)
4XESTACKTRACE          at java/lang/Object.wait(Bytecode PC:3(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/ThreadJob.joinRun(Bytecode PC:277(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/ImplicitJobs.begin(Bytecode PC:207)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/JobManager.beginRule(Bytecode PC:16)
4XESTACKTRACE          at org/eclipse/core/internal/resources/WorkManager.checkIn(Bytecode PC:40)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.prepareOperation(Bytecode PC:50)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Project.touch(Bytecode PC:45)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/SetContainerOperation.executeOperation(Bytecode PC:456)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelOperation.run(Bytecode PC:43)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:82)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelOperation.runOperation(Bytecode PC:50)
4XESTACKTRACE          at org/eclipse/jdt/core/JavaCore.setClasspathContainer(Bytecode PC:30)
4XESTACKTRACE          at org/eclipse/jst/j2ee/internal/common/classpath/J2EEComponentClasspathContainer.install(Bytecode PC:93)
4XESTACKTRACE          at org/eclipse/jst/j2ee/internal/common/classpath/J2EEComponentClasspathInitializer.initialize(Bytecode PC:2)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.initializeContainer(Bytecode PC:160)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager$12.run(Bytecode PC:131)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:82)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.initializeAllContainers(Bytecode PC:313)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.getClasspathContainer(Bytecode PC:21)
4XESTACKTRACE          at org/eclipse/jdt/core/JavaCore.initializeAfterLoad(Bytecode PC:83)
4XESTACKTRACE          at org/eclipse/jdt/internal/ui/InitializeAfterLoadJob$RealJob.run(Bytecode PC:20)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/Worker.run(Bytecode PC:31)

You can see from the above that the InitializeAfterLoadJob is running and calling a JDT API that in turn tries to modify the workspace.  Looking at the code for the initialization job, I don't think it specifies the workspace (or any portion of it) as its scheduling rule.  This is likely because it doesn't expect the JDT API to modify the workspace (see https://bugs.eclipse.org/bugs/show_bug.cgi?id=238179 for more details).  At any rate, a Job must declare all of its scheduling rules up front; otherwise deadlocks can occur in the job framework, which is what we are witnessing.
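For readers less familiar with the Jobs API, the nesting rule at issue can be modeled in a few lines. This is a simplified, self-contained model, not Eclipse code: `Rule` mirrors only the `contains()` method of `org.eclipse.core.runtime.jobs.ISchedulingRule`, and `PathRule`, `Outcome` and `beginNested()` are hypothetical names. It assumes our reading of the `IJobManager.beginRule` contract: a rule begun inside a running job must be contained in the rule already held by that thread (otherwise `IllegalArgumentException` is thrown), and if the job holds no rule at all, the nested `beginRule` has to acquire the rule itself and may wait, which matches the `ThreadJob.joinRun` frame in the stack above.

```java
/**
 * Simplified model of Eclipse scheduling-rule nesting. Rule mirrors the
 * contains() method of ISchedulingRule; PathRule, Outcome and beginNested()
 * are hypothetical names used only for this sketch.
 */
public class RuleContainmentDemo {

    interface Rule {
        boolean contains(Rule other);
    }

    /** A resource-path rule; "/" stands for the workspace root and contains every path. */
    record PathRule(String path) implements Rule {
        public boolean contains(Rule other) {
            if (!(other instanceof PathRule p)) {
                return false;
            }
            return path.equals("/") || p.path().equals(path) || p.path().startsWith(path + "/");
        }
    }

    enum Outcome { ALREADY_COVERED, MAY_BLOCK, ILLEGAL_NESTING }

    /**
     * Models a call to beginRule(inner) from a job whose declared rule is jobRule.
     * A contained rule is already covered; a non-contained one is illegal nesting;
     * with no declared rule, the thread must acquire inner itself and may wait.
     */
    static Outcome beginNested(Rule jobRule, Rule inner) {
        if (jobRule == null) {
            return Outcome.MAY_BLOCK;
        }
        return jobRule.contains(inner) ? Outcome.ALREADY_COVERED : Outcome.ILLEGAL_NESTING;
    }

    public static void main(String[] args) {
        Rule project = new PathRule("/MyProject"); // hypothetical project path
        System.out.println(beginNested(null, project));                   // MAY_BLOCK
        System.out.println(beginNested(new PathRule("/"), project));      // ALREADY_COVERED
        System.out.println(beginNested(new PathRule("/Other"), project)); // ILLEGAL_NESTING
    }
}
```

The first case corresponds to a job like InitializeAfterLoadJob that declares no rule and later reaches `Project.touch()`; the second is what declaring a covering rule up front would buy.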

Reproducible: Sometimes

Steps to Reproduce:
The details above should be enough to investigate this issue.  The problem cannot be reproduced with Eclipse alone.
Comment 1 Dani Megert CLA 2009-10-21 05:31:15 EDT
Please provide the whole stack otherwise we can't analyze the real cause of the deadlock.
Comment 2 Grant Taylor CLA 2009-10-21 08:25:14 EDT
Do you want the other thread stacks too?  Do you want the entire java core file?  I didn't upload it because I thought it might contain product-specific information, which I know is frowned upon.  Please let me know what you want and I'll upload it.  Note that even the full java core doesn't tell you everything because you can't see the scheduling rules for each Job.  You can only look at the code, make some guesses about what the scheduling rules are and look for Jobs running that break the rules.

That said, I think the stack I provided proves that something is wrong with the InitializeAfterLoadJob.  It is obviously [indirectly] trying to change the workspace, and I don't think the scheduling rule has the workspace in it.
Comment 3 Dani Megert CLA 2009-10-21 11:36:00 EDT
>Do you want the entire java core file?
That would be best.

>  I didn't upload it because I thought it might contain product-specific
>information,
Well, of course only you can decide that.
Comment 4 Grant Taylor CLA 2009-10-22 16:33:39 EDT
Created attachment 150307
Java core file

In the Java core, you'll see that Worker-7 is also a victim of the same JDT issue: JDT modifying the workspace unexpectedly.  I'm sure the Decoration job doesn't (and shouldn't) have the workspace as a scheduling rule.  Note that the JDT API being used here is getSourceFolders(), so we're really not expecting a workspace modification.

Worker-1 could have the same assumption as our code (and excuse).  However, this job is coming from JDT, so I would expect that it should know when workspace modifications will be made by certain API calls.
Comment 5 Jay Arthanareeswaran CLA 2009-11-23 06:16:56 EST
The fix should be the same as the one given in bug 289560. I am not sure whether it's technically a duplicate, since no steps to reproduce were given for this bug.

Grant, could you please try the fix given with bug 289560 comment #14 ?
Comment 6 Grant Taylor CLA 2009-11-23 09:06:13 EST
This issue was reported by an internal user of our product.  It's an intermittent problem, so we can't verify that the patch will actually fix the issue.  Looking at the patch, it seems like it should help, since the scheduling rule of the operation is relaxed.  However, can you verify that the InitializeAfterLoadJob has a rule that covers the rule of the operation?  If it doesn't, then I think there is still the possibility of hanging issues (although the chance is reduced with the patch).
Comment 7 Jay Arthanareeswaran CLA 2009-11-23 09:36:19 EST
(In reply to comment #6)
> This issue was reported by an internal user of our product.  It's an
> intermittent problem, so we can't verify that the patch will actually fix the
> issue.  Looking at the patch, it seems like it should help, since the
> scheduling rule of the operation is relaxed.  However, can you verify that the
> InitializeAfterLoadJob has a rule that covers the rule of the operation?  If it
> doesn't, then I think there is still the possibility of hanging issues
> (although the chance is reduced with the patch).

I am not sure if we can really come up with a rule for the InitializeAfterLoadJob. It makes sense to do it in SetContainerOperation because there we know exactly which resources undergo a change, and the scope is narrower. Besides, as the thread dump indicates, the problem seems to be with the SetContainerOperation.
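The patch for bug 289560 is not shown in this report, so the following only illustrates the general idea described here: a rule scoped to the resources an operation actually changes blocks fewer jobs than a workspace-root rule. Everything below is a hypothetical stand-in. `Rule` mirrors `ISchedulingRule.isConflicting()`, `CombinedRule` is loosely modeled on `MultiRule`, and `ruleFor()` is an invented helper; real Eclipse code would presumably build such a rule from the workspace's `IResourceRuleFactory` and `MultiRule.combine(...)`, but that is an assumption about the fix, not a description of it.

```java
import java.util.List;

/**
 * Sketch: a rule covering only the affected projects conflicts with fewer jobs
 * than a workspace-root rule. All names are hypothetical stand-ins for Eclipse types.
 */
public class NarrowRuleDemo {

    interface Rule {
        boolean isConflicting(Rule other);
    }

    /** Path rule: conflicts with any ancestor or descendant path; "/" is the workspace root. */
    record PathRule(String path) implements Rule {
        public boolean isConflicting(Rule other) {
            if (other instanceof CombinedRule c) {
                return c.isConflicting(this);
            }
            if (!(other instanceof PathRule p)) {
                return false;
            }
            return covers(path, p.path()) || covers(p.path(), path);
        }

        private static boolean covers(String parent, String child) {
            return parent.equals("/") || child.equals(parent) || child.startsWith(parent + "/");
        }
    }

    /** Combination of rules, similar in spirit to MultiRule: conflicts if any child does. */
    record CombinedRule(List<Rule> children) implements Rule {
        public boolean isConflicting(Rule other) {
            return children.stream().anyMatch(r -> r.isConflicting(other));
        }
    }

    /** Hypothetical helper: one path rule per project whose classpath actually changes. */
    static Rule ruleFor(List<String> affectedProjects) {
        return new CombinedRule(affectedProjects.stream().<Rule>map(PathRule::new).toList());
    }

    public static void main(String[] args) {
        Rule busy = new PathRule("/Unrelated/src");            // held by some other job
        Rule wholeWorkspace = new PathRule("/");
        Rule narrow = ruleFor(List.of("/WebApp", "/EarApp"));  // hypothetical project names
        System.out.println(wholeWorkspace.isConflicting(busy));                // true: must wait
        System.out.println(narrow.isConflicting(busy));                        // false: can run
        System.out.println(narrow.isConflicting(new PathRule("/WebApp/lib"))); // true
    }
}
```

A narrower rule reduces the chance of contention, but, as Grant points out, it does not by itself make the enclosing job's rule contain it.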

Dani, what do you think?
Comment 8 Grant Taylor CLA 2009-11-23 10:07:18 EST
If a Job doesn't declare the resources it needs up front, doesn't this break the Job API?
Comment 9 Markus Keller CLA 2010-01-21 07:11:51 EST
Created attachment 156773
Another dump

I think I just ran into the same problem with I20100119-0800.

The main thread is waiting for 0x0ef5ce08, which is locked by Worker-10. That thread calls JavaElement.exists(..) which eventually tries to initialize a classpath container.
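The comment does not say what Worker-10 itself is blocked on, so the following only shows an assumed shape of the cycle: one thread holds a monitor and then needs a scheduling rule, while the other holds the rule and then needs the monitor. Two `ReentrantLock`s stand in for the rule and the monitor, and `tryLock` with a timeout lets the demo report the cycle instead of hanging. `LockOrderDemo`, `tryBoth` and `run` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical demo of a lock-ordering deadlock between a "rule" and a "monitor". */
public class LockOrderDemo {

    /** countDown + await, or a no-op when latch is null (no forced interleaving). */
    private static void sync(CountDownLatch latch) throws InterruptedException {
        if (latch != null) {
            latch.countDown();
            latch.await();
        }
    }

    /** Takes first, then tries second with a timeout; returns true if it held both. */
    static boolean tryBoth(ReentrantLock first, ReentrantLock second,
                           CountDownLatch allHoldFirst, CountDownLatch allTried)
            throws InterruptedException {
        first.lock();
        try {
            sync(allHoldFirst); // force the interleaving where each thread holds its first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (got) {
                second.unlock();
            }
            sync(allTried);     // keep holding first until both threads have tried
            return got;         // with second.lock() instead, false would be a permanent hang
        } finally {
            first.unlock();
        }
    }

    private static Thread start(ReentrantLock first, ReentrantLock second,
                                CountDownLatch holdFirst, CountDownLatch tried,
                                AtomicInteger ok) {
        Thread t = new Thread(() -> {
            try {
                if (tryBoth(first, second, holdFirst, tried)) {
                    ok.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Returns how many of the two threads managed to hold both locks. */
    static int run(boolean consistentOrder) {
        ReentrantLock rule = new ReentrantLock();    // stand-in for a scheduling rule
        ReentrantLock monitor = new ReentrantLock(); // stand-in for a Java monitor
        CountDownLatch holdFirst = consistentOrder ? null : new CountDownLatch(2);
        CountDownLatch tried = consistentOrder ? null : new CountDownLatch(2);
        AtomicInteger ok = new AtomicInteger();
        // "main"-like thread: rule first, then the monitor
        Thread mainLike = start(rule, monitor, holdFirst, tried, ok);
        // "worker"-like thread: monitor first, then the rule, unless the order is made consistent
        Thread workerLike = consistentOrder
                ? start(rule, monitor, null, null, ok)
                : start(monitor, rule, holdFirst, tried, ok);
        try {
            mainLike.join();
            workerLike.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return ok.get();
    }

    public static void main(String[] args) {
        System.out.println("opposite order:   " + run(false) + " of 2 threads got both locks");
        System.out.println("consistent order: " + run(true) + " of 2 threads got both locks");
    }
}
```

With opposite acquisition orders neither thread can proceed (0 of 2); taking the rule first in both threads, which is what declaring it up front amounts to, lets both finish (2 of 2).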
Comment 10 Jay Arthanareeswaran CLA 2010-02-16 02:29:40 EST
Markus, the bug may already have been handled by the fix for bug 289560. However, as Grant mentioned in comment #6, there is still a chance of a problem, since InitializeAfterLoadJob doesn't declare any of the resources it might use.

If we are convinced that the fix for bug 289560 is enough, we can close this bug.
Comment 11 Markus Keller CLA 2010-02-16 05:50:36 EST
> Markus, the bug could have been handled by the fix for bug 289560.

I had this problem a few times (but not all the time) when I restarted with the Breakpoints view focused. I think it only started to occur after the changes in the Breakpoints view (to use a lazy tree, etc.).

I haven't seen it in the last 2 weeks, and when I reproduced the scenario that sometimes failed before, it worked fine.
Comment 12 Jay Arthanareeswaran CLA 2010-04-19 03:27:10 EDT
There have been no further reports of this bug, so I am convinced that it was fixed along with bug 289560.

*** This bug has been marked as a duplicate of bug 289560 ***
Comment 13 Satyam Kandula CLA 2010-04-26 10:38:11 EDT
Verified for 3.6M7 using build I20100424-2000
Comment 14 Jay Arthanareeswaran CLA 2010-04-26 10:48:52 EDT
Verified.