Bug 241751 - Using a ClasspathContainerInitializer requires the use of workspace lock
Summary: Using a ClasspathContainerInitializer requires the use of workspace lock
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.5   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.5 M2   Edit
Assignee: Jerome Lanneluc CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-07-22 17:27 EDT by Min Idzelis CLA
Modified: 2016-12-19 15:18 EST (History)
6 users (show)

See Also:


Attachments
JavaCore with stacks, etc (2.07 MB, text/plain)
2008-08-12 10:53 EDT, Min Idzelis CLA
no flags Details
another instance of what looks like the same problem (23.44 KB, text/plain)
2008-08-21 13:07 EDT, David Green CLA
no flags Details
Proposed fix (8.24 KB, patch)
2008-08-26 10:26 EDT, Jerome Lanneluc CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Min Idzelis CLA 2008-07-22 17:27:21 EDT
Build ID: Eclipse 3.4final

Steps To Reproduce:
I am seeing a deadlock which involves resolving a classpath. I have a situation were I need to make sure that I must not acquire the workspace lock. However, I have the need to resolve a classpath. During resolving of a classpath, a ClasspathContainerInitializer is invoked, and it uses the JavaCore#setClasspathContainer() as recommended in the javadoc. This eventually calls into the code at ChangeClasspathOperation#classpathChanged() which calls ProjectReferenceChange#updateProjectReferencesIfNecessary(). This is where the problem lock is acquired. updateProjectReferencesIfNecessary() WILL acquire the workspace lock: 

	// ensure that a sheduling rule is used so that the project description is not modified by another thread while we update it
			// see https://bugs.eclipse.org/bugs/show_bug.cgi?id=214981
			IWorkspace workspace = projectResource.getWorkspace();
			ISchedulingRule rule = workspace.getRuleFactory().modifyRule(projectResource); // sheduling rule for modifying the project
			workspace.run(runnable, rule, IWorkspace.AVOID_UPDATE, null);

This is not good because a lot of effort was taken by me, and JDT already to not acuire this lock. It's unfurnate that this lock needs to be acquired. Is there any way to execute this without a lock, or if it is required, maybe it can be executed asynchrounously? 

More information:
Comment 1 Min Idzelis CLA 2008-07-22 17:28:00 EDT
Some relevant stack...


3XMTHREADINFO      "Worker-7" TID:0x394D1B00, j9thread_t:0x39ABC750, state:B, prio=5
3XMTHREADINFO1            (native thread ID:0x1654, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE          at java/lang/Object.wait(Native Method)
4XESTACKTRACE          at java/lang/Object.wait(Bytecode PC:3(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/ThreadJob.joinRun(Bytecode PC:3(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/ImplicitJobs.begin(Bytecode PC:207(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/JobManager.beginRule(Bytecode PC:16(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/resources/WorkManager.checkIn(Bytecode PC:40(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.prepareOperation(Bytecode PC:50(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:39(Compiled Code))
4XESTACKTRACE          at org/eclipse/jdt/internal/core/ProjectReferenceChange.updateProjectReferencesIfNecessary(Bytecode PC:100)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/ChangeClasspathOperation.classpathChanged(Bytecode PC:57)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/SetContainerOperation.executeOperation(Bytecode PC:451)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelOperation.run(Bytecode PC:43(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:82(Compiled Code))
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelOperation.runOperation(Bytecode PC:50)
4XESTACKTRACE          at org/eclipse/jdt/core/JavaCore.setClasspathContainer(Bytecode PC:30)
4XESTACKTRACE          at com/ibm/ws/ast/st/core/internal/runtime/ServerClasspathContainerInitializer.initialize(Bytecode PC:75)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.initializeContainer(Bytecode PC:160)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager$13.run(Bytecode PC:131)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.run(Bytecode PC:82(Compiled Code))
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.initializeAllContainers(Bytecode PC:313)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaModelManager.getClasspathContainer(Bytecode PC:21(Compiled Code))
4XESTACKTRACE          at org/eclipse/jdt/core/JavaCore.getClasspathContainer(Bytecode PC:7(Compiled Code))
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaProject.resolveClasspath(Bytecode PC:288)
4XESTACKTRACE          at org/eclipse/jdt/internal/core/JavaProject.getResolvedClasspath(Bytecode PC:76(Compiled Code))
4XESTACKTRACE          at org/eclipse/jst/jsp/core/taglib/ProjectDescription.ensureUpTodate(Bytecode PC:41(Compiled Code))
4XESTACKTRACE          at org/eclipse/jst/jsp/core/taglib/ProjectDescription.resolve(Bytecode PC:1(Compiled Code))
4XESTACKTRACE          at org/eclipse/jst/jsp/core/taglib/TaglibIndex.internalResolve(Bytecode PC:204(Compiled Code))
4XESTACKTRACE          at org/eclipse/jst/jsp/core/taglib/TaglibIndex.resolve(Bytecode PC:17(Compiled Code))
Comment 2 Dani Megert CLA 2008-08-12 02:55:35 EDT
This is a duplicate of bug 238370 (and most likely bug 236954).
Comment 3 Jerome Lanneluc CLA 2008-08-12 05:08:30 EDT
Min, can you please provide the stacks of the other threads involved in the deadlock? 

Also is the problem that the workspace lock is taken (i.e. IWorkspace.run(...) is called) or is the problem that a scheduling rule is taken?
Comment 4 Jerome Lanneluc CLA 2008-08-12 06:03:13 EDT
(In reply to comment #2)
> This is a duplicate of bug 238370 (and most likely bug 236954).
Actually this is a slightly different problem than bug 238370 (see bug 238370 comment 4)
Comment 5 Min Idzelis CLA 2008-08-12 10:53:00 EDT
Created attachment 109787 [details]
JavaCore with stacks, etc

I believe it is only the beginRule() (invoked by the workspace begin operation) that is causing the hang.
Comment 6 Jerome Lanneluc CLA 2008-08-12 12:29:06 EDT
Thanks for the thread dumps. 

However it looks like we must use a scheduling rule. According to the Javadoc for IProject.setDescription(...):

"The scheduling rule required for this operation depends on the AVOID_NATURE_CONFIG flag. If the flag is specified the IResourceRuleFactory.modifyRule is required; If the flag is not specified, the IWorkspaceRoot scheduling rule is required. "

So all I can offer is to avoid to run with a scheduling rule if we don't need to change the project description (it would be 99% of the cases on startup). For the remaining 1%, we must run with a scheduling rule as specified above.

Let me know if this is acceptable for you and I can provide you with a binary patch that you can test.
Comment 7 Min Idzelis CLA 2008-08-12 15:28:58 EDT
So, what you propose is to put an if around the try-catch block that invokes the workspacerunnable in ProjectReferenceChange#updateProjectReferencesIfNecessary() so that IF updating project references is NOT necessary, then it won't run any workspace op at all instead of what otherwise would have been a no-op workspace runnable. 

Sounds good in theory, but I'm nervous about the 1% case. When this happens, a hang results that can't be broken. You need to forcefully terminate the process which I consider unacceptable. 

Since JavaCore#setClasspathContainer() does not return any result (nor does any intermediate op, like ChangeClasspathOperation or ProjectReferenceChange) I think a better solution would be to spin off a job that performs the ProjectReferenceChange#updateProjectReferencesIfNecessary() or even the whole ChangeClasspathOperation. It doesn't seem that the javadoc puts any sort of restriction on this, so this might just work. Let me know what you think.
Comment 8 Jerome Lanneluc CLA 2008-08-13 05:04:58 EDT
Note that the 1% case is not a regression as the IProject scheduling rule was used pre 3.4.0 when running IProject#setDescription(). I'm not sure why you didn't see that before. Maybe because it is very rare. Or are you now running with the IWorkspaceRoot scheduling rule in another job, and you didn't do that before?

Spinning of a job to update the project references is a not possible as clients can assume that the project references are up-to-date after setting the classpath container.
Comment 9 Jerome Lanneluc CLA 2008-08-13 09:07:20 EDT
After more investigation, here is the analysis of what's happening:
1. The build job ("Worker-17") runs and thus it uses the IWorkspaceRoot scheduling rule
2. Inside the build job the PageTemplateBuilder waits for the ReferenceManager to finish indexing
3. The indexer (ReferenceProcessor) runs in job "Worker-7"
4. While running, it triggers a classpath resolution which causes a container to be initialized.
5. This container initialization uses an IProject scheduling rule.
6. Since the IWorkspaceRoot scheduling rule is already used by "Worker-17", "Worker-7" waits.
7. In the meantime, "Worker-17" is still waiting for "Worker-7" to finish.
=> deadlock
Comment 10 Jerome Lanneluc CLA 2008-08-19 08:35:49 EDT
After talking with Min, the problem was fixed on his side.
Closing
Comment 11 Min Idzelis CLA 2008-08-19 14:10:21 EDT
I have run into some problems with the IJobManager#transferRule() approach. Could you please create and send me that patch you were talking about in comment #6? 
Comment 12 Jerome Lanneluc CLA 2008-08-20 05:29:57 EDT
Sent binary path to Min.

However have you tried the other approach we talked about? I.e. forcing the cp initialization in the builder by calling getResolvedClasspath() on each Java project.
Comment 13 Min Idzelis CLA 2008-08-20 07:48:32 EDT
In doing some more thinking about it and I don't think it will work. You see - I would actually need to do call getResolvedClasspath "before" any client had acquired any scheduling rule. This is because if the client acquires a workspace rule (during build/etc) and then calls into my waitUntilDoneIndexing() method, resolving a classpath during that time will not prevent the indexer thread from deadlocking. That is because once the caller thread acquires the lock, and indexer starts trying to execute the workspace runnable with beginRule() it will block until it can get the lock. It doesn't matter if the calling thread resolves the classpath in the meantime. The indexer thread will still be waiting for that lock. During this wait, it doesn't do any other processing to know that the runnable "still needs" to be executed. 
Comment 14 David Green CLA 2008-08-21 13:07:33 EDT
Created attachment 110595 [details]
another instance of what looks like the same problem

looks like I've encountered the same problem.  If it's not the same issue can someone please let me know and I'll file a new bug.
Comment 15 Jason Sholl CLA 2008-08-21 15:57:39 EDT
I think the most recent stack trace is already fixed with bug 244086.  The fix is that getReferences in J2eeModuleVirtualComponent no longer calls back into JDT to load the classpath containers.
Comment 16 Jerome Lanneluc CLA 2008-08-22 05:00:43 EDT
(In reply to comment #14)
> Created an attachment (id=110595) [details]
> another instance of what looks like the same problem
> 
> looks like I've encountered the same problem.  If it's not the same issue can
> someone please let me know and I'll file a new bug.
> 
As you already found out I think, this is the same problem as reported in bug 243398.
Comment 17 Jerome Lanneluc CLA 2008-08-22 09:37:35 EDT
(In reply to comment #12)
> Sent binary path to Min.
Min, any news on the patch I sent you?
Comment 18 David Green CLA 2008-08-22 19:27:33 EDT
(In reply to comment #16)
> As you already found out I think, this is the same problem as reported in bug
> 243398.

thanks!
Comment 19 Min Idzelis CLA 2008-08-25 11:24:59 EDT
No news on the patch. But I think we're going to go in a different direction for this problem. We really need a 100% working solution for this problem, and like you said before your patch will not work all the time. It's sorta like sweeping some bugs under the rug ;-) 

We're going to modify our builder to check to see if the indexer is immediately available. If it is, proceed like normal, if not it will install a listener that will wait until the index is ready and then re-invoke the builder. 

Additionally, we're going to pursue some enhancements to the transferRule() function of JobManager, since I think that is the ultimate way to fix this sort of problem and I think more and more things are going to be hitting things like this in the future. 

So, I don't think you should change the JDT code if we're not going to be taking advantage of it - and since the fix isn't a complete-fix. 

But there is still something I would like to ask you to do. Could you please document (javadoc) the fact that a Project modify rule will be acquired during the execution of:

JavaCore#setClasspathContainer()
IJavaProject#getResolvedClasspath()

and other places (if there are) where it can occur. This will help developers be more aware of what the ISchedulingRule prerequisites are for calling these methods. 
Comment 20 Philipe Mulet CLA 2008-08-25 11:26:57 EDT
Discussing with Jerome, we think we may have a solution for tolerating this scenario, forcing a batch classpath init in a preautobuild notification, in addition to Jerome's suggestion for avoiding unnecessary runnable if no change is performed in project refs or external folder linked folders.

Note that the situation is not new, but rather an inherent weakness since day 1, which got surfaced by the fact some builder is now waiting on a background indexer, itself fighting for permission to run (Runnable) with builder. This limitation comes from the platform, and a better fix would likely be for the builder (PageTemplateBuilder) to avoid waiting on a background indexer which itself can run 3rd party code (via other registered CP initializers for instance). Our solution would avoid the JavaModel itself grabbing the offending lock, but nothing prevents any 3rd party initializer to again grab further locks. 

Why not simply running the indexer inside the builder thread ? At least for the portion of the indexing required during the build process ? (It could kill ongoing indexing job, and resume it once finished).
Comment 21 Philipe Mulet CLA 2008-08-25 11:30:00 EDT
Oops. I had missed comment 19 (stupid refresh issue).
Would be indeed great to avoid the builder wait on indexer as said in comment 20 already. Independently, the JavaModel could be improved to remove its inherent weakness (3.5 only?).
Comment 22 Jerome Lanneluc CLA 2008-08-25 11:52:52 EDT
(In reply to comment #19)
Thanks. Downgrading severity then since only the Javadoc must be updated.
Comment 23 Philipe Mulet CLA 2008-08-26 10:10:36 EDT
I still believe that we should issue a fix for the weakness, along the line of comment 20; but this is no longer critical for 3.4.1; rather for 3.5.
Comment 24 Jerome Lanneluc CLA 2008-08-26 10:26:26 EDT
Created attachment 110934 [details]
Proposed fix
Comment 25 Jerome Lanneluc CLA 2008-08-27 05:00:00 EDT
Fix released for 3.5M2
Comment 26 Olivier Thomann CLA 2008-09-15 11:25:32 EDT
No regression tests ?
Comment 27 Jerome Lanneluc CLA 2008-09-15 11:37:59 EDT
Sorry, no regression test since this is a concurrency issue. Only the code can be verified in this case.
Comment 28 Olivier Thomann CLA 2008-09-15 11:46:11 EDT
Verified for 3.5M2 by checking the source code.
Comment 29 Stefan Xenos CLA 2016-12-19 15:18:21 EST
Bug 507795 introduces a more robust fix to this issue that removes the lock entirely.