Bug 511793 - Deadlock DependencyGraphImpl and ComponentCore
Summary: Deadlock DependencyGraphImpl and ComponentCore
Status: RESOLVED FIXED
Alias: None
Product: WTP Common Tools
Classification: WebTools
Component: wst.common (show other bugs)
Version: unspecified   Edit
Hardware: PC Mac OS X
: P1 major with 3 votes (vote)
Target Milestone: 3.9   Edit
Assignee: Carl Anderson CLA
QA Contact: Carl Anderson CLA
URL:
Whiteboard: RHT
Keywords:
Depends on: 508685
Blocks: 515270
  Show dependency tree
 
Reported: 2017-02-06 14:51 EST by Brian de Alwis CLA
Modified: 2017-05-04 10:16 EDT (History)
5 users (show)

See Also:


Attachments
Deadlock even with proposed workaround by Brian (31.23 KB, text/plain)
2017-04-06 10:03 EDT, Aurelien Pupier CLA
no flags Details
Proposed fix (2.42 KB, patch)
2017-04-07 09:09 EDT, Aurelien Pupier CLA
apupier: review+
ccc: review-
Details | Diff
Use the same locking scheme as 508685 here, too (1.85 KB, patch)
2017-04-11 16:09 EDT, Carl Anderson CLA
no flags Details | Diff
Use the same locking scheme as 508685 here, too (try 2) (1.88 KB, patch)
2017-04-11 16:18 EDT, Carl Anderson CLA
no flags Details | Diff
Patch with a single synchronized method and single lock which is taken before the syncrhonization (2.42 KB, patch)
2017-04-12 11:53 EDT, Aurelien Pupier CLA
apupier: review+
Details | Diff
Patch with a single synchronized method (with explicit name) and single lock which is taken before the synchronization and without extra new lines (2.16 KB, patch)
2017-04-13 08:52 EDT, Aurelien Pupier CLA
ccc: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brian de Alwis CLA 2017-02-06 14:51:03 EST
We have some code that creates a project and then installs a custom facet.  We're seeing a deadlock with the DependencyGraphImpl's GraphUpdateJob.

Our code creates the project and then uses FacetedProject#modify() to install our own facet in the project.  It hangs attempting to acquire the project's lock.  But it has already acquired the ComponentImplManager's synchronization lock (stack frame marked with XXX):

Thread [ModalContext] (Suspended (entry into method getProjectLockObject in EMFWorkbenchEditContextFactory))	
	owns: ComponentImplManager  (id=218)	
	owns: CreateAppEngineStandardWtpProject  (id=219)	
	EMFWorkbenchEditContextFactory.getProjectLockObject(IProject) line: 63	
	EMFWorkbenchEditContextFactory.createEMFContext(IProject, IEMFContextContributor) line: 79	
	WorkbenchResourceHelperBase.createEMFContext(IProject, IEMFContextContributor) line: 249	
	ModuleCoreNature(EMFNature).createEmfContext() line: 105	
	ModuleCoreNature(EMFNature).getEmfContextBase() line: 229	
	ModuleCoreNature(EditModelNature).getEmfContext() line: 61	
	ModuleCoreNature.getModuleStructuralModelForWrite(Object) line: 303	
	StructureEdit.<init>(ModuleCoreNature, boolean) line: 329	
	StructureEdit.getStructureEditForRead(IProject) line: 119	
	VirtualComponent.createResource() line: 121	
	VirtualComponent.initializeResource() line: 113	
	VirtualComponent.<init>(IProject, IPath) line: 148	
XXX	ComponentImplManager.createComponent(IProject, boolean) line: 250	
	ComponentCore.createComponent(IProject, boolean) line: 84	
	WebFacetInstallDelegate.setContentPropertyIfNeeded(IDataModel, IPath, IProject) line: 221	
	WebFacetInstallDelegate.execute(IProject, IProjectFacetVersion, Object, IProgressMonitor) line: 86	
	FacetedProject.callDelegate(IProjectFacetVersion, IDelegate, Object, Object, IProgressMonitor) line: 1477	
	FacetedProject.modifyInternal(Set<Action>, IProgressMonitor) line: 441	
	FacetedProject.mergeChangesInternal(IFacetedProjectWorkingCopy, IProgressMonitor) line: 1181	
	FacetedProject.access$2(FacetedProject, IFacetedProjectWorkingCopy, IProgressMonitor) line: 1117	
	FacetedProject$1.run(IProgressMonitor) line: 324	
	Workspace.run(ICoreRunnable, ISchedulingRule, int, IProgressMonitor) line: 2240	
	Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 2267	
	FacetedProject.modify(Set<Action>, IProgressMonitor) line: 339	
	AppEngineStandardFacet.installAppEngineFacet(IFacetedProject, boolean, IProgressMonitor) line: 150	
	CreateAppEngineStandardWtpProject.execute(IProgressMonitor) line: 103	
	CreateAppEngineStandardWtpProject(WorkspaceModifyOperation).lambda$0(InvocationTargetException[], IProgressMonitor) line: 107	
	1612372028.run(IProgressMonitor) line: not available	
	Workspace.run(ICoreRunnable, ISchedulingRule, int, IProgressMonitor) line: 2240	
	Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 2267	
	CreateAppEngineStandardWtpProject(WorkspaceModifyOperation).run(IProgressMonitor) line: 128	
	ModalContext$ModalContextThread.run() line: 119	


The project creation spawns a GraphUpdateJob that call ComponentImplManager#createComponent(IProject).  This method first acquires the associated project lock, and then attempts to call ComponentImplManager#createComponent(IProject,boolean), which is a synchronized method.

Thread [Worker-0] (Suspended (entry into method getProjectLockObject in EMFWorkbenchEditContextFactory))	
	EMFWorkbenchEditContextFactory.getProjectLockObject(IProject) line: 63	
	ComponentImplManager.createComponent(IProject) line: 207	
	ComponentCore.createComponent(IProject) line: 64	
	DependencyGraphImpl$GraphUpdateJob$1.run() line: 494	
	SafeRunner.run(ISafeRunnable) line: 42	
	DependencyGraphImpl$GraphUpdateJob.run(IProgressMonitor) line: 462	
	Worker.run() line: 56	

If the GraphUpdateJob acquires the project lock first, we have deadlock as our ModalContext thread has the ComponentImplManager's synchronization lock.
Comment 1 Brian de Alwis CLA 2017-02-06 15:14:16 EST
The DependencyGraphImpl job is triggered as we reference the org.eclipse.jst.j2ee.web.project.facet.WebFacetInstallDataModelProvider from our code, activating the org.eclipse.jst.j2ee bundle.  The last action of J2EEPlugin is to register a graph listener on the DependencyGraphImpl, which triggers an update.

Thread [ModalContext] (Suspended (breakpoint at line 665 in DependencyGraphImpl))	
	owns: Object  (id=131)	
	owns: Object  (id=132)	
	owns: CreateAppEngineStandardWtpProject  (id=133)	
	DependencyGraphImpl.postUpdate() line: 665	
	DependencyGraphImpl.initGraph() line: 242	
	DependencyGraphImpl.getInstance() line: 92	
	IDependencyGraph.<clinit>() line: 70	
	J2EEPlugin.start(BundleContext) line: 539	
	BundleContextImpl$3.run() line: 774	
	BundleContextImpl$3.run() line: 1	
	AccessController.doPrivileged(PrivilegedExceptionAction<T>) line: not available [native method]	
	BundleContextImpl.startActivator(BundleActivator) line: 767	
	BundleContextImpl.start() line: 724	
	EquinoxBundle.startWorker0() line: 933	
	EquinoxBundle$EquinoxModule.startWorker() line: 309	
	EquinoxBundle$EquinoxModule(Module).doStart(Module$StartOptions...) line: 581	
	EquinoxBundle$EquinoxModule(Module).start(Module$StartOptions...) line: 449	
	SecureAction.start(Module, Module$StartOptions...) line: 468	
	EclipseLazyStarter.postFindLocalClass(String, Class<?>, ClasspathManager) line: 103	
	ClasspathManager.findLocalClass(String) line: 529	
	EquinoxClassLoader(ModuleClassLoader).findLocalClass(String) line: 328	
	BundleLoader.findLocalClass(String) line: 359	
	SingleSourcePackage.loadClass(String) line: 36	
	BundleLoader.findClassInternal(String, boolean) line: 433	
	BundleLoader.findClass(String, boolean) line: 386	
	BundleLoader.findClass(String) line: 378	
	EquinoxClassLoader(ModuleClassLoader).loadClass(String, boolean) line: 150	
	EquinoxClassLoader(ClassLoader).loadClass(String) line: 357	
	AppEngineStandardFacet.addWebFacetToBatch(IFacetedProject, Set<Action>) line: 248	
	AppEngineStandardFacet.installAppEngineFacet(IFacetedProject, boolean, IProgressMonitor) line: 139	
	CreateAppEngineStandardWtpProject.execute(IProgressMonitor) line: 103	
	CreateAppEngineStandardWtpProject(WorkspaceModifyOperation).lambda$0(InvocationTargetException[], IProgressMonitor) line: 107	
	990936519.run(IProgressMonitor) line: not available	
	Workspace.run(ICoreRunnable, ISchedulingRule, int, IProgressMonitor) line: 2240	
	Workspace.run(IWorkspaceRunnable, ISchedulingRule, int, IProgressMonitor) line: 2267	
	CreateAppEngineStandardWtpProject(WorkspaceModifyOperation).run(IProgressMonitor) line: 128	
	ModalContext$ModalContextThread.run() line: 119
Comment 2 Carl Anderson CLA 2017-02-06 15:52:39 EST
Bug 508685 fixed a deadlock between DependencyGraph and StructureEdit, but caused this deadlock to be able to occur.
Comment 3 Brian de Alwis CLA 2017-02-09 11:01:49 EST
A workaround is to force the Dependency Graph manager to defer updates when making project facet modifications:

   try {
     IDependencyGraph.INSTANCE.preUpdate();
     facetedProject.modify(facetInstallSet, subMonitor.newChild(100));
   } finally {
     IDependencyGraph.INSTANCE.postUpdate();
   }
Comment 4 Brian de Alwis CLA 2017-02-10 09:22:18 EST
And the problem with dreadlocks is that they have a habit of coming back.

   try {
     // pause the updates
     IDependencyGraph.INSTANCE.preUpdate();
     // wait for any currently-executing update jobs
     Jobs.getJobManager().join(DependencyGraphImpl.GRAPH_UPDATE_JOB_FAMILY, monitor);

     // now do your thing
     facetedProject.modify(facetInstallSet, subMonitor.newChild(100));
   } finally {
     // resume the updates
     IDependencyGraph.INSTANCE.postUpdate();
   }
Comment 5 Aurelien Pupier CLA 2017-04-06 10:03:21 EDT
Created attachment 267682 [details]
Deadlock even with proposed workaround by Brian
Comment 6 Aurelien Pupier CLA 2017-04-06 10:06:00 EDT
We are also impacted by the regression introduced in Neon.3 by https://bugs.eclipse.org/bugs/show_bug.cgi?id=508685 

I tried to workaround proposed by Brian in https://bugs.eclipse.org/bugs/show_bug.cgi?id=511793#c4 but I still have the deadlock.

The deadlock occurs for me something like 1 out of 5 times.

I attached my thread stack https://bugs.eclipse.org/bugs/attachment.cgi?id=267682

The stack looks very similar to what is initially reported.
Comment 7 Fred Bricon CLA 2017-04-06 14:15:17 EDT
Same problem affects m2e-wtp integration tests
Comment 8 Fred Bricon CLA 2017-04-06 17:34:33 EDT
I must add that some m2e-wtp tests fail to run 100% of the time due to this deadlock
Comment 9 Aurelien Pupier CLA 2017-04-07 09:09:35 EDT
Created attachment 267697 [details]
Proposed fix

I'm proposing a patch, it is working for me locally (I mean I launched 50 times my test and no deadlock, previously it was deadlocking 1 out 5 times)

Please test it.

if fine, i think that we should release a new version as soon as possible that we can use with Neon
Comment 10 Nick Boldt CLA 2017-04-07 10:45:40 EDT
Downstream issues: 
* Fedora/RHSCL: https://bugzilla.redhat.com/show_bug.cgi?id=1440205
* JBoss Tools/Devstudio: https://issues.jboss.org/browse/JBIDE-24240
Comment 11 Fred Bricon CLA 2017-04-07 13:36:59 EDT
With Aurelien's patch, I don't see any deadlocks running m2e-wtp tests in Eclipse. For some reason, running from tycho doesn't pick up my local org.eclipse.wst.common.core build, so I can't check there for now.

@Brian, can you give it a try too?
Comment 12 Mat Booth CLA 2017-04-10 06:49:19 EDT
(In reply to Fred Bricon from comment #8)
> I must add that some m2e-wtp tests fail to run 100% of the time due to this
> deadlock

Hi, can you expand on which tests fail so I can determine if my installation suffers from the problem?
Comment 13 Fred Bricon CLA 2017-04-10 10:05:54 EDT
@Matt in https://github.com/jbosstools/m2e-wtp-tests/blob/master/org.eclipse.m2e.wtp.tests/src/org/eclipse/m2e/wtp/tests/WTPProjectConfiguratorTest.java, following tests got stuck:

- testMNGECLIPSE2357_customWebXml()
- testMNGECLIPSE984_errorMarkers()
- testMNGECLIPSE2279_finalNameAsContextRoot()
- test458196_missingWebXmlMarker2()
Comment 14 Rob Stryker CLA 2017-04-11 13:44:02 EDT
Hotbug request:

1. Affiliation: Red Hat / Eclipse m2e-WTP / Fuse Tooling
2. Which release: in a 3.8.x stream and if possible integrated in a Neon.3 patch release.
3. Reasons of high priority:
- This bug is a regression compared to Neon.2 which is affecting several products including the Eclipse projects m2e-wtp.
- There is no workaround working for every cases.
- When bug encountered the application is frozen


Please note that 2 mails has been sent to wtp-pmc mailing-list: https://dev.eclipse.org/mhonarc/lists/wtp-pmc/msg02350.html
Comment 15 Carl Anderson CLA 2017-04-11 15:46:18 EDT
Comment on attachment 267697 [details]
Proposed fix

The proposed patch reintroduces the deadlock encountered in bug 508685
Comment 16 Carl Anderson CLA 2017-04-11 16:09:45 EDT
Created attachment 267761 [details]
Use the same locking scheme as 508685 here, too

Initial patch, before testing.
Comment 17 Carl Anderson CLA 2017-04-11 16:18:45 EDT
Created attachment 267762 [details]
Use the same locking scheme as 508685 here, too (try 2)
Comment 18 Aurelien Pupier CLA 2017-04-12 02:59:57 EDT
(In reply to Carl Anderson from comment #17)
> Created attachment 267762 [details]
> Use the same locking scheme as 508685 here, too (try 2)

The new provided patch is working also.
I don't understand why my proposed patch was reintroducing the old deadlock as I kept the old lock and just add it to org.eclipse.wst.common.componentcore.internal.util.ComponentImplManager.createComponent(IProject, boolean) too.
It was in a single place just because I factorized the code.

I think that the new patch is less cleaner as it is spreading the lock in 2 different classes but as it is working and we need the fix asap, I think that we can use it and discuss to improve maintainability later on.
Comment 19 Brian de Alwis CLA 2017-04-12 10:23:34 EDT
Mixing synchronized and explicit locks is a sure source of synchronization bugs.  There are two locks here: ComponentImplMgr's synchronization lock (global), and the per-project EMFWorkbenchEditContesxtFactory lock.  It's important that we always acquire the project lock before acquiring the synchronization lock to prevent deadlocks.  Aurelien's patch may seem to work but it reverses this acquisition order such that it will be possible for a thread to block trying to acquire the project lock while holding the synchronization lock.

Carl's patch follows the project-lock-then-global-lock ordering.  But it's only safe as  it patches the *known* callers to ComponentImplManager#createComponent(IProject,boolean):

  * ComponentImplManager's createComponent(IProject) (already patches via bug 508685)
  * ComponentCore#createComponent(IProject,boolean)

Any other call paths into ComponentImplManager#createComponent(IProject,boolean), which  is public although marked as internal, could cause these same deadlocks.

Aurlien's and Carl's patches could be merged into a tighter form that isolates the locking to a single class (like Aurelien's) but preserves the project-lock-then-global-lock ordering by moving the synchronized keyword off the ComponentImplManager#createComponent(IProject,boolean) method and using a synchronized block instead:

	public IVirtualComponent createComponent(IProject project) {
		return createComponent(project, true);
	}

        // note: no longer synchronized
	public IVirtualComponent createComponent(IProject project, boolean checkSettings) {
		ILock lock = EMFWorkbenchEditContextFactory.getProjectLockObject(project);
		try{
			if(null != lock){
				lock.acquire();
			}
                        // now acquire synchronization lock
			synchronized (this) {
			  return createComponentWithoutLock(project, checkSettings);
			}
		} finally{
			if(null != lock){
				lock.release();
			}
		}
	}

I've run my test suite a number of times but it's not a very good judge as it only triggered this deadlock on rare occasions.
Comment 20 Aurelien Pupier CLA 2017-04-12 11:53:22 EDT
Created attachment 267776 [details]
Patch with a single synchronized method and single lock which is taken before the syncrhonization

Thanks Brian for the explanation and the proposal!
I better understand what was my mistake.

So here is another patch based on your comment, I'm using a synchronized on the private method instead of using the synchronized(this) {} block. I expect it to be equivalent and more readable. Let me know if I'm wrong.

My tests are passing.

@Carl
- how can we reproduce the previous deadlock?https://bugs.eclipse.org/bugs/show_bug.cgi?id=508685
- what do you think about this solution?
Comment 21 Rob Stryker CLA 2017-04-12 18:37:31 EDT
I think it's time we finalize on one of these patches and move forward.

 @Brian can you review Aurelien's recent patch, and if it's no good, suggest either a better solution or which of the available patches should be chosen? 

Time isn't really ample enough to risk delaying this.
Comment 22 Brian de Alwis CLA 2017-04-13 08:39:07 EDT
Aurelien's patch looks equivalent to me.  Two nits:
  - extra blank lines
  - maybe createComponentWithoutLock() should be renamed since it now has a lock!
Comment 23 Aurelien Pupier CLA 2017-04-13 08:52:33 EDT
Created attachment 267792 [details]
Patch with a single synchronized method (with explicit name) and single lock which is taken before the synchronization and without extra new lines

same patch with name changed from createComponentWithoutLock to createComponentSynchronously and extra lines removed
Comment 24 Carl Anderson CLA 2017-04-13 13:38:07 EDT
(In reply to Rob Stryker from comment #14)
> Hotbug request:
> 
> 1. Affiliation: Red Hat / Eclipse m2e-WTP / Fuse Tooling
> 2. Which release: in a 3.8.x stream and if possible integrated in a Neon.3
> patch release.
> 3. Reasons of high priority:
> - This bug is a regression compared to Neon.2 which is affecting several
> products including the Eclipse projects m2e-wtp.
> - There is no workaround working for every cases.
> - When bug encountered the application is frozen
> 
> 
> Please note that 2 mails has been sent to wtp-pmc mailing-list:
> https://dev.eclipse.org/mhonarc/lists/wtp-pmc/msg02350.html

Rob, as part of the WTP Hotbug policy documented at 
https://wiki.eclipse.org/WTP/Hotbug_Policy , a hotbug_request is made if there are no committers from the adopter in the impacted project.  Simply put, you are still a committer on WTP Common, and as such should follow the appropriate procedure (increasing the priority of the bug) instead.

Please note that this bug is getting a lot of attention and will have a fix in place before Oxygen M7 is declared.  As to patching older releases, other bug(s) will need to be opened.
Comment 25 Nick Boldt CLA 2017-04-13 14:27:07 EDT
As requested I've opened a backport bug, with the hope we can get this fixed as a M or P release of 3.8.3.

See https://bugs.eclipse.org/bugs/show_bug.cgi?id=515270
Comment 26 Aurelien Pupier CLA 2017-04-27 06:54:34 EDT
@Carl (or any other wtp committer) Can you include the fix in master to ensure to have it available for Oxygen M7 please?
Comment 27 Nick Boldt CLA 2017-04-28 09:59:29 EDT
I'm working on a feature patch for this (using the webtools.common.all fork). 

See https://issues.jboss.org/browse/JBIDE-24376
Comment 28 Carl Anderson CLA 2017-05-04 10:12:40 EDT
Comment on attachment 267792 [details]
Patch with a single synchronized method (with explicit name) and single lock which is taken before the synchronization and without extra new lines

A slightly modified version of Aurelien's patch was committed (I restored a null check that was removed).
Comment 29 Carl Anderson CLA 2017-05-04 10:14:15 EDT
This was committed to master for WTP 3.9.0 M7
Comment 30 Aurelien Pupier CLA 2017-05-04 10:16:02 EDT
(In reply to Carl Anderson from comment #28)
> Comment on attachment 267792 [details]
> A slightly modified version of Aurelien's patch was committed

Thank you Carl!

> (I restored a null check that was removed).

If you wish, SonarLint was reporting it as a useless null check. I won't fight for that ^^