Community
Participate
Working Groups
Please see Bug 243547 for the problem description and history. Nitan asked that a new defect be opened for the new manifestation of this deadlock.
Analysis from Brian Vosburgh: Here's my take on bug 243547. (Which was just closed, since the SSE-related deadlock seems to have been fixed. But Gary Karasiuk will probably open a new bug with the new deadlock, whose relevant stacktraces I've extracted from the dump and attached.) Any comments, corrections, or insights are welcome. Both threads are running the Dali validator. I think(?) Worker-2 has locked the workspace while the Dali validator deletes the Dali-related markers. This causes the workspace to notify Dali that a resource has changed. The Dali listener is waiting for a lock on the Dali model (GenericJpaModel) so it can forward the notification to the appropriate JPA project. Unfortunately, GenericJpaModel is already locked by Worker-5 (below). In Worker-5, validation has triggered the (lazy) creation of a JPA project, locking the Dali model (GenericJpaModel) until the JPA project is built. This creation also triggers the loading of the DTP plug-in, which then loads and validates the DTP drivers. It looks like DTP has detected an error in one of the drivers and needs to lock the workspace so it can add the appropriate problem markers. Unfortunately, the workspace is already locked by Worker-2 (above). The way we are doing things currently (which I've never really warmed up to :-) ) doesn't really allow us to remove the validation markers asynchronously. We could investigate changing that.... Likewise, it might be difficult for DTP to add its problem markers asynchronously; but might merit investigation.... Rough notes: Worker-2 JpaValidator calls WorkbenchReporter.removeAllMessages(IValidator) calls Workspace.deleteMarkers(IMarker[]) which notifies listeners that a resource has changed waiting to call GenericJpaModel.synchronizeFiles(IProject, IResourceDelta) which is locked by Worker-5 somewhere acquires a lock... Worker-5 JpaValidator calls JptCorePlugin to get the appropriate JPA project locks and calls GenericJpaModel to get the JPA project the JPA project is (lazily) built triggers the loading of JptDbPlugin builds the DTPConnectionProfileRepository triggers the DTP InternalProfileManager to load the profiles the properties of a DTP driver are invalid so... DriverValidator tries to add a problem marker WorkManager waits to lock the "primary workspace lock" 3XMTHREADINFO "Worker-2" TID:0x3210B200, j9thread_t:0x385EC750, state:B, prio=5 3XMTHREADINFO1 (native thread ID:0x378, native priority:0x5, native policy:UNKNOWN) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:295) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:275) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:286) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:272) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.resourcePostChange(JpaModelManager.java:264) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.resourceChanged(JpaModelManager.java:237) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager$ResourceChangeListener.resourceChanged(JpaModelManager.java:487) 4XESTACKTRACE at org/eclipse/core/internal/events/NotificationManager$2.run(NotificationManager.java:288) 4XESTACKTRACE at org/eclipse/core/runtime/SafeRunner.run(SafeRunner.java:37(Compiled Code)) 4XESTACKTRACE at org/eclipse/core/internal/events/NotificationManager.notify(NotificationManager.java:282) 4XESTACKTRACE at org/eclipse/core/internal/events/NotificationManager.broadcastChanges(NotificationManager.java:148) 4XESTACKTRACE at org/eclipse/core/internal/resources/Workspace.broadcastPostChange(Workspace.java:313) 4XESTACKTRACE at org/eclipse/core/internal/resources/Workspace.endOperation(Workspace.java:1022) 4XESTACKTRACE at org/eclipse/core/internal/resources/Workspace.deleteMarkers(Workspace.java:961) 4XESTACKTRACE at org/eclipse/wst/validation/internal/TaskListUtility.removeTaskSubset(TaskListUtility.java:428) 4XESTACKTRACE at org/eclipse/wst/validation/internal/TaskListUtility.removeAllTasks(TaskListUtility.java:406) 4XESTACKTRACE at org/eclipse/wst/validation/internal/TaskListUtility.removeAllTasks(TaskListUtility.java:402) 4XESTACKTRACE at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:164) 4XESTACKTRACE at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:544) 4XESTACKTRACE at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:509) 4XESTACKTRACE at org/eclipse/jpt/core/internal/validation/JpaValidator.validate(JpaValidator.java:46) 4XESTACKTRACE at org/eclipse/jpt/core/internal/validation/JpaValidator.validateInJob(JpaValidator.java:77) 4XESTACKTRACE at org/eclipse/wst/validation/internal/operations/ValidatorJob.run(ValidatorJob.java:78) 4XESTACKTRACE at org/eclipse/core/internal/jobs/Worker.run(Worker.java:55) 3XMTHREADINFO "Worker-5" TID:0x32119D00, j9thread_t:0x39E62C38, state:CW, prio=1 3XMTHREADINFO1 (native thread ID:0x9FC, native priority:0x1, native policy:UNKNOWN) 4XESTACKTRACE at java/lang/Object.wait(Native Method) 4XESTACKTRACE at java/lang/Object.wait(Object.java:196) 4XESTACKTRACE at org/eclipse/core/internal/jobs/Semaphore.acquire(Semaphore.java:38) 4XESTACKTRACE at org/eclipse/core/internal/jobs/OrderedLock.doAcquire(OrderedLock.java:169) 4XESTACKTRACE at org/eclipse/core/internal/jobs/OrderedLock.acquire(OrderedLock.java:105(Compiled Code)) 4XESTACKTRACE at org/eclipse/core/internal/jobs/OrderedLock.acquire(OrderedLock.java:82(Compiled Code)) 4XESTACKTRACE at org/eclipse/core/internal/resources/WorkManager.checkIn(WorkManager.java:118) 4XESTACKTRACE at org/eclipse/core/internal/resources/Workspace.prepareOperation(Workspace.java:1747) 4XESTACKTRACE at org/eclipse/core/internal/resources/Resource.createMarker(Resource.java:641) 4XESTACKTRACE at org/eclipse/datatools/connectivity/drivers/DriverValidator.addProblemMarker(DriverValidator.java:230) 4XESTACKTRACE at org/eclipse/datatools/connectivity/drivers/DriverValidator.isValid(DriverValidator.java:96) 4XESTACKTRACE at org/eclipse/datatools/connectivity/drivers/DriverValidator.isValid(DriverValidator.java:79) 4XESTACKTRACE at org/eclipse/datatools/connectivity/internal/InternalProfileManager.getDriverInstance(InternalProfileManager.java:1334) 4XESTACKTRACE at org/eclipse/datatools/connectivity/internal/InternalProfileManager.enableLocalDatabase(InternalProfileManager.java:1252) 4XESTACKTRACE at org/eclipse/datatools/connectivity/internal/InternalProfileManager.loadLocalRegisteredDatabases(InternalProfileManager.java:1211) 4XESTACKTRACE at org/eclipse/datatools/connectivity/internal/InternalProfileManager.loadProfiles(InternalProfileManager.java:928) 4XESTACKTRACE at org/eclipse/datatools/connectivity/internal/InternalProfileManager.getProfiles(InternalProfileManager.java:139) 4XESTACKTRACE at org/eclipse/datatools/connectivity/ProfileManager.getProfiles(ProfileManager.java:46) 4XESTACKTRACE at org/eclipse/jpt/db/internal/DTPConnectionProfileRepository.start(DTPConnectionProfileRepository.java:63) 4XESTACKTRACE at org/eclipse/jpt/db/JptDbPlugin.start(JptDbPlugin.java:53) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleContextImpl$2.run(BundleContextImpl.java:1009) 4XESTACKTRACE at java/security/AccessController.doPrivileged(AccessController.java:251) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleContextImpl.startActivator(BundleContextImpl.java:1003) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleContextImpl.start(BundleContextImpl.java:984) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleHost.startWorker(BundleHost.java:346) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/AbstractBundle.start(AbstractBundle.java:265) 4XESTACKTRACE at org/eclipse/osgi/framework/util/SecureAction.start(SecureAction.java:400) 4XESTACKTRACE at org/eclipse/core/runtime/internal/adaptor/EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:111(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/baseadaptor/loader/ClasspathManager.findLocalClass(ClasspathManager.java:427(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/internal/baseadaptor/DefaultClassLoader.findLocalClass(DefaultClassLoader.java:193(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleLoader.findLocalClass(BundleLoader.java:368(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/SingleSourcePackage.loadClass(SingleSourcePackage.java:33(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleLoader.findClassInternal(BundleLoader.java:441(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleLoader.findClass(BundleLoader.java:397(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/framework/internal/core/BundleLoader.findClass(BundleLoader.java:385(Compiled Code)) 4XESTACKTRACE at org/eclipse/osgi/internal/baseadaptor/DefaultClassLoader.loadClass(DefaultClassLoader.java:87(Compiled Code)) 4XESTACKTRACE at java/lang/ClassLoader.loadClass(ClassLoader.java:609(Compiled Code)) 4XESTACKTRACE at java/lang/J9VMInternals.verifyImpl(Native Method) 4XESTACKTRACE at java/lang/J9VMInternals.verify(J9VMInternals.java:72(Compiled Code)) 4XESTACKTRACE at java/lang/J9VMInternals.initialize(J9VMInternals.java:134(Compiled Code)) 4XESTACKTRACE at org/eclipse/jpt/core/internal/platform/GenericJpaFactory.buildJpaDataSource(GenericJpaFactory.java:226) 4XESTACKTRACE at org/eclipse/jpt/core/internal/GenericJpaProject.<init>(GenericJpaProject.java:141) 4XESTACKTRACE at org/eclipse/jpt/core/internal/platform/GenericJpaFactory.buildJpaProject(GenericJpaFactory.java:222) 4XESTACKTRACE at org/eclipse/jpt/core/internal/GenericJpaModel$DefaultJpaProjectHolder.buildJpaProject(GenericJpaModel.java:453) 4XESTACKTRACE at org/eclipse/jpt/core/internal/GenericJpaModel$DefaultJpaProjectHolder.jpaProject(GenericJpaModel.java:445) 4XESTACKTRACE at org/eclipse/jpt/core/internal/GenericJpaModel.getJpaProject(GenericJpaModel.java:84) 4XESTACKTRACE at org/eclipse/jpt/core/internal/JpaModelManager.getJpaProject(JpaModelManager.java:178) 4XESTACKTRACE at org/eclipse/jpt/core/JptCorePlugin.getJpaProject(JptCorePlugin.java:157) 4XESTACKTRACE at org/eclipse/jpt/core/internal/validation/JpaValidator.validationMessages(JpaValidator.java:55) 4XESTACKTRACE at org/eclipse/jpt/core/internal/validation/JpaValidator.validate(JpaValidator.java:48) 4XESTACKTRACE at org/eclipse/jpt/core/internal/validation/JpaValidator.validateInJob(JpaValidator.java:77) 4XESTACKTRACE at org/eclipse/wst/validation/internal/operations/ValidatorJob.run(ValidatorJob.java:78) 4XESTACKTRACE at org/eclipse/core/internal/jobs/Worker.run(Worker.java:55)
Created attachment 111239 [details] Filtered dump - jsp indexing Added a new dump also involving the sync locks in "GenericJpaModel" this time involving JSP indexing
Suggestion.... Remove the class synchonization around getJpaProject() if the project doesn't exist yet. A separate Job lock can prevent multiple callers of getJpaProject() if no project exists, and thus freeing up this lock for other calls like synchFiles() and javaElementChanged() "GenericJpaModel$DefaultJpaProjectHolder.buildJpaProject" could be done on a separate Job I can try creating a patch for this is you would like
Thanks for the suggestion Chuck. We are evaluating it along with some other ideas, and trying to come up with a plan that will solve these issues.
I'm not real comfortable modifying GenericJpaModel just yet.... Most of the Dali deadlocks seem to revolve around the Dali validator, which, mostly unbeknownst to us, seems to trigger very broad lock requests in other plug-ins (in the first case, the DTP DriverValidator is creating a marker; in the second, the WST TranslatorResourceImpl is loading). Maybe we should be acquiring this (these?) lock(s) up front, when the Dali validator starts executing? Does anyone know what locks Dali should acquire? I tried to re-create the deadlock with DTP, via breakpoints; but my target workspace is out of synch with whatever workspace generated the original dump. I am using the WTP 3.0.1 build and the related pre-requisites. The DTP code in the dump does not match what I have (e.g. in my workspace, InternalProfileManager.getDriverInstance(...) does not call DriverValidator.isValid()). What version of DTP are you using?
Created attachment 111326 [details] Original stacktraces (DTP) A more readable version of the original stacktraces from bug 243547.
We're using DTP 1.6.1 M1
Hi Brian, wanted to get your opinion on a "workaround" fix we've been exploring which is to force a call in our plugin to JpaModelManager.instance().getJpaModel().jpaProjects() right after importing any project. This seems to avoid the deadlock or greatly reduce the chances of it from the repeated testing I've tried. Do you see any problems with this or harmful side effects in function or performance?
(In reply to comment #8) > Hi Brian, wanted to get your opinion on a "workaround" fix we've been exploring > which is to force a call in our plugin to > JpaModelManager.instance().getJpaModel().jpaProjects() right after importing > any project. This seems to avoid the deadlock or greatly reduce the chances of > it from the repeated testing I've tried. Do you see any problems with this or > harmful side effects in function or performance? > Well...the JavaDoc comment for JpaModel#jpaProjects() does mention "performance implications".... :-) Depends on what sorts of projects you have in your workspace. It might be better to call JpaModelManager.instance().getJpaProject(IProject) if possible. Does that also help the deadlock?
From my quick read, seems the workaround is the best we can do? Adding Gary to CC. Gary, at one point there were certain rules about what validators could do and could not do about "resource changes" ... can you clarify what those rules are? I echo Brian's question in #5, Maybe we should be acquiring this (these?) lock(s) up front, when the Dali validator starts executing? Does anyone know what locks Dali should acquire? I do not know what those rules are, but I think there are some. I've always been surprised we in WTP have not had to use more scheduling rules. Hopefully Gary can clarify some what the rules are?
Gary has said, in https://bugs.eclipse.org/bugs/show_bug.cgi?id=236664#c10 "We have to think of validation as a read-only operation. Nothing (other than markers) should be changed." In thinking about this, I see two interpretations, which is right? A. Nothing can be done which might cause a resource change event, not even by starting another asynchronous job. And, that seems impossible, in practice. B. Nothing can be done which might cause a resource change event, unless it is done from another asynchronous job. That seems possible, but has a strong corollary: a validator can never call "foreign code" (code owned by others, or which isn't spec'd to be "validator safe") since in calling foreign code, you'd have no way to know if it might do something that would cause a resource change event. Thus, validators should never call foreign code unless they do so from another asynchronous job. While this later case seems feasible in practice, it basically means all validators should accomplish their work asynchronous jobs. And this leaves me confused, since, I think, one attribute of validators (already) is if they are asynchronous or not ... so why could an asynchronous validator have to have other asynchronous jobs? Seems the validator has a hard contract to fulfill? (or, impossible? :)
It is suspicious to me that Worker-5 is performing workspace changes from within a bundle activator (org/eclipse/jpt/db/JptDbPlugin.start). This is already deadlock prone because bundle activation occurs while the classloader lock is held. Also, since this can be initiated simply by having a reference to a class in that plugin, it causes potentially locking code to run at unpredictable times for clients. I would suggest pushing that bundle's long-running activation work into a background job. Another fix would be to change JpaModelManager to perform its resource change event processing asynchronously rather than directly within the resource change event callback (the synchronizeFiles method). Again because this results in insertion of locking behavior onto any client performing workspace changes, acquiring locks within a resource change callback is potentially dangerous (unless you are certain you never call out to foreign code while holding that lock, and David mentions). A common pattern here is to process the delta within the resource change event to gather the useful information, and then fork off a job to perform any locking/long-running update work (as described in http://www.eclipse.org/articles/Article-Resource-deltas/resource-deltas.html#1d).
Since there is a work-around, I've changed the severity to 'major' instead of 'critical'. Not sure if we need to fix for next service release? Or target next minor release (2.1) ... with _potential_ backport to 2.0.x maintenance stream.
(In reply to comment #13) > Not sure if we need to fix for next service release? Or target next minor > release (2.1) ... with _potential_ backport to 2.0.x maintenance stream. > The first issue that John describes in comment 12 is actually addressed in Head, as this area was refactored earlier while working on another feature. So, at this point, this particular deadlock should be resolved in recent 2.1 (WTP 3.1) builds. I think this same issue could also be addressed in maintenance with isolated code changes, but not completely sure yet. We'll post a patch and do some testing to see if there are any problems with the envisioned changes and then we can make a decision on whether we want to put the changes in next week's 3.0.2 build. I'll target to 2.1 for now, but we can retarget to 2.0.2 depending on feedback and risk assessment.
(In reply to comment #14) I continue to feel that deadlocks are quite traumatic for our customers. I feel that this needs to be fixed in the WTP 3.0.3 stream.
Created attachment 112625 [details] proposed patch This patch moves the loading of the DTP connection profiles from the plug-in start to the first request for the profiles, which occurs after the plug-in loading. Gary, can you please see if this fixes, at least, this deadlock?
Given that the WTP SR1 +2 dates are tomorrow, there probably isn't enough time to thoroughly test this change for 2.0.2. There is also a workaround described in comment 9. As a result, I am re-targeting this bug to 2.0.3. This way the fix will eventually make it into the maintenance stream (assuming it is determined to be safe), but won't risk late regression in 2.0.2. Chuck, Please enter a separate bug for the issue you described in comment 2 if it still exists.
Patch has been tested and applied in 2_0_maintenance branch for inclusion in WTP 3.0.3. Adopters - please test upcoming RC's to ensure that things have improved and not regressed in any way.
Gary, Len, are you able to verify this bug for the 3.0.3 release?
(In reply to comment #19) > Gary, Len, are you able to verify this bug for the 3.0.3 release? > It is not possible to verify these types of fixes by blackbox testing. We have to just trust that the problem has been fixed.