Bug 245573 - Deadlock with JPA Validator
Summary: Deadlock with JPA Validator
Status: CLOSED FIXED
Alias: None
Product: Dali JPA Tools
Classification: WebTools
Component: Framework (show other bugs)
Version: 2.0.1   Edit
Hardware: PC Windows XP
: P2 major (vote)
Target Milestone: 2.0.3   Edit
Assignee: Brian Vosburgh CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-08-28 14:24 EDT by Gary Karasiuk CLA
Modified: 2008-11-17 10:56 EST (History)
12 users (show)

See Also:


Attachments
Filtered dump - jsp indexing (13.90 KB, text/plain)
2008-08-28 14:59 EDT, Chuck Bridgham CLA
no flags Details
Original stacktraces (DTP) (9.20 KB, text/plain)
2008-08-29 14:58 EDT, Brian Vosburgh CLA
no flags Details
proposed patch (5.59 KB, patch)
2008-09-16 00:28 EDT, Brian Vosburgh CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Gary Karasiuk CLA 2008-08-28 14:24:54 EDT
Please see Bug 243547 for the problem description and history. Nitan asked that a new defect be opened for the new manifestation of this deadlock.
Comment 1 Gary Karasiuk CLA 2008-08-28 14:59:14 EDT
Analysis from Brian Vosburgh:

Here's my take on bug 243547. (Which was just closed, since the SSE-related
deadlock seems to have been fixed. But Gary Karasiuk will probably open a new bug
with the new deadlock, whose relevant stacktraces I've extracted from the dump
and attached.)

Any comments, corrections, or insights are welcome.

Both threads are running the Dali validator.

I think(?) Worker-2 has locked the workspace while the Dali validator deletes the
Dali-related markers. This causes the workspace to notify Dali that a resource
has changed. The Dali listener is waiting for a lock on the Dali model
(GenericJpaModel) so it can forward the notification to the appropriate JPA project.
Unfortunately, GenericJpaModel is already locked by Worker-5 (below).

In Worker-5, validation has triggered the (lazy) creation of a JPA project, locking
the Dali model (GenericJpaModel) until the JPA project is built. This creation also
triggers the loading of the DTP plug-in, which then loads and validates the DTP drivers.
It looks like DTP has detected an error in one of the drivers and needs to lock the
workspace so it can add the appropriate problem markers. Unfortunately, the workspace
is already locked by Worker-2 (above).

The way we are doing things currently (which I've never really warmed up to :-) )
doesn't really allow us to remove the validation markers asynchronously. We
could investigate changing that.... Likewise, it might be difficult for DTP to add its
problem markers asynchronously; but might merit investigation....


Rough notes:

Worker-2
JpaValidator
calls WorkbenchReporter.removeAllMessages(IValidator)
calls Workspace.deleteMarkers(IMarker[])
which notifies listeners that a resource has changed
waiting to call GenericJpaModel.synchronizeFiles(IProject, IResourceDelta)
    which is locked by Worker-5
somewhere acquires a lock...

Worker-5
JpaValidator
calls JptCorePlugin to get the appropriate JPA project
locks and calls GenericJpaModel to get the JPA project
the JPA project is (lazily) built
triggers the loading of JptDbPlugin
builds the DTPConnectionProfileRepository
triggers the DTP InternalProfileManager to load the profiles
the properties of a DTP driver are invalid so...
DriverValidator tries to add a problem marker
WorkManager waits to lock the "primary workspace lock"
3XMTHREADINFO      "Worker-2" TID:0x3210B200, j9thread_t:0x385EC750, state:B, prio=5
3XMTHREADINFO1            (native thread ID:0x378, native priority:0x5, native policy:UNKNOWN)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:295)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:275)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:286)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.synchronizeFiles(JpaModelManager.java:272)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.resourcePostChange(JpaModelManager.java:264)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.resourceChanged(JpaModelManager.java:237)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager$ResourceChangeListener.resourceChanged(JpaModelManager.java:487)
4XESTACKTRACE          at org/eclipse/core/internal/events/NotificationManager$2.run(NotificationManager.java:288)
4XESTACKTRACE          at org/eclipse/core/runtime/SafeRunner.run(SafeRunner.java:37(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/events/NotificationManager.notify(NotificationManager.java:282)
4XESTACKTRACE          at org/eclipse/core/internal/events/NotificationManager.broadcastChanges(NotificationManager.java:148)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.broadcastPostChange(Workspace.java:313)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.endOperation(Workspace.java:1022)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.deleteMarkers(Workspace.java:961)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/TaskListUtility.removeTaskSubset(TaskListUtility.java:428)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/TaskListUtility.removeAllTasks(TaskListUtility.java:406)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/TaskListUtility.removeAllTasks(TaskListUtility.java:402)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:164)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:544)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/operations/WorkbenchReporter.removeAllMessages(WorkbenchReporter.java:509)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/validation/JpaValidator.validate(JpaValidator.java:46)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/validation/JpaValidator.validateInJob(JpaValidator.java:77)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/operations/ValidatorJob.run(ValidatorJob.java:78)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/Worker.run(Worker.java:55)


3XMTHREADINFO      "Worker-5" TID:0x32119D00, j9thread_t:0x39E62C38, state:CW, prio=1
3XMTHREADINFO1            (native thread ID:0x9FC, native priority:0x1, native policy:UNKNOWN)
4XESTACKTRACE          at java/lang/Object.wait(Native Method)
4XESTACKTRACE          at java/lang/Object.wait(Object.java:196)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/Semaphore.acquire(Semaphore.java:38)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/OrderedLock.doAcquire(OrderedLock.java:169)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/OrderedLock.acquire(OrderedLock.java:105(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/jobs/OrderedLock.acquire(OrderedLock.java:82(Compiled Code))
4XESTACKTRACE          at org/eclipse/core/internal/resources/WorkManager.checkIn(WorkManager.java:118)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Workspace.prepareOperation(Workspace.java:1747)
4XESTACKTRACE          at org/eclipse/core/internal/resources/Resource.createMarker(Resource.java:641)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/drivers/DriverValidator.addProblemMarker(DriverValidator.java:230)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/drivers/DriverValidator.isValid(DriverValidator.java:96)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/drivers/DriverValidator.isValid(DriverValidator.java:79)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/internal/InternalProfileManager.getDriverInstance(InternalProfileManager.java:1334)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/internal/InternalProfileManager.enableLocalDatabase(InternalProfileManager.java:1252)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/internal/InternalProfileManager.loadLocalRegisteredDatabases(InternalProfileManager.java:1211)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/internal/InternalProfileManager.loadProfiles(InternalProfileManager.java:928)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/internal/InternalProfileManager.getProfiles(InternalProfileManager.java:139)
4XESTACKTRACE          at org/eclipse/datatools/connectivity/ProfileManager.getProfiles(ProfileManager.java:46)
4XESTACKTRACE          at org/eclipse/jpt/db/internal/DTPConnectionProfileRepository.start(DTPConnectionProfileRepository.java:63)
4XESTACKTRACE          at org/eclipse/jpt/db/JptDbPlugin.start(JptDbPlugin.java:53)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl$2.run(BundleContextImpl.java:1009)
4XESTACKTRACE          at java/security/AccessController.doPrivileged(AccessController.java:251)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl.startActivator(BundleContextImpl.java:1003)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleContextImpl.start(BundleContextImpl.java:984)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleHost.startWorker(BundleHost.java:346)
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/AbstractBundle.start(AbstractBundle.java:265)
4XESTACKTRACE          at org/eclipse/osgi/framework/util/SecureAction.start(SecureAction.java:400)
4XESTACKTRACE          at org/eclipse/core/runtime/internal/adaptor/EclipseLazyStarter.postFindLocalClass(EclipseLazyStarter.java:111(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/baseadaptor/loader/ClasspathManager.findLocalClass(ClasspathManager.java:427(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/internal/baseadaptor/DefaultClassLoader.findLocalClass(DefaultClassLoader.java:193(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleLoader.findLocalClass(BundleLoader.java:368(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/SingleSourcePackage.loadClass(SingleSourcePackage.java:33(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleLoader.findClassInternal(BundleLoader.java:441(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleLoader.findClass(BundleLoader.java:397(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/framework/internal/core/BundleLoader.findClass(BundleLoader.java:385(Compiled Code))
4XESTACKTRACE          at org/eclipse/osgi/internal/baseadaptor/DefaultClassLoader.loadClass(DefaultClassLoader.java:87(Compiled Code))
4XESTACKTRACE          at java/lang/ClassLoader.loadClass(ClassLoader.java:609(Compiled Code))
4XESTACKTRACE          at java/lang/J9VMInternals.verifyImpl(Native Method)
4XESTACKTRACE          at java/lang/J9VMInternals.verify(J9VMInternals.java:72(Compiled Code))
4XESTACKTRACE          at java/lang/J9VMInternals.initialize(J9VMInternals.java:134(Compiled Code))
4XESTACKTRACE          at org/eclipse/jpt/core/internal/platform/GenericJpaFactory.buildJpaDataSource(GenericJpaFactory.java:226)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/GenericJpaProject.<init>(GenericJpaProject.java:141)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/platform/GenericJpaFactory.buildJpaProject(GenericJpaFactory.java:222)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/GenericJpaModel$DefaultJpaProjectHolder.buildJpaProject(GenericJpaModel.java:453)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/GenericJpaModel$DefaultJpaProjectHolder.jpaProject(GenericJpaModel.java:445)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/GenericJpaModel.getJpaProject(GenericJpaModel.java:84)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/JpaModelManager.getJpaProject(JpaModelManager.java:178)
4XESTACKTRACE          at org/eclipse/jpt/core/JptCorePlugin.getJpaProject(JptCorePlugin.java:157)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/validation/JpaValidator.validationMessages(JpaValidator.java:55)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/validation/JpaValidator.validate(JpaValidator.java:48)
4XESTACKTRACE          at org/eclipse/jpt/core/internal/validation/JpaValidator.validateInJob(JpaValidator.java:77)
4XESTACKTRACE          at org/eclipse/wst/validation/internal/operations/ValidatorJob.run(ValidatorJob.java:78)
4XESTACKTRACE          at org/eclipse/core/internal/jobs/Worker.run(Worker.java:55)
Comment 2 Chuck Bridgham CLA 2008-08-28 14:59:25 EDT
Created attachment 111239 [details]
Filtered dump - jsp indexing

Added a new dump also involving the sync locks in "GenericJpaModel" this time involving JSP indexing
Comment 3 Chuck Bridgham CLA 2008-08-28 15:47:59 EDT
Suggestion....

Remove the class synchonization around getJpaProject() if the project doesn't exist yet.

A separate Job lock can prevent multiple callers of getJpaProject() if no project exists, and thus freeing up this lock for other calls like synchFiles() and javaElementChanged()

"GenericJpaModel$DefaultJpaProjectHolder.buildJpaProject" could be done on a separate Job


I can try creating a patch for this is you would like
Comment 4 Neil Hauge CLA 2008-08-29 12:46:51 EDT
Thanks for the suggestion Chuck.  We are evaluating it along with some other ideas, and trying to come up with a plan that will solve these issues.
Comment 5 Brian Vosburgh CLA 2008-08-29 14:32:35 EDT
I'm not real comfortable modifying GenericJpaModel just yet....

Most of the Dali deadlocks seem to revolve around the Dali validator, which, mostly unbeknownst to us, seems to trigger very broad lock requests in other plug-ins (in the first case, the DTP DriverValidator is creating a marker; in the second, the WST TranslatorResourceImpl is loading). Maybe we should be acquiring this (these?) lock(s) up front, when the Dali validator starts executing? Does anyone know what locks Dali should acquire?

I tried to re-create the deadlock with DTP, via breakpoints; but my target workspace is out of synch with whatever workspace generated the original dump. I am using the WTP 3.0.1 build and the related pre-requisites. The DTP code in the dump does not match what I have (e.g. in my workspace, InternalProfileManager.getDriverInstance(...) does not call DriverValidator.isValid()). What version of DTP are you using?
Comment 6 Brian Vosburgh CLA 2008-08-29 14:58:56 EDT
Created attachment 111326 [details]
Original stacktraces (DTP)

A more readable version of the original stacktraces from bug 243547.
Comment 7 Leonard Theivendra CLA 2008-08-29 18:11:12 EDT
We're using DTP 1.6.1 M1
Comment 8 Leonard Theivendra CLA 2008-09-02 10:38:50 EDT
Hi Brian, wanted to get your opinion on a "workaround" fix we've been exploring which is to force a call in our plugin to JpaModelManager.instance().getJpaModel().jpaProjects() right after importing any project. This seems to avoid the deadlock or greatly reduce the chances of it from the repeated testing I've tried.  Do you see any problems with this or harmful side effects in function or performance? 
Comment 9 Brian Vosburgh CLA 2008-09-02 11:05:17 EDT
(In reply to comment #8)
> Hi Brian, wanted to get your opinion on a "workaround" fix we've been exploring
> which is to force a call in our plugin to
> JpaModelManager.instance().getJpaModel().jpaProjects() right after importing
> any project. This seems to avoid the deadlock or greatly reduce the chances of
> it from the repeated testing I've tried.  Do you see any problems with this or
> harmful side effects in function or performance? 
> 

Well...the JavaDoc comment for JpaModel#jpaProjects() does mention "performance implications".... :-) Depends on what sorts of projects you have in your workspace. It might be better to call JpaModelManager.instance().getJpaProject(IProject) if possible. Does that also help the deadlock?
Comment 10 David Williams CLA 2008-09-03 21:02:36 EDT
From my quick read, seems the workaround is the best we can do? 

Adding Gary to CC. Gary, at one point there were certain rules about what validators could do and could not do about "resource changes" ... can you clarify what those rules are? 

I echo Brian's question in #5, 
Maybe we should be acquiring this (these?) lock(s) up front, when the Dali validator starts executing? Does anyone know what locks Dali should acquire?

I do not know what those rules are, but I think there are some. 

I've always been surprised we in WTP have not had to use more scheduling rules. 

Hopefully Gary can clarify some what the rules are? 


Comment 11 David Williams CLA 2008-09-08 21:16:18 EDT
Gary has said, in

https://bugs.eclipse.org/bugs/show_bug.cgi?id=236664#c10

"We have to think of validation as a read-only operation. Nothing (other than
markers) should be changed."


In thinking about this, I see two interpretations, which is right? 

A. Nothing can be done which might cause a resource change event, not even by starting another asynchronous job. 

And, that seems impossible, in practice. 

B. Nothing can be done which might cause a resource change event, unless it is done from another asynchronous job. That seems possible, but has a strong corollary: a validator can never call "foreign code" (code owned by others, or which isn't spec'd to be "validator safe") since in calling foreign code, you'd have no way to know if it might do something that would cause a resource change event. Thus, validators should never call foreign code unless they do so from another asynchronous job. 

While this later case seems feasible in practice, it basically means all validators should accomplish their work asynchronous jobs. And this leaves me confused, since, I think, one attribute of validators (already) is if they are asynchronous or not ... so why could an asynchronous validator have to have other asynchronous jobs? 

Seems the validator has a hard contract to fulfill? (or, impossible? :) 


 
Comment 12 John Arthorne CLA 2008-09-09 11:26:19 EDT
It is suspicious to me that Worker-5 is performing workspace changes from within a bundle activator (org/eclipse/jpt/db/JptDbPlugin.start). This is already deadlock prone because bundle activation occurs while the classloader lock is held. Also, since this can be initiated simply by having a reference to a class in that plugin, it causes potentially locking code to run at unpredictable times for clients. I would suggest pushing that bundle's long-running activation work into a background job.

Another fix would be to change JpaModelManager to perform its resource change event processing asynchronously rather than directly within the resource change event callback (the synchronizeFiles method). Again because this results in insertion of locking behavior onto any client performing workspace changes, acquiring locks within a resource change callback is potentially dangerous (unless you are certain you never call out to foreign code while holding that lock, and David mentions). A common pattern here is to process the delta within the resource change event to gather the useful information, and then fork off a job to perform any locking/long-running update work (as described in http://www.eclipse.org/articles/Article-Resource-deltas/resource-deltas.html#1d).
Comment 13 David Williams CLA 2008-09-12 10:41:24 EDT

Since there is a work-around, I've changed the severity to 'major' instead of 'critical'. 

Not sure if we need to fix for next service release? Or target next minor release (2.1) ... with _potential_ backport to 2.0.x maintenance stream. 

Comment 14 Neil Hauge CLA 2008-09-12 16:09:51 EDT
(In reply to comment #13)

> Not sure if we need to fix for next service release? Or target next minor
> release (2.1) ... with _potential_ backport to 2.0.x maintenance stream. 
> 

The first issue that John describes in comment 12 is actually addressed in Head, as this area was refactored earlier while working on another feature.  So, at this point, this particular deadlock should be resolved in recent 2.1 (WTP 3.1) builds.

I think this same issue could also be addressed in maintenance with isolated code changes, but not completely sure yet.  We'll post a patch and do some testing to see if there are any problems with the envisioned changes and then we can make a decision on whether we want to put the changes in next week's 3.0.2 build.

I'll target to 2.1 for now, but we can retarget to 2.0.2 depending on feedback and risk assessment.
Comment 15 Gary Karasiuk CLA 2008-09-15 10:59:37 EDT
(In reply to comment #14)
I continue to feel that deadlocks are quite traumatic for our customers. I feel that this needs to be fixed in the WTP 3.0.3 stream. 

Comment 16 Brian Vosburgh CLA 2008-09-16 00:28:12 EDT
Created attachment 112625 [details]
proposed patch

This patch moves the loading of the DTP connection profiles from the plug-in start to the first request for the profiles, which occurs after the plug-in loading. Gary, can you please see if this fixes, at least, this deadlock?
Comment 17 Neil Hauge CLA 2008-09-16 14:53:15 EDT
Given that the WTP SR1 +2 dates are tomorrow, there probably isn't enough time to thoroughly test this change for 2.0.2.  There is also a workaround described in comment 9.  As a result, I am re-targeting this bug to 2.0.3.  This way the fix will eventually make it into the maintenance stream (assuming it is determined to be safe), but won't risk late regression in 2.0.2.


Chuck,

Please enter a separate bug for the issue you described in comment 2 if it still exists.



Comment 18 Neil Hauge CLA 2008-10-14 09:19:52 EDT
Patch has been tested and applied in 2_0_maintenance branch for inclusion in WTP 3.0.3.  

Adopters - please test upcoming RC's to ensure that things have improved and not regressed in any way.
Comment 19 Neil Hauge CLA 2008-11-13 16:30:38 EST
Gary, Len, are you able to verify this bug for the 3.0.3 release?
Comment 20 Gary Karasiuk CLA 2008-11-17 10:56:13 EST
(In reply to comment #19)
> Gary, Len, are you able to verify this bug for the 3.0.3 release?
> 
It is not possible to verify these types of fixes by blackbox testing. We have to just trust that the problem has been fixed.