Bug 231205 - deadlock closing edited orm.xml file
Summary: deadlock closing edited orm.xml file
Status: VERIFIED FIXED
Alias: None
Product: WTP Common Tools
Classification: WebTools
Component: wst.common (show other bugs)
Version: 3.0   Edit
Hardware: PC Windows XP
: P2 critical (vote)
Target Milestone: 3.0.1   Edit
Assignee: Chuck Bridgham CLA
QA Contact: Nitin Dahyabhai CLA
URL:
Whiteboard: PMC_approved
Keywords:
Depends on: 242901
Blocks: 233763 241428 243450
  Show dependency tree
 
Reported: 2008-05-08 16:46 EDT by Karen Butzke CLA
Modified: 2010-05-13 15:36 EDT (History)
8 users (show)

See Also:
david_williams: pmc_approved+
cbridgha: pmc_approved? (raghunathan.srinivasan)
cbridgha: pmc_approved? (naci.dai)
deboer: pmc_approved+
cbridgha: pmc_approved? (neil.hauge)
cbridgha: pmc_approved? (kaloyan)


Attachments
dead lock stack traces (10.39 KB, text/plain)
2008-05-08 16:46 EDT, Karen Butzke CLA
no flags Details
possible patch (5.82 KB, patch)
2008-06-05 10:35 EDT, Chuck Bridgham CLA
no flags Details | Diff
Possible patch #2 (1.33 KB, text/plain)
2008-06-05 10:57 EDT, Chuck Bridgham CLA
no flags Details
updated deadlock stack trace (10.34 KB, patch)
2008-07-18 14:41 EDT, Karen Butzke CLA
no flags Details | Diff
patch (4.38 KB, patch)
2008-07-24 14:19 EDT, Chuck Bridgham CLA
no flags Details | Diff
my current deadlock stack trace (6.84 KB, text/plain)
2008-07-24 14:40 EDT, Paul Fullbright CLA
no flags Details
Revised patch (6.44 KB, text/plain)
2008-07-25 01:52 EDT, Chuck Bridgham CLA
no flags Details
NPE stack trace (5.08 KB, text/plain)
2008-07-25 10:21 EDT, Paul Fullbright CLA
no flags Details
New Revised patch (9.89 KB, patch)
2008-08-06 10:52 EDT, Chuck Bridgham CLA
no flags Details | Diff
Same - but removed debug println.... (9.57 KB, patch)
2008-08-07 00:02 EDT, Chuck Bridgham CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Karen Butzke CLA 2008-05-08 16:46:41 EDT
Created attachment 99376 [details]
dead lock stack traces

This deadlock was found while trying to debug bug 228750.

Create a new JPA project with all the defaults, including creating an orm.xml file.  Edit the orm.xml file and close it without saving and occassionaly you will hit this deadlock.  You can force it to happen most of the time in debug by setting these breakpoints: 
line 317 of EMF2DOMSSERenderer.initializeXMLModel(IFile, boolean) 
line 320 of ReferencedXMIResourceImpl.doUnload().  

The first has a lock on OrmResource and the second has a lock on ModelManagerImpl.ShareObject.

Attaching the stack trace
Comment 1 Nitin Dahyabhai CLA 2008-05-08 23:08:25 EDT
The lock on the ModelManager.SharedObject instance is because one thread is trying to release a model and another is trying to get the same model.  It's not really something we can change in the ModelManager.
Comment 2 Karen Butzke CLA 2008-05-09 12:33:05 EDT
Ok, then I'll set it to wst.common, since the patch Chuck created for bug 225022 is in this stack trace as well.
Comment 3 Karen Butzke CLA 2008-05-09 12:33:54 EDT
Trying that again.
Ok, then I'll set it to wst.common, since the patch Chuck created for bug
225022 is in this stack trace as well.
Comment 4 Neil Hauge CLA 2008-05-26 01:08:32 EDT
I think this bug was temporarily lost, but adopters are hitting this problem and reporting against Dali, so upgrading the severity here so this bug has some eyes on it.
Comment 5 Chuck Bridgham CLA 2008-06-05 10:35:48 EDT
Created attachment 103741 [details]
possible patch
Comment 6 Chuck Bridgham CLA 2008-06-05 10:38:17 EDT
Karen, Neil - can you see if this patch improves the deadlock situation?

I removed the java synch block on resource, and added an ILock mechanism to coordinate the addition and removal of the loading adapter. 
We didn't need to block "ANY" access to the resource for this procedure.
Comment 7 Chuck Bridgham CLA 2008-06-05 10:57:45 EDT
Created attachment 103750 [details]
Possible patch #2

Removed synch block around hashtable which is already synchronized....
Comment 8 Chuck Bridgham CLA 2008-06-05 11:23:12 EDT
Comment on attachment 103750 [details]
Possible patch #2

Obsoleting this patch as well....  running the junits - I'm still running into synch issues - need to keep looking
Comment 9 Chuck Bridgham CLA 2008-07-17 14:16:15 EDT
Hi Karen - have you seen deadlocking in recent 301 builds?

some of the related bugs have been fixed, and we haven't seen a recent issue.
Comment 10 Karen Butzke CLA 2008-07-18 14:41:36 EDT
Created attachment 107852 [details]
updated deadlock stack trace

Chuck, I'm still seeing this deadlock. I actually got it the first time I tried editing the orm.xml file and then closing it without saving.  I have attached an updated stack trace, hasn't changed much, looks like just different line numbers. Tested with the M-3.0.1-20080717094056 build
Comment 11 Paul Fullbright CLA 2008-07-23 11:57:06 EDT
I get this 100% of the time on relatively new hardware (Core 2 Duo, 2.40 GHz,
4.00 GB RAM).
Comment 12 Chuck Bridgham CLA 2008-07-24 14:19:59 EDT
Created attachment 108375 [details]
patch
Comment 13 Chuck Bridgham CLA 2008-07-24 14:23:00 EDT
Possible patch - because this is low level EMF resource loading....  need to test further before committing.

Noticed this path wasn't going through the "normal load" path that uses the load adapter - and also changed the synch block for only the adapter list - mot the resource 
Comment 14 Paul Fullbright CLA 2008-07-24 14:40:16 EDT
Created attachment 108381 [details]
my current deadlock stack trace

I tried out the patch on the maintenance branch, and still get a deadlock.  Stack trace is attached.
Comment 15 Chuck Bridgham CLA 2008-07-25 01:52:20 EDT
Created attachment 108417 [details]
Revised patch

This does a check on the sse model manager first before loading....
Comment 16 Paul Fullbright CLA 2008-07-25 10:19:04 EDT
I have verified that I cannot reproduce the deadlock now, either, and that this does seem to fix our regression bug 241428 as well.  Wonderful!

One small note:  I do get an NPE in the background.  Will attach stack trace.
Comment 17 Paul Fullbright CLA 2008-07-25 10:21:27 EDT
Created attachment 108440 [details]
NPE stack trace
Comment 18 Chuck Bridgham CLA 2008-07-30 11:13:30 EDT
Nitin - can you take a look at the exception thats being thrown after my last patch?  If I release model access during a revert, this code gets a NPE
Comment 19 Chuck Bridgham CLA 2008-08-06 10:52:38 EDT
Created attachment 109313 [details]
New Revised patch

After much testing, and going through junit buckets many times...  This is the patch we will be bringing forward to PMC for approval
Comment 20 Chuck Bridgham CLA 2008-08-06 12:57:46 EDT
    * Explain why you believe this is a stop-ship defect. Or, if it is a "hotbug" 

Not only were we hitting this deadlock in the above scenario, but also when projects were being deleted, and validation jobs etc.....  


    * Is there a work-around? If so, why do you believe the work-around is insufficient? 

Only workaround is to re-start workbench

    * How has the fix been tested? Is there a test case attached to the bugzilla record? Has a JUnit Test been added? 

The junits have been run several times by our team, and scenarios including the one above run through several times.

    * Give a brief technical overview. Who has reviewed this fix? 

Jason Sholl, Jason Paterson, and myself have reviewed the changes which includes properly obtaining our ILock instance when loading documents (fixed all load paths), and also making sure we can obtain the StructuredModelManager lock before proceeding with the load.

Oracle (Paul) has tested and verified.

    * What is the risk associated with this fix? 

Changes timing a bit on load - as would any change to ensure proper serialization, but with the testing performed, I am comfortable with this patch.

Dependent bugs:  Still would like https://bugs.eclipse.org/bugs/show_bug.cgi?id=242901 to be fixed as well.
Comment 21 David Williams CLA 2008-08-06 16:26:03 EDT
Obviously is very risky, but does sound like a true stop ship problem, so I'll trust all the reviews and testing. 

Comment 22 Tim deBoer CLA 2008-08-06 21:11:21 EDT
+1. I don't think anyone would be happy with changing synchronization code at this point in a maintenance release, but given the severity of the problem, the number of people/adopters who have hit it, and the diligence & testing by the team I will approve.
Comment 23 Chuck Bridgham CLA 2008-08-07 00:02:26 EDT
Created attachment 109372 [details]
Same - but removed debug println....
Comment 24 Carl Anderson CLA 2008-08-07 13:08:47 EDT
Committed to R3_0_maintenance for 3.0.1
Comment 25 Karen Butzke CLA 2010-05-13 15:36:50 EDT
verified fixed, haven't seen these deadlocks anymore