146968 – Frequent deadlocks when multiple JSP validators are running

Bug 146968 - Frequent deadlocks when multiple JSP validators are running

Summary: Frequent deadlocks when multiple JSP validators are running

Status:	CLOSED FIXED

Alias:	None

Product:	WTP Source Editing
Classification:	WebTools
Component:	jst.jsp (show other bugs)
Version:	1.5
Hardware:	PC Windows XP

Importance:	P1 major (vote)
Target Milestone:	1.5 RC5
Assignee:	Nitin Dahyabhai
QA Contact:	David Williams

URL:
Whiteboard:	PMC_approved
Keywords:

Depends on:
Blocks:

Reported:	2006-06-13 22:56 EDT by Nitin Dahyabhai
Modified:	2006-06-29 14:27 EDT (History)
CC List:	1 user (show)

See Also:

Attachments
partial stack traces #1 (16.87 KB, text/plain) 2006-06-13 22:58 EDT, Nitin Dahyabhai	no flags	Details
partial stack traces #2 (15.73 KB, text/plain) 2006-06-13 22:58 EDT, Nitin Dahyabhai	no flags	Details
proposed patch (4.14 KB, patch) 2006-06-13 23:01 EDT, Nitin Dahyabhai	no flags	Details \| Diff
simplified patch without the ancillary cleanups (1.86 KB, patch) 2006-06-14 02:47 EDT, Nitin Dahyabhai	no flags	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Nitin Dahyabhai

2006-06-13 22:56:51 EDT

Deadlocks have been appearing intermittently when validation is being performed with lots of JSP models being opened and closed.  The common factor appears to be within the JSP TaglibController class where one synchronized block of code on one field makes a call to another method in the same class that also contains a synchronized block of code for another field.

Comment 1 Nitin Dahyabhai

2006-06-13 22:58:43 EDT

Created attachment 44360 [details]
partial stack traces #1

Comment 2 Nitin Dahyabhai

2006-06-13 22:58:58 EDT

Created attachment 44361 [details]
partial stack traces #2

Comment 3 Nitin Dahyabhai

2006-06-13 23:01:38 EDT

Created attachment 44363 [details]
proposed patch

Attaching a patch that changes the synchronized blocks so that only the field being synchronized on is manipulated.  The synchronization was only introduced to make the underlying fields thread-safe, so there's really too much locking going on without the patch.

Comment 4 Nitin Dahyabhai

2006-06-14 02:47:06 EDT

Created attachment 44380 [details]
simplified patch without the ancillary cleanups

Updating the patch so it's easier to evaluate.

This problem is actually pretty hard to visualize, but in the first attached stack trace, all of the important behavior happens in worker threads 2 and 8 in the following methods (ordered chronologically):

Worker thread 2 : org.eclipse.jst.jsp.core.taglib.TaglibIndex$ResourceChangeListener.resourceChanged(org.eclipse.core.resources.IResourceChangeEvent) line: 240
	-> TaglibIndex acquires its generic ILock
	-> begins processing deltas, which takes a while

Worker thread 8 : org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController$FileBufferListener.bufferCreated(org.eclipse.core.filebuffers.IFileBuffer) line: 118
	-> successfully synchronizes around TaglibController's document map
	-> wants to add an object as a listener to the TaglibIndex

Worker thread 8 : org.eclipse.jst.jsp.core.taglib.TaglibIndex.addTaglibIndexListener(org.eclipse.jst.jsp.core.taglib.ITaglibIndexListener) line: 295
	-> TaglibIndex tries to acquire its generic ILock from this thread (waits on worker thread 2)

Worker thread 2 : org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController.getFileBuffer(org.eclipse.jst.jsp.core.internal.contentmodel.tld.TLDCMDocumentManager) line: 231
	-> tries to synchronize around TaglibController's document map in this thread (waits on worker thread 8)


On Worker Thread 2, resource change event notification occurs.  TaglibIndex receives those notifications, acquire its ILock, and begins processes deltas
Meanwhile, over in Worker Thread 8, TaglibController$FileBufferListener.bufferCreated() is called when a model is loaded by someone and the TaglibController then synchronizes the handling around its hashmap.  While synchronized on Worker Thread 8, the TaglibController calls TaglibIndex.addTaglibIndexListener()--which attempts to acquire TaglibIndex's ILock.  Worker Thread 8 waits for Worker Thread 2 to release the ILock.
Back on Worker Thread 2, the TaglibIndex is still processing deltas, but something eventually finds that it needs to call TaglibController.getFileBuffer().  That method tries to synchronize around the TaglibController's hashmap and ends up waiting for Worker Thread 8 to exit its synchronized block, which doesn't happen.

The patch changes the behavior seen in worker thread 8, where the TaglibController synchronizes around its hashmap while calling into the TaglibIndex.  Calling methods unrelated to the hashmap while synchronizing around it is unsafe.  The patch updates the org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController.FileBufferListener.bufferCreated(IFileBuffer) shown in the stack traces as well as the corresponding bufferDisposed(IFileBuffer) method (since it would be subject to the same problem from a structured model release), so that calls to the taglib index occur only after the synchronized block is exited.

Comment 5 Nitin Dahyabhai

2006-06-14 02:50:53 EDT

CCing Min Idzelis as he brought this to my attention, and he might have more comments since it's practically a blocking problem for him.

Comment 6 David Williams

2006-06-14 04:29:35 EDT

+1 

Nitin and I have been through this with a fine-tooth comb. And I am comfortable that is it not only the correct fix, but will not even be one of those cases that cases threading problems to pop up elsewhere. The reason for that is that this case of "over syncronizing" was broken up and the key-trouble making tasks were still synchronized, just in a slightly different way, being gaurded by variables appropriate to gaurd themm, to minimize contention. So, in other words, this should not "radically change threading behavior" as I would usually argue at this late date. 

Our 'client' has been using a patched version "for a while" without seeing a hang, whereas he had been seeing them "all the time" before. So, I am confident Nitin has nailed it.

Comment 7 Arthur Ryman

2006-06-14 15:26:22 EDT

+1 for WTP 1.5 RC5 on the condition that we add a stress test asap.

Comment 8 Tim Wagner

2006-06-14 15:26:35 EDT

+1 1.5

Comment 9 Nitin Dahyabhai

2006-06-15 00:55:21 EDT

Committed and released for RC5.

Comment 10 Nitin Dahyabhai

2006-06-15 00:59:35 EDT

Bug 147113 was opened to pursue greater stress load and multi-threading tests.

Comment 11 Nitin Dahyabhai

2006-06-21 17:37:22 EDT

Verified in WTP 1.5 RC5a

Comment 12 Nitin Dahyabhai

2006-06-29 14:27:30 EDT

Closing