Community
Participate
Working Groups
Deadlocks have been appearing intermittently when validation is being performed with lots of JSP models being opened and closed. The common factor appears to be within the JSP TaglibController class where one synchronized block of code on one field makes a call to another method in the same class that also contains a synchronized block of code for another field.
Created attachment 44360 [details] partial stack traces #1
Created attachment 44361 [details] partial stack traces #2
Created attachment 44363 [details] proposed patch Attaching a patch that changes the synchronized blocks so that only the field being synchronized on is manipulated. The synchronization was only introduced to make the underlying fields thread-safe, so there's really too much locking going on without the patch.
Created attachment 44380 [details] simplified patch without the ancillary cleanups Updating the patch so it's easier to evaluate. This problem is actually pretty hard to visualize, but in the first attached stack trace, all of the important behavior happens in worker threads 2 and 8 in the following methods (ordered chronologically): Worker thread 2 : org.eclipse.jst.jsp.core.taglib.TaglibIndex$ResourceChangeListener.resourceChanged(org.eclipse.core.resources.IResourceChangeEvent) line: 240 -> TaglibIndex acquires its generic ILock -> begins processing deltas, which takes a while Worker thread 8 : org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController$FileBufferListener.bufferCreated(org.eclipse.core.filebuffers.IFileBuffer) line: 118 -> successfully synchronizes around TaglibController's document map -> wants to add an object as a listener to the TaglibIndex Worker thread 8 : org.eclipse.jst.jsp.core.taglib.TaglibIndex.addTaglibIndexListener(org.eclipse.jst.jsp.core.taglib.ITaglibIndexListener) line: 295 -> TaglibIndex tries to acquire its generic ILock from this thread (waits on worker thread 2) Worker thread 2 : org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController.getFileBuffer(org.eclipse.jst.jsp.core.internal.contentmodel.tld.TLDCMDocumentManager) line: 231 -> tries to synchronize around TaglibController's document map in this thread (waits on worker thread 8) On Worker Thread 2, resource change event notification occurs. TaglibIndex receives those notifications, acquire its ILock, and begins processes deltas Meanwhile, over in Worker Thread 8, TaglibController$FileBufferListener.bufferCreated() is called when a model is loaded by someone and the TaglibController then synchronizes the handling around its hashmap. While synchronized on Worker Thread 8, the TaglibController calls TaglibIndex.addTaglibIndexListener()--which attempts to acquire TaglibIndex's ILock. Worker Thread 8 waits for Worker Thread 2 to release the ILock. Back on Worker Thread 2, the TaglibIndex is still processing deltas, but something eventually finds that it needs to call TaglibController.getFileBuffer(). That method tries to synchronize around the TaglibController's hashmap and ends up waiting for Worker Thread 8 to exit its synchronized block, which doesn't happen. The patch changes the behavior seen in worker thread 8, where the TaglibController synchronizes around its hashmap while calling into the TaglibIndex. Calling methods unrelated to the hashmap while synchronizing around it is unsafe. The patch updates the org.eclipse.jst.jsp.core.internal.contentmodel.TaglibController.FileBufferListener.bufferCreated(IFileBuffer) shown in the stack traces as well as the corresponding bufferDisposed(IFileBuffer) method (since it would be subject to the same problem from a structured model release), so that calls to the taglib index occur only after the synchronized block is exited.
CCing Min Idzelis as he brought this to my attention, and he might have more comments since it's practically a blocking problem for him.
+1 Nitin and I have been through this with a fine-tooth comb. And I am comfortable that is it not only the correct fix, but will not even be one of those cases that cases threading problems to pop up elsewhere. The reason for that is that this case of "over syncronizing" was broken up and the key-trouble making tasks were still synchronized, just in a slightly different way, being gaurded by variables appropriate to gaurd themm, to minimize contention. So, in other words, this should not "radically change threading behavior" as I would usually argue at this late date. Our 'client' has been using a patched version "for a while" without seeing a hang, whereas he had been seeing them "all the time" before. So, I am confident Nitin has nailed it.
+1 for WTP 1.5 RC5 on the condition that we add a stress test asap.
+1 1.5
Committed and released for RC5.
Bug 147113 was opened to pursue greater stress load and multi-threading tests.
Verified in WTP 1.5 RC5a
Closing