Community
Participate
Working Groups
The method DecorationScheduler.internalPutResult is accessed from different thread and modifies a hash map without synchronization. This leads to a race condition: See trace generated by vmlens.com: java/util/HashMap.size@8484253 (7,7) Worker-7 (35) Worker-5 (30) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run Worker-6 (34) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run Worker-2 (27) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run Worker-0 (23) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run Worker-8 (53) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run Worker-10 (68) java.util.HashMap.createEntry java.util.HashMap.createEntry java.util.HashMap.addEntry java.util.HashMap.put org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached org.eclipse.ui.internal.decorators.DecorationScheduler$1.run org.eclipse.core.internal.jobs.Worker.run
What did you do to hit this kind of situation? I can only get the DecorationScheduler job to run in one Worker-* thread at a time. PW
Hi, "I can only get the DecorationScheduler job to run in one Worker-* thread at a time." might be correct. Even if different Worker-* threads are accessing the unsynchronized HashMap only one at a time this will lead to visibility bugs: Visibility: The cpu caches values from the main memory. Therefore a value written by thread running on the first cpu will not be seen by a thread running on the second cpu without cache invalidation. In java the synchronized statement, volatile fields and classes in the package java.util.concurrent. invalidate the cpu cache. (see for example <a href="http://blog.vmlens.com/2013/08/18/java-race-conditions-or-how-to-find-an-irreproducable-bug/">java-race-conditions-or-how-to-find-an-irreproducable-bug</a> or search for java memory model) Regards Thomas
(In reply to Paul Webster from comment #1) > What did you do to hit this kind of situation? > > I can only get the DecorationScheduler job to run in one Worker-* thread at > a time. > > PW This field can be accessed from the main thread as well. Here is the race report given by a dynamic race detector: Data race on field java.util.HashMap.$state: {{{ Concurrent read in thread T1 (locks held: {Monitor@410958f8}) ----> at org.eclipse.ui.internal.decorators.DecorationScheduler.internalGetResult(DecorationScheduler.java:222) at org.eclipse.ui.internal.decorators.DecorationScheduler.isDecorationReady(DecorationScheduler.java:591) at org.eclipse.ui.internal.decorators.DecoratorManager.prepareDecoration(DecoratorManager.java:1053) at org.eclipse.ui.internal.decorators.DecoratorManager$ManagedWorkbenchLabelDecorator.prepareDecoration(DecoratorManager.java:157) at org.eclipse.jface.viewers.DecoratingStyledCellLabelProvider.waitForPendingDecoration(DecoratingStyledCellLabelProvider.java:119) at org.eclipse.jface.viewers.DecoratingStyledCellLabelProvider.update(DecoratingStyledCellLabelProvider.java:133) at org.eclipse.jface.viewers.ViewerColumn.refresh(ViewerColumn.java:154) at org.eclipse.jface.viewers.AbstractTreeViewer.doUpdateItem(AbstractTreeViewer.java:949) at org.eclipse.jface.viewers.AbstractTreeViewer$UpdateItemSafeRunnable.run(AbstractTreeViewer.java:114) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:50) at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:173) at org.eclipse.jface.viewers.AbstractTreeViewer.doUpdateItem(AbstractTreeViewer.java:1029) at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1884) at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1848) at org.eclipse.jdt.internal.ui.packageview.PackageExplorerPart$PackageExplorerProblemTreeViewer.internalRefresh(PackageExplorerPart.java:265) at org.eclipse.jface.viewers.StructuredViewer$8.run(StructuredViewer.java:1554) at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1462) at org.eclipse.jface.viewers.TreeViewer.preservingSelection(TreeViewer.java:366) at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1423) at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1551) at org.eclipse.jface.viewers.ColumnViewer.refresh(ColumnViewer.java:534) at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider$3.run(PackageExplorerContentProvider.java:939) at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.runUpdates(PackageExplorerContentProvider.java:193) at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.executeRunnables(PackageExplorerContentProvider.java:141) at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.elementChanged(PackageExplorerContentProvider.java:126) at org.eclipse.jdt.internal.core.DeltaProcessor$3.run(DeltaProcessor.java:1662) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.jdt.internal.core.DeltaProcessor.notifyListeners(DeltaProcessor.java:1652) at org.eclipse.jdt.internal.core.DeltaProcessor.firePostChangeDelta(DeltaProcessor.java:1486) at org.eclipse.jdt.internal.core.DeltaProcessor.fire(DeltaProcessor.java:1462) at org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2074) at org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:477) at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:299) at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42) at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:289) at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:152) at org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:374) at org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1471) at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2251) at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2226) at org.eclipse.jdt.testplugin.JavaProjectHelper.clear(JavaProjectHelper.java:400) at org.eclipse.jdt.text.tests.PropertiesFileAutoEditStrategyTest.tearDown(PropertiesFileAutoEditStrategyTest.java:97) at junit.framework.TestCase.runBare(TestCase.java:146) at junit.framework.TestResult$1.protect(TestResult.java:122) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.framework.TestResult.run(TestResult.java:125) at junit.framework.TestCase.run(TestCase.java:129) at junit.framework.TestSuite.runTest(TestSuite.java:252) at junit.framework.TestSuite.run(TestSuite.java:247) at junit.extensions.TestDecorator.basicRun(TestDecorator.java:23) at junit.extensions.TestSetup$1.protect(TestSetup.java:23) at junit.framework.TestResult.runProtected(TestResult.java:142) at junit.extensions.TestSetup.run(TestSetup.java:27) at junit.framework.TestSuite.runTest(TestSuite.java:252) at junit.framework.TestSuite.run(TestSuite.java:247) at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:692) at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:319) at org.eclipse.test.UITestApplication$2.run(UITestApplication.java:197) at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35) at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:135) - locked Monitor@410958f8 at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:131) at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3794) at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3433) at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$4.run(PartRenderingEngine.java:1127) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336) at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1018) at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:157) at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:654) at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336) at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:598) at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150) at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:139) at org.eclipse.test.UITestApplication.runApplication(UITestApplication.java:140) at org.eclipse.test.UITestApplication.run(UITestApplication.java:62) at org.eclipse.test.UITestApplication.start(UITestApplication.java:212) at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134) at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:388) at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:243) at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:669) at org.eclipse.equinox.launcher.Main.basicRun(Main.java:608) at org.eclipse.equinox.launcher.Main.run(Main.java:1515) at org.eclipse.equinox.launcher.Main.main(Main.java:1488) at org.eclipse.core.launcher.Main.main(Main.java:34) T1 is the main thread Concurrent write in thread T29 (locks held: {}) ----> at org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult(DecorationScheduler.java:234) at org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached(DecorationScheduler.java:383) at org.eclipse.ui.internal.decorators.DecorationScheduler$1.run(DecorationScheduler.java:327) at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55) T29 is created by T23 at org.eclipse.core.internal.jobs.WorkerPool.jobQueued(WorkerPool.java:148) This bug is still present in I-Build I20150811-1400.
I think I have come over an even more serious manifestation of this bug/race condition. If we change the data on which the decoration is based and call TreeViewer.update() between a finished decoration but before the UpdateJob runs, the item is never decorated with the newest version. In more detail: Leta say we have an element E in a TreeViewer/CommonViewer and a lightweight decorator active which can decorate E based on some property E.prop. *State "No Decoration queued"* If we do E.prop = "value1" and call treeViewer.update(E, null); this will call DecorationManager.prepareDecoration and DecorationScheduler.queueForDecoration and the decoration is queued now. *State "Decoration queued"* If we call E.prop = "value2" and call treeViewer.update(E, null); now (again), this will be ok, because DecorationScheduler.queueForDecoration will still call DecorationScheduler.queueForDecoration which will detect that the decoration is already queued and hence does not need to be queued again. *State "Decoration processing"* When the decoration job runs, it removes the decoration request from the queue, and calls the decorator. If we call E.prop = "value3" and call treeViewer.update(E, null); now (again) at this point, DecorationManager.prepareDecoration will call DecorationScheduler.queueForDecoration as before. Since the decoration request has already been removed from the queue, it is queued again. The decoration job meanwhile continues and stores the result of the original request in the resultCache ("value2") and enqueues the update request. The decoration job will process the new request again during the same run, but the result is lost, because in ensureResultCached, it is made sure that the update request is enqueued, but the result in the cache is not replaced. (<<< BUG! TestCase testBug417255raceConditionDuringDecoration) When the decoration job finishes because the decoration request queue is empty, it schedules the update job. *State "Decoration processed"* In this state, the decoration job has finished and the update job is scheduled, but has not yet run. If we call E.prop = "value4" and call treeViewer.update(E, null); now (again), DecorationManager.prepareDecoration will in this case NOT call DecorationScheduler.queueForDecoration, because DecorationScheduler.isDecorationReady() is true here. The request never gets scheduled and the new "value4" will never become visible. (<<< BUG! TestCase testBug417255raceConditionBeforeUpdate). *-- SUMMARY --* In my opinion we have three issues: First, as described in comment#2, there is a small chance that two workers running in quick succession on different CPUs execute DecorationScheduler.internalPutResult without CPU local cache invalidation which can lead to erroneous results in rare (or even theoretical) cases. Second, if the decorator is currently decorating an element, and concurrently, the data for that element is changed and an update is requested, the second decoration result is lost, because ensureResultCached does not replace a newer decoration result. Third, if the decorator job has finished, but the update job has not yet run, and then the data for that element is changed and an update is requested, the decoration request is not scheduled and the decoration will never occur. The reason for this is that queueForDecoration is called in two scenarios: (A) by the applicatio code to schedule a decoration after a data update, and (B) by the decoration update job to fetch and update the decoration result. But, unfortunately, queueForDecoration cannot decide who is calling and just assumes that if it is called if a decoration result is ready, then it must be the update job calling. I propose to address the first issue by using a ConcurrentHashMap for the DecorationScheduler.resultCache. The second issue should be addressed by replacing/updating an existing decoration result. And the third issue could be solved by adding some kind of state indicating whether the update job logic is currently executing and if not, the decoration is always enqueued. Changeset with Testcase incoming ...
New Gerrit change created: https://git.eclipse.org/r/144642
New Gerrit change created: https://git.eclipse.org/r/144707
Gerrit change https://git.eclipse.org/r/144707 was merged to [master]. Commit: http://git.eclipse.org/c/platform/eclipse.platform.ui.git/commit/?id=47dc5102ff26ce1b7ddc3efec761eb1372e146aa
Thanks, Stefan!