Bug 417255 - [Decorators] Race Condition in org.eclipse.ui.internal.decorators.DecorationScheduler
Summary: [Decorators] Race Condition in org.eclipse.ui.internal.decorators.DecorationS...
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 4.2   Edit
Hardware: PC Windows Vista
: P3 minor (vote)
Target Milestone: 4.13 M3   Edit
Assignee: Stefan Winkler CLA
QA Contact:
URL:
Whiteboard:
Keywords: needinfo
Depends on:
Blocks:
 
Reported: 2013-09-14 07:07 EDT by Thomas Krieger CLA
Modified: 2019-07-12 09:14 EDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Thomas Krieger CLA 2013-09-14 07:07:36 EDT
The method DecorationScheduler.internalPutResult is accessed from different thread and modifies a hash map without synchronization. This leads to a race condition:

See trace generated by vmlens.com:

java/util/HashMap.size@8484253 (7,7)
   Worker-7 (35)  

   Worker-5 (30)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run

   Worker-6 (34)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run

   Worker-2 (27)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run

   Worker-0 (23)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run

   Worker-8 (53)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run

   Worker-10 (68)  java.util.HashMap.createEntry
      java.util.HashMap.createEntry
      java.util.HashMap.addEntry
      java.util.HashMap.put
      org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached
      org.eclipse.ui.internal.decorators.DecorationScheduler$1.run
      org.eclipse.core.internal.jobs.Worker.run
Comment 1 Paul Webster CLA 2013-09-20 14:06:51 EDT
What did you do to hit this kind of situation?  

I can only get the DecorationScheduler job to run in one Worker-* thread at a time.

PW
Comment 2 Thomas Krieger CLA 2013-09-24 05:40:25 EDT
Hi,

"I can only get the DecorationScheduler job to run in one Worker-* thread at a time."

might be correct. Even if different Worker-* threads are accessing the unsynchronized HashMap only one at a time this will lead to visibility bugs:

Visibility: The cpu caches values from the main memory. Therefore a value written by thread running on the first cpu will not be seen by a thread running on the second cpu without cache invalidation. In java the synchronized statement, volatile fields and classes in the package java.util.concurrent. invalidate the cpu cache.

(see for example <a href="http://blog.vmlens.com/2013/08/18/java-race-conditions-or-how-to-find-an-irreproducable-bug/">java-race-conditions-or-how-to-find-an-irreproducable-bug</a>
or search for java memory model)

Regards
Thomas
Comment 3 Yilong Li CLA 2015-08-20 07:32:39 EDT
(In reply to Paul Webster from comment #1)
> What did you do to hit this kind of situation?  
> 
> I can only get the DecorationScheduler job to run in one Worker-* thread at
> a time.
> 
> PW

This field can be accessed from the main thread as well. Here is the race report given by a dynamic race detector:
Data race on field java.util.HashMap.$state: {{{
    Concurrent read in thread T1 (locks held: {Monitor@410958f8})
 ---->  at org.eclipse.ui.internal.decorators.DecorationScheduler.internalGetResult(DecorationScheduler.java:222)
        at org.eclipse.ui.internal.decorators.DecorationScheduler.isDecorationReady(DecorationScheduler.java:591)
        at org.eclipse.ui.internal.decorators.DecoratorManager.prepareDecoration(DecoratorManager.java:1053)
        at org.eclipse.ui.internal.decorators.DecoratorManager$ManagedWorkbenchLabelDecorator.prepareDecoration(DecoratorManager.java:157)
        at org.eclipse.jface.viewers.DecoratingStyledCellLabelProvider.waitForPendingDecoration(DecoratingStyledCellLabelProvider.java:119)
        at org.eclipse.jface.viewers.DecoratingStyledCellLabelProvider.update(DecoratingStyledCellLabelProvider.java:133)
        at org.eclipse.jface.viewers.ViewerColumn.refresh(ViewerColumn.java:154)
        at org.eclipse.jface.viewers.AbstractTreeViewer.doUpdateItem(AbstractTreeViewer.java:949)
        at org.eclipse.jface.viewers.AbstractTreeViewer$UpdateItemSafeRunnable.run(AbstractTreeViewer.java:114)
        at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
        at org.eclipse.ui.internal.JFaceUtil$1.run(JFaceUtil.java:50)
        at org.eclipse.jface.util.SafeRunnable.run(SafeRunnable.java:173)
        at org.eclipse.jface.viewers.AbstractTreeViewer.doUpdateItem(AbstractTreeViewer.java:1029)
        at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1884)
        at org.eclipse.jface.viewers.AbstractTreeViewer.internalRefresh(AbstractTreeViewer.java:1848)
        at org.eclipse.jdt.internal.ui.packageview.PackageExplorerPart$PackageExplorerProblemTreeViewer.internalRefresh(PackageExplorerPart.java:265)
        at org.eclipse.jface.viewers.StructuredViewer$8.run(StructuredViewer.java:1554)
        at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1462)
        at org.eclipse.jface.viewers.TreeViewer.preservingSelection(TreeViewer.java:366)
        at org.eclipse.jface.viewers.StructuredViewer.preservingSelection(StructuredViewer.java:1423)
        at org.eclipse.jface.viewers.StructuredViewer.refresh(StructuredViewer.java:1551)
        at org.eclipse.jface.viewers.ColumnViewer.refresh(ColumnViewer.java:534)
        at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider$3.run(PackageExplorerContentProvider.java:939)
        at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.runUpdates(PackageExplorerContentProvider.java:193)
        at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.executeRunnables(PackageExplorerContentProvider.java:141)
        at org.eclipse.jdt.internal.ui.packageview.PackageExplorerContentProvider.elementChanged(PackageExplorerContentProvider.java:126)
        at org.eclipse.jdt.internal.core.DeltaProcessor$3.run(DeltaProcessor.java:1662)
        at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
        at org.eclipse.jdt.internal.core.DeltaProcessor.notifyListeners(DeltaProcessor.java:1652)
        at org.eclipse.jdt.internal.core.DeltaProcessor.firePostChangeDelta(DeltaProcessor.java:1486)
        at org.eclipse.jdt.internal.core.DeltaProcessor.fire(DeltaProcessor.java:1462)
        at org.eclipse.jdt.internal.core.DeltaProcessor.resourceChanged(DeltaProcessor.java:2074)
        at org.eclipse.jdt.internal.core.DeltaProcessingState.resourceChanged(DeltaProcessingState.java:477)
        at org.eclipse.core.internal.events.NotificationManager$1.run(NotificationManager.java:299)
        at org.eclipse.core.runtime.SafeRunner.run(SafeRunner.java:42)
        at org.eclipse.core.internal.events.NotificationManager.notify(NotificationManager.java:289)
        at org.eclipse.core.internal.events.NotificationManager.broadcastChanges(NotificationManager.java:152)
        at org.eclipse.core.internal.resources.Workspace.broadcastPostChange(Workspace.java:374)
        at org.eclipse.core.internal.resources.Workspace.endOperation(Workspace.java:1471)
        at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2251)
        at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2226)
        at org.eclipse.jdt.testplugin.JavaProjectHelper.clear(JavaProjectHelper.java:400)
        at org.eclipse.jdt.text.tests.PropertiesFileAutoEditStrategyTest.tearDown(PropertiesFileAutoEditStrategyTest.java:97)
        at junit.framework.TestCase.runBare(TestCase.java:146)
        at junit.framework.TestResult$1.protect(TestResult.java:122)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.framework.TestResult.run(TestResult.java:125)
        at junit.framework.TestCase.run(TestCase.java:129)
        at junit.framework.TestSuite.runTest(TestSuite.java:252)
        at junit.framework.TestSuite.run(TestSuite.java:247)
        at junit.extensions.TestDecorator.basicRun(TestDecorator.java:23)
        at junit.extensions.TestSetup$1.protect(TestSetup.java:23)
        at junit.framework.TestResult.runProtected(TestResult.java:142)
        at junit.extensions.TestSetup.run(TestSetup.java:27)
        at junit.framework.TestSuite.runTest(TestSuite.java:252)
        at junit.framework.TestSuite.run(TestSuite.java:247)
        at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:692)
        at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:319)
        at org.eclipse.test.UITestApplication$2.run(UITestApplication.java:197)
        at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
        at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:135)
        - locked Monitor@410958f8 at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:131) 
        at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3794)
        at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3433)
        at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine$4.run(PartRenderingEngine.java:1127)
        at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336)
        at org.eclipse.e4.ui.internal.workbench.swt.PartRenderingEngine.run(PartRenderingEngine.java:1018)
        at org.eclipse.e4.ui.internal.workbench.E4Workbench.createAndRunUI(E4Workbench.java:157)
        at org.eclipse.ui.internal.Workbench$5.run(Workbench.java:654)
        at org.eclipse.core.databinding.observable.Realm.runWithDefault(Realm.java:336)
        at org.eclipse.ui.internal.Workbench.createAndRunWorkbench(Workbench.java:598)
        at org.eclipse.ui.PlatformUI.createAndRunWorkbench(PlatformUI.java:150)
        at org.eclipse.ui.internal.ide.application.IDEApplication.start(IDEApplication.java:139)
        at org.eclipse.test.UITestApplication.runApplication(UITestApplication.java:140)
        at org.eclipse.test.UITestApplication.run(UITestApplication.java:62)
        at org.eclipse.test.UITestApplication.start(UITestApplication.java:212)
        at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:196)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:134)
        at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:104)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:388)
        at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:243)
        at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:669)
        at org.eclipse.equinox.launcher.Main.basicRun(Main.java:608)
        at org.eclipse.equinox.launcher.Main.run(Main.java:1515)
        at org.eclipse.equinox.launcher.Main.main(Main.java:1488)
        at org.eclipse.core.launcher.Main.main(Main.java:34)
    T1 is the main thread

    Concurrent write in thread T29 (locks held: {})
 ---->  at org.eclipse.ui.internal.decorators.DecorationScheduler.internalPutResult(DecorationScheduler.java:234)
        at org.eclipse.ui.internal.decorators.DecorationScheduler$1.ensureResultCached(DecorationScheduler.java:383)
        at org.eclipse.ui.internal.decorators.DecorationScheduler$1.run(DecorationScheduler.java:327)
        at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
    T29 is created by T23
        at org.eclipse.core.internal.jobs.WorkerPool.jobQueued(WorkerPool.java:148)

This bug is still present in I-Build I20150811-1400.
Comment 4 Stefan Winkler CLA 2019-06-21 09:59:31 EDT
I think I have come over an even more serious manifestation of this bug/race condition.
If we change the data on which the decoration is based and call TreeViewer.update() between a finished decoration but before the UpdateJob runs, the item is never decorated with the newest version.

In more detail:

Leta say we have an element E in a TreeViewer/CommonViewer and a lightweight decorator active which can decorate E based on some property E.prop.

*State "No Decoration queued"*

If we do
  E.prop = "value1"
and call
  treeViewer.update(E, null);
this will call DecorationManager.prepareDecoration and DecorationScheduler.queueForDecoration and the decoration is queued now.

*State "Decoration queued"*

If we call 
  E.prop = "value2"
and call 
  treeViewer.update(E, null);
now (again), this will be ok, because DecorationScheduler.queueForDecoration will still call DecorationScheduler.queueForDecoration which will detect that the decoration is already queued and hence does not need to be queued again. 

*State "Decoration processing"*

When the decoration job runs, it removes the decoration request from the queue, and calls the decorator.

If we call 
  E.prop = "value3"
and call 
  treeViewer.update(E, null);
now (again) at this point, DecorationManager.prepareDecoration will call DecorationScheduler.queueForDecoration as before. Since the decoration request has already been removed from the queue, it is queued again.

The decoration job meanwhile continues and stores the result of the original request in the resultCache ("value2") and enqueues the update request.

The decoration job will process the new request again during the same run, but the result is lost, because in ensureResultCached, it is made sure that the update request is enqueued, but the result in the cache is not replaced. (<<< BUG! TestCase testBug417255raceConditionDuringDecoration)

When the decoration job finishes because the decoration request queue is empty, it schedules the update job.

*State "Decoration processed"*

In this state, the decoration job has finished and the update job is scheduled, but has not yet run.

If we call 
  E.prop = "value4"
and call 
  treeViewer.update(E, null);
now (again), DecorationManager.prepareDecoration will in this case NOT call DecorationScheduler.queueForDecoration, because DecorationScheduler.isDecorationReady() is true here. The request never gets scheduled and the new "value4" will never become visible. (<<< BUG! TestCase testBug417255raceConditionBeforeUpdate).

*-- SUMMARY --*

In my opinion we have three issues:

First, as described in comment#2, there is a small chance that two workers running in quick succession on different CPUs execute DecorationScheduler.internalPutResult without CPU local cache invalidation which can lead to erroneous results in rare (or even theoretical) cases.

Second, if the decorator is currently decorating an element, and concurrently, the data for that element is changed and an update is requested, the second decoration result is lost, because ensureResultCached does not replace a newer decoration result.

Third, if the decorator job has finished, but the update job has not yet run, and then the data for that element is changed and an update is requested, the decoration request is not scheduled and the decoration will never occur. The reason for this is that queueForDecoration is called in two scenarios: (A) by the applicatio code to schedule a decoration after a data update, and (B) by the decoration update job to fetch and update the decoration result. But, unfortunately, queueForDecoration cannot decide who is calling and just assumes that if it is called if a decoration result is ready, then it must be the update job calling.

I propose to address the first issue by using a ConcurrentHashMap for the DecorationScheduler.resultCache. The second issue should be addressed by replacing/updating an existing decoration result. And the third issue could be solved by adding some kind of state indicating whether the update job logic is currently executing and if not, the decoration is always enqueued.

Changeset with Testcase incoming ...
Comment 5 Eclipse Genie CLA 2019-06-21 10:34:33 EDT
New Gerrit change created: https://git.eclipse.org/r/144642
Comment 6 Eclipse Genie CLA 2019-06-24 05:36:26 EDT
New Gerrit change created: https://git.eclipse.org/r/144707
Comment 8 Julian Honnen CLA 2019-07-12 09:14:18 EDT
Thanks, Stefan!