Bug 521946 - Saving process in ResourceSaveDiagnose should be atomic
Summary: Saving process in ResourceSaveDiagnose should be atomic
Status: REOPENED
Alias: None
Product: Sirius
Classification: Modeling
Component: Core (show other bugs)
Version: 5.0.0   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: Next   Edit
Assignee: Project inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: triaged
Depends on:
Blocks:
 
Reported: 2017-09-06 12:44 EDT by Steve Monnier CLA
Modified: 2018-09-07 10:38 EDT (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Steve Monnier CLA 2017-09-06 12:44:14 EDT
ResourceSaveDiagnose.hasDifferentSerialization(Map<?, ?>) starts by setting  resource.eSetDeliver to false before saving. In a finally block it is set back to true. This is done asynchronously in a different thread and this can cause some missing notification. For instance if this is executed during a representation creation, the adapters of the resource will not be added to the diagram. This is happens rarely but cause bugs that are hard to detect and reproduce. This saving needs to be done atomically.
Comment 1 Eclipse Genie CLA 2017-09-06 12:54:38 EDT
New Gerrit change created: https://git.eclipse.org/r/104591
Comment 3 Eclipse Genie CLA 2017-09-11 04:41:43 EDT
New Gerrit change created: https://git.eclipse.org/r/104831
Comment 4 Pierre-Charles David CLA 2017-09-15 03:45:06 EDT
The patch that was merged can trigger deadlocks in some of our tests:

"Worker-11" #128 prio=5 os_prio=0 tid=0x08d37400 nid=0x776 in Object.wait() [0x9d5fe000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:502)
	at org.eclipse.core.internal.jobs.ThreadJob.waitForRun(ThreadJob.java:270)
	- locked <0xd2193328> (a java.lang.Object)
	at org.eclipse.core.internal.jobs.ThreadJob.joinRun(ThreadJob.java:197)
	at org.eclipse.core.internal.jobs.ImplicitJobs.begin(ImplicitJobs.java:92)
	at org.eclipse.core.internal.jobs.JobManager.beginRule(JobManager.java:307)
	at org.eclipse.core.internal.resources.WorkManager.checkIn(WorkManager.java:121)
	at org.eclipse.core.internal.resources.Workspace.prepareOperation(Workspace.java:2188)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2235)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2262)
	at org.eclipse.sirius.business.internal.session.danalysis.Saver.saveNow(Saver.java:116)
	at org.eclipse.sirius.business.internal.session.danalysis.Saver.save(Saver.java:93)
	at org.eclipse.sirius.business.internal.session.danalysis.DAnalysisSessionImpl.save(DAnalysisSessionImpl.java:801)
	at org.eclipse.sirius.business.internal.session.danalysis.DAnalysisSessionImpl.save(DAnalysisSessionImpl.java:796)
	at org.eclipse.sirius.business.internal.session.danalysis.SaveSessionJob.run(SaveSessionJob.java:67)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)

"main" #1 prio=6 os_prio=0 tid=0x0829ec00 nid=0x6ed waiting on condition [0xf6721000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
	at java.lang.Thread.sleep(Native Method)
	at org.eclipse.core.internal.jobs.JobManager.join(JobManager.java:1002)
	at org.eclipse.sirius.business.api.session.DefaultLocalSessionCreationOperation.execute(DefaultLocalSessionCreationOperation.java:78)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase$2.execute(SiriusTestCase.java:505)
	at org.eclipse.ui.actions.WorkspaceModifyOperation.lambda$0(WorkspaceModifyOperation.java:107)
	at org.eclipse.ui.actions.WorkspaceModifyOperation$$Lambda$81/27984841.run(Unknown Source)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2240)
	at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2267)
	at org.eclipse.ui.actions.WorkspaceModifyOperation.run(WorkspaceModifyOperation.java:128)
	- locked <0xd603d820> (a org.eclipse.sirius.tests.support.api.SiriusTestCase$2)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase.createSession(SiriusTestCase.java:513)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase.createOrLoadAndOpenSession(SiriusTestCase.java:473)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase.genericSetUp(SiriusTestCase.java:418)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase.genericSetUp(SiriusTestCase.java:347)
	at org.eclipse.sirius.tests.support.api.SiriusTestCase.genericSetUp(SiriusTestCase.java:321)
	at org.eclipse.sirius.tests.unit.api.session.SessionManagerListener2Tests.setUp(SessionManagerListener2Tests.java:94)

Reopening for further analysis (maybe the problem is in the test code, maybe not).
Comment 5 Pierre-Charles David CLA 2017-09-15 10:36:18 EDT
The patch that was merged is not a good fix (maybe it changed some timings and made it look like the bug was fixed) and will need to be reverted, as it is clearly the cause of the deadlock.

After more analysis, our current understanding of what happens when the bug occurs:

1. In the main thread, some code opens a session S. Initially, all the resources in the session are clean (SYNC).
2. The code makes some changes to the session models (elements from the .aird resources), for example it enables a new viewpoint. This is done in a transaction, and thus triggers all the post-commit listeners.
2.1. One of those listeners is ResourceSetSync, whose job is to track the stage of all the ressources in the session and trigger the appropriate events to make sure everything is synchronized. In this case, it is notified of the changes that occured during the transaction that just finished, and determines that this impacts the session's aird file. It was initially SYNC, but the changes transition it to CHANGED. It notifies its own clients/listeners of this state transition.
2.2. One of these clients/listeners is SaveSessionWhenNoDialectEditorsListener. When the PREF_SAVE_WHEN_NO_EDITOR is enabled (which is the default), if some resource transitions from SYNC to CHANGED, and if no Sirius editors are currently opened, it will automatically save the session. In the problematic scenario, this is the case (default value for the preference, no editors currently opened), so SaveSessionWhenNoDialectEditorsListener decides to save the session. To avoid blocking the current thread (which most of the times will be the UI thread) while saving is in progress, this is done in a SaveSessionJob which is launched in a separate thread.
3. As soon as we've launched the SaveSessionJob, the main thread continues. SaveSessionWhenNoDialectEditorsListener exits, then ResourceSetSync exits, then all the other post-commit listeners, and soon the main thread is back to the starting point, after the command in step 2 that triggered the model change (viewpoint enabled).
4. In the other thread, SaveSessionJob has launched a Session.save(), which after some indirections ends up in ResourceSaveDiagnose. To decide if a resource really needs to be saved, we perform a first Resource.save() to a temporary file (for reasons which are outside the scope of this scenario). During the temporary save, we disable the delivery of notifications from the resource (eSetDeliver(false)) to hide this intermediate step from the rest of the system. This means there is a time window where the aird resource has its notifications disabled.
5. In the main thread, after we've enabled the viewpoint (at step 2), say we create a new representation. This will instanciate a DRepresentation and, since Sirius 4.1, add it at the root of the resource. If this happens during the time windows where the other thread has disabled notificitions on the resource, all the adapters installed on the AirdResourceImpl will not be notified of the addition, and in particular all ContentAdapters and equivalent (e.g. the SessionLazyCrossReferencer) will not install themselves on the new element, which causes all kinds of issues afterwards.
Comment 6 Eclipse Genie CLA 2017-09-15 10:40:09 EDT
New Gerrit change created: https://git.eclipse.org/r/105224
Comment 8 Steve Monnier CLA 2017-09-15 11:08:39 EDT
Note that it is not limited of having the session saved automatically by SaveSessionWhenNoDialectEditorsListener when there is no opened editor. The issue can be reproduced with a representation opened:
- Set a breakpoint in org.eclipse.sirius.business.internal.session.danalysis.ResourceSaveDiagnose.hasDifferentSerialization(Map<?, ?>) after resourcetoSave.eSetDeliver(false)
- Launch a runtime in debug mode
- Open a diagram and move an element
- Save. It will stop on your breakpoint (keep it paused)
- Create a new diagram
-> Diagram elements will be stacked on (0,0) and will not be movable.
Comment 9 Pierre-Charles David CLA 2017-09-20 10:46:56 EDT
Fixing this properly will require rather deep changes that are too risky to be included in 5.1.
Comment 10 Eclipse Genie CLA 2017-10-01 14:31:53 EDT
New Gerrit change created: https://git.eclipse.org/r/106064
Comment 11 Laurent Redor CLA 2017-10-02 03:30:18 EDT
The problem is here since the bug 494766, since this bug the representation is added at the root of the representations file.

I test to launch 25 times the class org.eclipse.sirius.tests.unit.diagram.modeler.uml.PortLocationTest:
* On commit c6af24d0a376e87a3b02bead47b3772bfc871bf6 [1], there is no problem: 150 OK / 150
* On commit 43c3706635f9f396a881650d94c01589782149d8 [2], there are several problems: 48 errors and 97 failures / 150.

[1] http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=c6af24d0a376e87a3b02bead47b3772bfc871bf6
[2] http://git.eclipse.org/c/sirius/org.eclipse.sirius.git/commit/?id=43c3706635f9f396a881650d94c01589782149d8
Comment 12 Laurent Redor CLA 2017-10-02 03:40:13 EDT
(In reply to Laurent Redor from comment #11)
> The problem is here since the bug 494766, since this bug the representation
> is added at the root of the representations file.

To be more precise, the problem can occurs since bug 494766 only for representation files. It was already the case before, for semantic resources which allows to have several root elements.