Bug 551744 - PartOnTopManagerTest.test_PlaceholderOnTopStackSwitch is flaky
Summary: PartOnTopManagerTest.test_PlaceholderOnTopStackSwitch is flaky
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: UI (show other bugs)
Version: 4.14   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 4.14 RC1   Edit
Assignee: Rolf Theunissen CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 511027 511697 (view as bug list)
Depends on:
Blocks:
 
Reported: 2019-10-03 11:45 EDT by Karsten Thoms CLA
Modified: 2019-11-22 05:05 EST (History)
7 users (show)

See Also:
akurtakov: pmc_approved+


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Karsten Thoms CLA 2019-10-03 11:45:56 EDT
The test case 
  org.eclipse.e4.ui.tests.workbench.PartOnTopManagerTest.test_PlaceholderOnTopStackSwitch

fails sometimes. It was observed on two unrelated changes recently, and gone away after retriggering. Failed in
   https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/20318/ , change https://git.eclipse.org/r/150491
   https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/20305/ , change https://git.eclipse.org/r/#/c/150300/

We should investigate this.
Comment 1 Lars Vogel CLA 2019-11-12 07:53:17 EST
Jens, you recently stabilized several tests, can you have a look? This one fails frequently in the last days, e.g., https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/20783/testReport/
Comment 2 Jens Lideström CLA 2019-11-13 02:30:28 EST
(In reply to Lars Vogel from comment #1)
> Jens, you recently stabilized several tests, can you have a look?

Okay, I'll have a look in a few days.
Comment 3 Jens Lideström CLA 2019-11-13 14:51:16 EST
The test doesn't fail in any of the currently available integration tests (I20190909-1520 to I20191112-1800) so it's hard to figure out what's going wrong.

Next time anyone sees it, please post a stack trace to this ticket!
Comment 4 Eclipse Genie CLA 2019-11-13 16:48:16 EST
New Gerrit change created: https://git.eclipse.org/r/152620
Comment 5 Lars Vogel CLA 2019-11-13 17:17:52 EST
java.lang.AssertionError: CleanupAddon should ensure that partStack is not rendered anymore, as all childs have been removed
	at org.eclipse.e4.ui.tests.workbench.PartRenderingEngineTests.ensureCleanUpAddonCleansUp(PartRenderingEngineTests.java:2479)
Comment 6 Lars Vogel CLA 2019-11-13 17:18:56 EST
(In reply to Lars Vogel from comment #5)
> java.lang.AssertionError: CleanupAddon should ensure that partStack is not
> rendered anymore, as all childs have been removed
> 	at
> org.eclipse.e4.ui.tests.workbench.PartRenderingEngineTests.
> ensureCleanUpAddonCleansUp(PartRenderingEngineTests.java:2479)

See https://ci.eclipse.org/platform/job/eclipse.platform.ui-Gerrit/20856/
Comment 7 Jens Lideström CLA 2019-11-15 03:31:01 EST
The easy fix I tried to stabilize this test didn't work. I'll look at it a little more when I have time, but I don't know if I can do anything about it.

Since the fix didn't work that means that it took the code more than 10 s to fulfil the properties that were being tested. That suggests that either the test environment is really shaky, or that there is some real problem with the functionallity that is being tested.
Comment 8 Rolf Theunissen CLA 2019-11-17 10:24:56 EST
I managed to catch the failure at the first 'assertTrue(isPartOnTop(part))' in the test. 
It shows that the IWorkbench.ON_TOP key is not set in the context, so the part cannot be on top. The key is set either when the widget is added to a part (bootstrapping part of the PartOnTopManager), or when the selection changes of a container. The part for which the test fails is added with a PlaceHolder in the perspective.

So the question is why the ON_TOP key is not set, it should have been set when the widget was added. This could only happen if the widget was added before a context was added or the widget added event was not submitted/processed yet.
Comment 9 Andrey Loskutov CLA 2019-11-19 08:27:49 EST
Guys, how do you run this test locally from IDE? I can't, it explodes with hundreds of opened shells and errors like bug 511697 comment 2.
Comment 10 Andrey Loskutov CLA 2019-11-19 08:31:47 EST
(In reply to Andrey Loskutov from comment #9)
> Guys, how do you run this test locally from IDE? I can't, it explodes with
> hundreds of opened shells and errors like bug 511697 comment 2.

Same with PartRenderingEngineTests. Do I have to close or open some "special"  projects to be able to run this test successfully? I have a lot of SDK in the workspace.
Comment 11 Rolf Theunissen CLA 2019-11-19 08:36:20 EST
(In reply to Andrey Loskutov from comment #9)
> Guys, how do you run this test locally from IDE? I can't, it explodes with
> hundreds of opened shells and errors like bug 511697 comment 2.

There is a launch target pre-defined that works for the E4 tests: org.eclipse.e4.ui.tests\UIAllTests.launch.

I think that you have to run in 'Headless Mode' (i.e. Run an application: No Application)
Comment 12 Andrey Loskutov CLA 2019-11-19 09:48:56 EST
(In reply to Rolf Theunissen from comment #11)
> I think that you have to run in 'Headless Mode' (i.e. Run an application: No
> Application)

Thanks, this was it. The test runs now fine on my Linux box, even with stress-ng eating all CPU :-(

So I assume our test instance is disturbed by other means.
Comment 13 Eclipse Genie CLA 2019-11-19 09:53:40 EST
New Gerrit change created: https://git.eclipse.org/r/152964
Comment 14 Andrey Loskutov CLA 2019-11-19 09:55:14 EST
*** Bug 511697 has been marked as a duplicate of this bug. ***
Comment 15 Andrey Loskutov CLA 2019-11-19 09:56:13 EST
*** Bug 511027 has been marked as a duplicate of this bug. ***
Comment 16 Rolf Theunissen CLA 2019-11-19 10:29:02 EST
(In reply to Andrey Loskutov from comment #12)
> (In reply to Rolf Theunissen from comment #11)
> > I think that you have to run in 'Headless Mode' (i.e. Run an application: No
> > Application)
> 
> Thanks, this was it. The test runs now fine on my Linux box, even with
> stress-ng eating all CPU :-(
> 
> So I assume our test instance is disturbed by other means.

I was able to trigger the failures only by running the whole testsuite (on Windows), so it might also be a bad interaction with a previous testcase.

As mentioned in comment 8, it also depends on the order in which selection-changed-event is handled by the PartOnTopManager and the Renderers.
I also wonder if the widget-event is propagated for a shared part. Or that this event is propagated when the context is not available yet.
Given that the key is not in the context yet, all PartOnTopManger event handlers run before the context is available.
Comment 17 Rolf Theunissen CLA 2019-11-19 11:53:27 EST
Been searching some more, somehow the event subscriptions on PartOnTopManager are not always triggered, even when the SWT eventloop is spinned. Sometimes they initially work, and fail after the second 'setSelectedElement' in the test.

What can suppress or delay these events?
Comment 18 Eclipse Genie CLA 2019-11-20 03:06:48 EST
New Gerrit change created: https://git.eclipse.org/r/153016
Comment 19 Andrey Loskutov CLA 2019-11-20 05:31:48 EST
(In reply to Rolf Theunissen from comment #17)
> Been searching some more, somehow the event subscriptions on
> PartOnTopManager are not always triggered, even when the SWT eventloop is
> spinned. Sometimes they initially work, and fail after the second
> 'setSelectedElement' in the test.
> 
> What can suppress or delay these events?

(In reply to Eclipse Genie from comment #18)
> New Gerrit change created: https://git.eclipse.org/r/153016

So I've moved the test to be the first one in the test suite, to check if any other tests may influence the result - still failing.

java.lang.AssertionError
	at org.eclipse.e4.ui.tests.workbench.PartOnTopManagerTest.test_PlaceholderOnTopStackSwitch(PartOnTopManagerTest.java:175)

Corresponding test code:

assertTrue(isPartOnTop(part));
assertFalse(isPartOnTop(secondPart));

partStack.setSelectedElement(secondPart);

assertFalse(isPartOnTop(part)); <-------- assert here
assertTrue(isPartOnTop(secondPart));

I can imagine that we see the effect of "random" event dispatching order in org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(int, E)

// We can't guarantee any delivery order for synchronous events.
// Attempts to do so result in deadly embraces.

In debugger there are 5 elements in the queue.
Comment 20 Rolf Theunissen CLA 2019-11-20 10:04:22 EST
Finally getting somewhere:

In UIEventObjectSupplier#UIEventHandler#handleEvent sometimes 'unsubscribe' is called on the events registered for PartOnTopManager. After unsubscribe is called, the test fails (which makes sense).

The question now is why unsubscribe is called. This happens when the requestor becomes inValid, i.e., when a weak-reference looses its reference.

So, is there a strong reference to PartOnTopManager? E4Workbench provides the answer (line 135), and it is *no*.
Comment 21 Eclipse Genie CLA 2019-11-20 10:05:30 EST
New Gerrit change created: https://git.eclipse.org/r/153067
Comment 22 Andrey Loskutov CLA 2019-11-21 16:31:58 EST
I think the last patch from Rolf would be good candidate for RC1. WDYT?
Comment 23 Lars Vogel CLA 2019-11-21 16:47:59 EST
(In reply to Andrey Loskutov from comment #22)
> I think the last patch from Rolf would be good candidate for RC1. WDYT?

IIRC for test and example code no approval is required
Comment 24 Andrey Loskutov CLA 2019-11-21 16:48:51 EST
(In reply to Lars Vogel from comment #23)
> (In reply to Andrey Loskutov from comment #22)
> > I think the last patch from Rolf would be good candidate for RC1. WDYT?
> 
> IIRC for test and example code no approval is required

The change is in production code.
Comment 25 Lars Vogel CLA 2019-11-21 17:15:26 EST
(In reply to Andrey Loskutov from comment #24)
> (In reply to Lars Vogel from comment #23)
> > (In reply to Andrey Loskutov from comment #22)
> > > I think the last patch from Rolf would be good candidate for RC1. WDYT?
> > 
> > IIRC for test and example code no approval is required
> 
> The change is in production code.

Ah, didn't notice. Please find another lead to approve, I'm currently really busy with other work.
Comment 27 Lars Vogel CLA 2019-11-22 05:05:17 EST
Many thanks for Rolf for fixing that and to all which worked on this. This was a major annoyance in your Gerrit stream.