Bug 414653 - [test] intermittent failures in SwitchTest
Summary: [test] intermittent failures in SwitchTest
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.3   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: 4.4 M5   Edit
Assignee: Jay Arthanareeswaran CLA
QA Contact:
URL:
Whiteboard:
Keywords: test
Depends on:
Blocks:
 
Reported: 2013-08-08 06:39 EDT by Stephan Herrmann CLA
Modified: 2015-01-05 17:02 EST (History)
3 users (show)

See Also:


Attachments
instrumentation (12.27 KB, patch)
2013-08-08 07:33 EDT, Stephan Herrmann CLA
no flags Details | Diff
Preventing System.exit calls shutting down the VM (1.62 KB, text/plain)
2013-11-22 14:04 EST, Timo Kinnunen CLA
no flags Details
Work around fix (2.81 KB, patch)
2014-01-09 00:18 EST, Jay Arthanareeswaran CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Stephan Herrmann CLA 2013-08-08 06:39:15 EDT
When I run RunJDTCoreTest locally, most the time I get one random failure in
SwitchTest, re-running that particular test always passes.

By instrumenting VerifyTests I see the following pattern:

- VM is launched for testMarysLamb()
- this test runs fine, socket is working
- the log file created for this VM reports only this one test.

- next VM is launched for testNestedSwitches()
- between testMarysLamb() and testNestedSwitches() we have two tests:
  - testBreakOut() -> pass
  - testMultipleSwitches() -> fail, unexpectedly empty output
  Why are those missing from the log?

In the console I find:

java.io.EOFException
	at java.io.DataInputStream.readBoolean(DataInputStream.java:244)
	at org.eclipse.jdt.core.tests.util.TestVerifier.loadAndRun(TestVerifier.java:583)
	at org.eclipse.jdt.core.tests.util.TestVerifier.verifyClassFiles(TestVerifier.java:643)
	at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2516)
	at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2317)
	at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runConformTest(AbstractRegressionTest.java:1361)
	at org.eclipse.jdt.core.tests.compiler.regression.SwitchTest.testBreakOut(SwitchTest.java:1859)

I see this for several tests incl. both "missing" tests, but *not* for testMarysLamb()

My guess that the socket was unexpectedly closed has not yet been confirmed by logging.
I may need to do more flushing to see if the missing log output is lost pending
in some buffer.
Comment 1 Stephan Herrmann CLA 2013-08-08 07:28:25 EDT
Next run indicates that the test prior to the failing one might have raised an
unexpected exception, instrumenting that branch, too.
Comment 2 Stephan Herrmann CLA 2013-08-08 07:33:38 EDT
Created attachment 234192 [details]
instrumentation

Although this code is quite crude, here's my current instrumentation.

One point is, how to feed context (test name) into the test verifier,
other point is, we can't use out or err because they're redirected for
communication with the test driver, so I'm using files in /tmp/ (hardcoded) instead.

This code is not meant to be released :)
Comment 3 Stephan Herrmann CLA 2013-08-08 08:05:48 EDT
Wow, the test VM just leaves the building without telling anybody:

This snippet:
	try {
		logFile.write(className+" "+socket.isClosed()+"\n");
		logFile.flush();
		loadAndRun(className);
		VerifyTests.this.logFile.write("\t \t-> "+socket.isClosed()+"\n");
		logFile.flush();
		out.writeBoolean(true);
		VerifyTests.this.logFile.write("\t true\t->"+socket.isClosed()+"\n");
		logFile.flush();
		System.err.println(VerifyTests.class.getName());
		System.out.println(VerifyTests.class.getName());
	} catch (Throwable e) {
		try {
			logFile.write("Exception: "+e);
			logFile.flush();
		...

Prints to the log file:

testSideEffect false
                -> false
         true   ->false
testFallThrough false
<EOF>

- testSideEffect shows the expected behavior, socket is not unexpectedly closed, OK.
- testFallThrough gets started and then nothing, no next statement, no exception

From a JUnit p.o.v. testFallThrough is marked pass, while the *next* one, 
testFallThrough2() fails, and: that one leaves no traces in any log file!
Perhaps, one side still believes it can talk to the test VM, while the other
side is already gone?

So, how can loadAndRun neither return nor throw?
- hang like in an infinite loop?
- crash the VM - and during crashing still keep the socket open?

Next I may want to try detecting any VM error log files in the working directory
before that is wiped for each new test. But that'll have to wait for another day
I'm afraid.

BTW: this was with the early-access J8 b92, I don't exactly recall if I've seen
this also in 1.7.
Comment 4 Stephan Herrmann CLA 2013-08-08 18:44:09 EDT
Next finding:
- when the EOF exception occurs, the test VM is no longer running, 
  its exit code however is 0 -> no indication of any crash.

After some more theory building regarding memory exhaustion or other accumulative
effects, I had a closer look at SwitchTest, because despite some randomness all
failures occurred in this suite. What I found: several occurrences of
   System.exit(0);
i.e. test cases force the test VM to terminate, whereas the test suite expects
to be able to re-use the VM for subsequent tests.

Now, turning all System.exit(0); into return; seems to fix the immediate issue.

One thing still slightly worries me: I also saw the EOF exception without any visible
test failures. This could indicate that tests signal green although the compiled test
classes may not have run to completion (or at all).

This implies that the mechanism to cope with termination of the test VM isn't fully
safe. Assuming, however, we'll remove all calls to System.exit() from the test suite
I'm not sure how relevant this latter observation is.


When applying these changes we need to observe the difference between System.exit(0) 
vs. return. Looking at testBreakOut() this one is OK:
   System.exit(0);
   break;
whereas this one complains about dead code:
   return;
   break;
Comment 5 Timo Kinnunen CLA 2013-11-22 14:04:25 EST
Created attachment 237650 [details]
Preventing System.exit calls shutting down the VM

Using a SecurityManager to prevent System.exit() being called didn't turn out as straight-forward as it originally seemed, therefore I attached the whole class instead.

One refinement might be to check the result code and on a 0 throw a ThreadDeath to end just the calling thread quietly. But that would encourage calling System.exit() more, so maybe not.
Comment 7 Stephan Herrmann CLA 2013-11-28 12:17:25 EST
(In reply to Timo Kinnunen from comment #5)
> Preventing System.exit calls shutting down the VM

Thanks for the patch.
Have you tried how the tests would cope with it?
Do they still get notified that the test has finished, even if exit() is blocked?
Comment 8 Timo Kinnunen CLA 2013-11-28 12:58:31 EST
I'm afraid to say I haven't, I'm not yet at the point of having the test suites up and running. 
I have tested that the AccessControlException gets thrown and it doesn't block native library loading like only setting a default SecurityManager in code does. 

The code is implemented to make System.exit() calls easy to identify for removal in my own projects, but this may not be how the test suites would prefer to deal with them. I suppose the tests that called System.exit() would show up as failed due to the AccessControlException, which sounds like what you'd want to happen.
Comment 10 Stephan Herrmann CLA 2013-11-29 14:41:59 EST
If anyone has some time to look into fixing this I'd suggest to start looking at org.eclipse.jdt.core.tests.util.TestVerifier.loadAndRun(String): there we catch a SocketException with this comment:

	// connection was reset because target program has exited

I'd assume that this is the location where we need to react to the VM exit. Interestingly both callers of that method ignore the method return value (and boolean return doesn't really suffice to report the 3 possible outcomes). Maybe s.t. as easy as setting reuseVM=false could signal that a new vm must be launched for the next test.
Comment 11 shankha banerjee CLA 2013-12-22 21:49:33 EST
Failed:
Build ID: N20131220-2000
Test Results: http://download.eclipse.org/eclipse/downloads/drops4/N20131220-2000/testresults/html/org.eclipse.jdt.core.tests.compiler_linux.gtk.x86_6.0.html

testSideEffect - 1.7

Unexpected output running resulting class file for testSideEffect.java: --[START]-- ---[END]---

junit.framework.AssertionFailedError: Unexpected output running resulting class file for testSideEffect.java:
--[START]--
---[END]---

at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2344)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2120)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runConformTest(AbstractRegressionTest.java:1191)
at org.eclipse.jdt.core.tests.compiler.regression.SwitchTest.testSideEffect(SwitchTest.java:1701)
at org.eclipse.jdt.core.tests.util.CompilerTestSetup.run(CompilerTestSetup.java:55)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:657)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:310)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:36)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:32)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethodWithException(EclipseAppContainer.java:587)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:198)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:109)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:80)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:372)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:226)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:636)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:591)
at org.eclipse.equinox.launcher.Main.run(Main.java:1450)
at org.eclipse.equinox.launcher.Main.main(Main.java:1426)
at org.eclipse.core.launcher.Main.main(Main.java:34)


Please also see: Bug 422220
Comment 12 shankha banerjee CLA 2013-12-26 00:40:36 EST
Failed again.
Build ID : I20131224-0800
Location: http://download.eclipse.org/eclipse/downloads/drops4/I20131224-0800/testresults/html/org.eclipse.jdt.core.tests.compiler_linux.gtk.x86_6.0.html

Platform: Linux
testMarysLamb - 1.7
Unexpected output running resulting class file for testMarysLamb.java: --[START]-- ---[END]---

junit.framework.AssertionFailedError: Unexpected output running resulting class file for testMarysLamb.java:
--[START]--
---[END]---

at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2344)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2120)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runConformTest(AbstractRegressionTest.java:1191)
at org.eclipse.jdt.core.tests.compiler.regression.SwitchTest.testMarysLamb(SwitchTest.java:1824)
at org.eclipse.jdt.core.tests.util.CompilerTestSetup.run(CompilerTestSetup.java:55)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:657)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:310)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:36)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:32)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethodWithException(EclipseAppContainer.java:587)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:198)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:109)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:80)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:372)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:226)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:636)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:591)
at org.eclipse.equinox.launcher.Main.run(Main.java:1450)
at org.eclipse.equinox.launcher.Main.main(Main.java:1426)
at org.eclipse.core.launcher.Main.main(Main.java:34)
Comment 13 Dani Megert CLA 2014-01-08 04:17:50 EST
And again in N20140107-2000:

Unexpected output running resulting class file for testMarysLamb.java: --[START]-- ---[END]---

junit.framework.AssertionFailedError: Unexpected output running resulting class file for testMarysLamb.java:
--[START]--
---[END]---

at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2344)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runTest(AbstractRegressionTest.java:2120)
at org.eclipse.jdt.core.tests.compiler.regression.AbstractRegressionTest.runConformTest(AbstractRegressionTest.java:1191)
at org.eclipse.jdt.core.tests.compiler.regression.SwitchTest.testMarysLamb(SwitchTest.java:1824)
at org.eclipse.jdt.core.tests.util.CompilerTestSetup.run(CompilerTestSetup.java:55)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:657)
at org.eclipse.test.EclipseTestRunner.run(EclipseTestRunner.java:310)
at org.eclipse.test.CoreTestApplication.runTests(CoreTestApplication.java:36)
at org.eclipse.test.CoreTestApplication.run(CoreTestApplication.java:32)
at org.eclipse.equinox.internal.app.EclipseAppContainer.callMethodWithException(EclipseAppContainer.java:587)
at org.eclipse.equinox.internal.app.EclipseAppHandle.run(EclipseAppHandle.java:198)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.runApplication(EclipseAppLauncher.java:109)
at org.eclipse.core.runtime.internal.adaptor.EclipseAppLauncher.start(EclipseAppLauncher.java:80)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:372)
at org.eclipse.core.runtime.adaptor.EclipseStarter.run(EclipseStarter.java:226)
at org.eclipse.equinox.launcher.Main.invokeFramework(Main.java:636)
at org.eclipse.equinox.launcher.Main.basicRun(Main.java:591)
at org.eclipse.equinox.launcher.Main.run(Main.java:1450)
at org.eclipse.equinox.launcher.Main.main(Main.java:1426)
at org.eclipse.core.launcher.Main.main(Main.java:34)
Comment 14 Dani Megert CLA 2014-01-08 04:18:24 EST
Please either fix or disable the test - it creates unnecessary noise when looking at the tests.
Comment 15 Jay Arthanareeswaran CLA 2014-01-09 00:01:53 EST
(In reply to Stephan Herrmann from comment #4)
> When applying these changes we need to observe the difference between
> System.exit(0) 
> vs. return. Looking at testBreakOut() this one is OK:
>    System.exit(0);
>    break;
> whereas this one complains about dead code:
>    return;
>    break;

Since none of us have time to investigate, we can go with this approach as a wrok around fix. As long as System.exit call are there for a specific purpose, we can simply replace all System.exit calls with "return;" or "break junk;" or anything similar that would be appropriate.
Comment 16 Jay Arthanareeswaran CLA 2014-01-09 00:18:30 EST
Created attachment 238801 [details]
Work around fix

(In reply to Jayaprakash Arthanareeswaran from comment #15)
> Since none of us have time to investigate, we can go with this approach as a
> wrok around fix.

A quick workaround fix.

Stephan, would you have time to try this out and see if the failures still occur in your environment?
Comment 17 Dani Megert CLA 2014-01-09 10:58:04 EST
Also failed in N20140108-2000.
Comment 18 Stephan Herrmann CLA 2014-01-09 11:08:24 EST
Doing the change in my working environment would only make sense by releasing this to BETA_JAVA8, but I believe we want to have it in master, right?
Comment 19 Jay Arthanareeswaran CLA 2014-01-10 06:26:39 EST
(In reply to Stephan Herrmann from comment #18)
> Doing the change in my working environment would only make sense by
> releasing this to BETA_JAVA8, but I believe we want to have it in master,
> right?

That's right. I have gone ahead and released this in master. We will see how the N builds go. I will keep this bug open till then.

http://git.eclipse.org/c/jdt/eclipse.jdt.core.git/commit/?id=653892de0ce6d92080f2ac329ebf5ae3bc092866
Comment 20 Jay Arthanareeswaran CLA 2014-01-13 22:50:45 EST
Last two N builds went well. Fix appears to be working. Thanks Stephan and Timo for your help.

Marking as resolved.
Comment 21 Dani Megert CLA 2014-01-14 05:57:40 EST
Also green in N20140113-2000.