Bug 540222 - [tests] All test projects should dump stack traces on timeout
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.10
Hardware: All All
Importance: P3 enhancement
Target Milestone: 4.11 M3
Assignee: Simeon Andreev CLA
QA Contact:
URL:
Whiteboard:
Keywords: noteworthy
Depends on: 542876
Blocks:
Reported: 2018-10-17 09:31 EDT by Simeon Andreev CLA
Modified: 2019-10-17 06:46 EDT

See Also:


Attachments
JUnit log from jdt.debug gerrit build 457. (213.15 KB, text/plain)
2018-10-17 09:31 EDT, Simeon Andreev CLA
Job log for jdt.debug gerrit build 457. (965.19 KB, text/plain)
2018-10-17 09:32 EDT, Simeon Andreev CLA

Description Simeon Andreev CLA 2018-10-17 09:31:32 EDT
Created attachment 276286 [details]
JUnit log from jdt.debug gerrit build 457.

A gerrit job for bug 540132 aborted due to a timeout, see build https://ci.eclipse.org/jdt/job/eclipse.jdt.debug-Gerrit/457/ and attached logs.

Neither the job log nor the Eclipse Error Log contains any stack traces. If, for example, a deadlock occurred, we have no way of knowing what exactly deadlocked. It would be great if all test projects (or as many as possible) could reuse code from the releng test runner org.eclipse.test.EclipseTestRunner, or at least report stack traces on timeout in a similar way.

It would also be nice to know when exactly EclipseTestRunner is used and for which test projects. Is it used for gerrit jobs at all, or only for integration builds?
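
For illustration, here is a minimal sketch of the kind of report meant above: the stack traces of all live threads, plus any deadlock the JVM can detect. It uses only standard JDK APIs. The class and method names (StackDumpSketch, dumpAllThreads, describeDeadlocks) are hypothetical and are not taken from EclipseTestRunner, whose actual implementation may differ.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;

/** Hypothetical helper for illustration; not the actual EclipseTestRunner code. */
public class StackDumpSketch {

    /** Formats the stack traces of all live threads into one string. */
    public static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append("Thread \"").append(thread.getName())
               .append("\" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
        }
        return out.toString();
    }

    /** Describes deadlocked threads, or returns an empty string if none are detected. */
    public static String describeDeadlocks() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] ids = bean.findDeadlockedThreads(); // null when no deadlock is detected
        if (ids == null) {
            return "";
        }
        StringBuilder out = new StringBuilder("Deadlocked threads:\n");
        // Request locked monitors and synchronizers so the report shows who holds what.
        for (ThreadInfo info : bean.getThreadInfo(ids, true, true)) {
            if (info != null) {
                out.append(info);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.err.println(dumpAllThreads());
        System.err.println(describeDeadlocks());
    }
}
```

Writing such a report to the job log just before the build is aborted would show which threads were blocked and on which locks.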
Comment 1 Simeon Andreev CLA 2018-10-17 09:32:47 EDT
Created attachment 276287 [details]
Job log for jdt.debug gerrit build 457.
Comment 2 Jonah Graham CLA 2018-10-17 12:20:56 EDT
I can't help wondering if JDT's HIPP instance is being affected by one of the other users of hipp6. CDT used to share with OpenJ9, and our build times went up dramatically. Some build steps had long freezes in unusual places, and lots of builds would time out. I could never identify the exact cause, but since OpenJ9 and JDT started sharing, a lot more build instabilities seem to happen there, and CDT has been very stable again.
Comment 3 Simeon Andreev CLA 2018-12-07 05:48:45 EST
Hi Alexander, hi Mickael,

I notice that e.g. JDT debug tests during integration builds run with the test applications in platform.releng (org.eclipse.test.uitestapplication). So the runner is EclipseTestRunner, which will report stack traces after a timeout occurs.

The gerrit jobs for JDT debug (and most other test projects I checked) run with tycho surefire (the application is org.eclipse.tycho.surefire.osgibooter.uitest). I do see the following timeout parameter, which I assume is used: https://www.eclipse.org/tycho/sitedocs/tycho-surefire/tycho-surefire-plugin/test-mojo.html#forkedProcessTimeoutInSeconds

However, I find no information about stack trace dumps on timeout; such dumps are not present, and I find no tycho surefire parameter to enable them.

Any ideas here? It would be great if the tycho test runner can dump stack traces similarly to the releng test runner. A sporadic hang in gerrit jobs is otherwise very difficult to understand.

Best regards and thanks,
Simeon
Comment 4 Mickael Istria CLA 2018-12-07 06:37:45 EST
(In reply to Simeon Andreev from comment #3)
> I notice that e.g. JDT debug tests during integration builds run with the
> test applications in platform.releng (org.eclipse.test.uitestapplication).
> So the runner is EclipseTestRunner, which will report stack traces after a
> timeout occurs.

That could be something to improve in the EclipseTestRunner and Tycho could pass a flag to enable it if it's not done by default.

> However I find no information about stack trace dumps on timeout, such dumps
> are neither present nor do I find a tycho surefire parameter to enable them.
> Any ideas here? It would be great if the tycho test runner can dump stack
> traces similarly to the releng test runner. A sporadic hang in gerrit jobs
> is otherwise very difficult to understand.

I don't think there is anything providing this at the moment. That's a feature request for Tycho (that should probably require also this addition to EclipseTestRunner).
Comment 5 Simeon Andreev CLA 2018-12-07 06:42:09 EST
(In reply to Mickael Istria from comment #4)
> (In reply to Simeon Andreev from comment #3)
> > I notice that e.g. JDT debug tests during integration builds run with the
> > test applications in platform.releng (org.eclipse.test.uitestapplication).
> > So the runner is EclipseTestRunner, which will report stack traces after a
> > timeout occurs.
> 
> That could be something to improve in the EclipseTestRunner and Tycho could
> pass a flag to enable it if it's not done by default.

I.e. the EclipseTestRunner is running with the Tycho application? The integration builds are passing the parameter: -timeout 7200000

This enables the dump. Would it be possible to add this with a value that is a bit lower than the Tycho surefire timeout?
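
To make the idea concrete, here is a minimal sketch of a watchdog that fires shortly before the outer timeout kills the process. The class name DumpWatchdogSketch, its methods, and the 60 second margin are assumptions for illustration only; this is not how EclipseTestRunner or Tycho implement their timeouts.

```java
import java.util.Timer;
import java.util.TimerTask;

/** Hypothetical watchdog for illustration; names and margin are assumptions. */
public class DumpWatchdogSketch {

    /** Delay before dumping: the kill timeout minus a safety margin, never negative. */
    public static long dumpDelayMillis(long killTimeoutSeconds, long marginSeconds) {
        return Math.max(0L, (killTimeoutSeconds - marginSeconds) * 1000L);
    }

    /** Starts a daemon timer that runs the dump action once after the given delay. */
    public static Timer start(long delayMillis, Runnable dumpAction) {
        // Daemon thread, so the watchdog never keeps the test JVM alive on its own.
        Timer timer = new Timer("timeout-stack-dump", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                dumpAction.run();
            }
        }, delayMillis);
        return timer;
    }

    public static void main(String[] args) {
        // Illustrative values: a 7200 second kill timeout with a 60 second margin.
        long delay = dumpDelayMillis(7200, 60);
        System.out.println("Dump would fire after " + delay + " ms");
        Timer timer = start(delay, () -> System.err.println("Timeout is near, dumping stack traces"));
        timer.cancel(); // a real test run would cancel only after the tests have finished
    }
}
```

The margin has to be long enough for the dump to be written before the process is killed.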
Comment 6 Mickael Istria CLA 2018-12-07 06:52:19 EST
(In reply to Simeon Andreev from comment #5)
> I.e. the EclipseTestRunner is running with the Tycho application? The
> integration builds are passing the parameter: -timeout 7200000

I'm not sure and cannot look at the code right now.
I suggest you simply add -DdebugPort=8000 to those tests, connect a debugger to the application, and see what's involved in the execution and timeout.
Such an addition to dump stacks on timeout would be welcome in Tycho. Please open an enhancement request and consider submitting a patch (this part of Tycho isn't too hard to modify).
Comment 7 Eclipse Genie CLA 2018-12-10 09:39:24 EST
New Gerrit change created: https://git.eclipse.org/r/133787
Comment 8 Simeon Andreev CLA 2018-12-10 10:08:27 EST
I've debugged the test execution. Unfortunately, EclipseTestRunner is not used for e.g. jdt.debug.tests, so simply specifying -timeout will not be enough. I've created a change which copies the stack trace dumping code from EclipseTestRunner to the Tycho test applications, see https://git.eclipse.org/r/#/c/133787/.
Comment 9 Eclipse Genie CLA 2018-12-11 04:37:23 EST
New Gerrit change created: https://git.eclipse.org/r/133831
Comment 10 Eclipse Genie CLA 2018-12-11 04:56:04 EST
New Gerrit change created: https://git.eclipse.org/r/133835
Comment 12 Eclipse Genie CLA 2018-12-13 09:40:37 EST
New Gerrit change created: https://git.eclipse.org/r/134000
Comment 14 Sravan Kumar Lakkimsetti CLA 2019-02-19 00:13:31 EST
Is anything pending on this bug? Can this bug be resolved?
Comment 15 Dani Megert CLA 2019-02-22 04:25:56 EST
Please reopen if more work needs to be done here.
Comment 16 Simeon Andreev CLA 2019-02-22 04:32:40 EST
Still waiting for Tycho changes: https://git.eclipse.org/r/#/c/133787/
Comment 17 Mickael Istria CLA 2019-02-22 04:37:51 EST
Tycho is a separate project from Platform. The current bug is attached to the Platform project; the Tycho patch should be tracked in another ticket.
Comment 18 Dani Megert CLA 2019-02-22 04:40:05 EST
The Tycho change should be moved to the Tycho bug.
Comment 19 Dani Megert CLA 2019-02-22 04:41:12 EST
Sorry, our changes just crossed.
Comment 20 Simeon Andreev CLA 2019-02-22 04:48:09 EST
Opened bug 544704.
Comment 21 Dani Megert CLA 2019-02-22 04:56:55 EST
(In reply to Simeon Andreev from comment #20)
> Opened bug 544704.
Isn't that already covered by bug 542876?
Comment 22 Simeon Andreev CLA 2019-02-22 04:59:00 EST
(In reply to Dani Megert from comment #21)
> (In reply to Simeon Andreev from comment #20)
> > Opened bug 544704.
> Isn't that already covered by bug 542876?

Sorry, didn't notice the link at all.