381896 – Take a screenshot when a test times out

Bug 381896 - Take a screenshot when a test times out

Summary: Take a screenshot when a test times out

Status:	RESOLVED FIXED

Alias:	None

Product:	Platform
Classification:	Eclipse Project
Component:	Releng (show other bugs)
Version:	3.8
Hardware:	All All

Importance:	P2 enhancement (vote)
Target Milestone:	4.3
Assignee:	Markus Keller
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-06-06 13:50 EDT by Markus Keller
Modified:	2012-08-27 07:12 EDT (History)
CC List:	3 users (show)

See Also:

Attachments
Fix (untested) (6.23 KB, patch) 2012-06-06 13:50 EDT, Markus Keller	no flags	Details \| Diff
log start time (691 bytes, patch) 2012-08-20 13:25 EDT, Markus Keller	no flags	Details \| Diff
make startStackDumpTimoutTimer() work (4.95 KB, patch) 2012-08-21 08:35 EDT, Markus Keller	no flags	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Markus Keller

2012-06-06 13:50:26 EDT

Created attachment 216953 [details]
Fix (untested)

When a test times out, it would often be helpful to see the screen before the test is killed.

In the vein of bug 342053, here's a fix attempt. Again, this is completely untested, so please try it only after 4.2 is done.

Comment 1 David Williams

2012-08-16 19:47:11 EDT

I started to look at this patch, and for some reason, the parts that require SWT show up as errors .... cant' find import o.e.swt... I know it's "there", being reexported by o.e.ui ... but ... any guesses as to what I could be doing wrong? 
I've tried plenty of cleans and checking vms and rebuilds and restarts ... so was hoping for some "Eclipse 101" thing I'm overlooking? 

It all looks right ... guess I could release and assume it'll compile in batch mode ... but, feels pretty odd to commit/push code with "compile errors" in my IDE.

Comment 2 David Williams

2012-08-16 20:53:31 EDT

I remember now ... I had the swt repositories in my workspace (just having everything loaded up, even though, obviously, would contain errors since my once workspace doesn't support all platforms) so closed those so the version in the target is used and then all compiled fine. 

I've pushed to master, and will restart nightly build in order to get this "in code". 

I'm sure it won't take long before we see it "tested" in our unit tests :) 

From there, we'll look at output, and figure out what to do next ... there should be someway to "save it" with rest of unit test results.

Comment 3 Markus Keller

2012-08-17 05:56:47 EDT

(In reply to comment #1)
To make the org.eclipse.swt project compile work after checkout, you have to add the right .classpath file for your platform, see bug 387463.

To run that SWT code, you also need the binary bundle from the eclipse.platform.swt.binaries repo.

Comment 4 David Williams

2012-08-17 18:39:23 EDT

It seems there were some tests that timed out in last nightly, but I do not see any 'png' files in results. 

Searched for 

Timeout: killed the sub-process

in following files

found 2 occurrences
http://download.eclipse.org/eclipse/downloads/drops4/N20120816-2100/testresults/consolelogs/macosx.cocoa_consolelog.txt

found 1 occurrences
http://download.eclipse.org/eclipse/downloads/drops4/N20120816-2100/testresults/consolelogs/linux.gtk-6.0_consolelog.txt

found 0 occurrences
http://download.eclipse.org/eclipse/downloads/drops4/N20120816-2100/testresults/consolelogs/win7consolelog.txt

I think I'll set a real low timeout, like 10 minutes, so we get a whole lot of timeouts, and might help determine if we ever get an image (such as, maybe if on windows?)

Comment 5 David Williams

2012-08-19 22:01:05 EDT

I think I set timeout to 100 minutes, instead of 10 minutes. 6000000 ms instead of 600000 ms. 

I'm going to try and restart tonight's tests with the short limit. 

But, even at 100 minutes, I think there are a one or two png file among the HTML files ... but, I think do not have the type of names expected. For example, 

http://download.eclipse.org/eclipse/downloads/drops4/N20120819-0600/testresults/html/FindReplaceDialogTest.testFocusNotChangedWhenEnterPressed.png

Perhaps that one is ordinarily generated?

Comment 6 Markus Keller

2012-08-20 07:12:32 EDT

David, please revert all changes to runtests* files that you made for this bug.

The default timeout is controlled by the Ant property "timeout" that is set in library.xml's target "init":
47: <property name="timeout" value="7200000" />

This property is read in library.xml's target "java-test" in two places:

1. As argument in the <java timeout="${timeout}"> task. This one actually determines the time until the test Eclipse VM gets killed.

2. As command line argument to the process started by the <java> task:
<arg line="-timeout ${timeout}" />. This argument is eventually read by the EclipseTestRunner.java and is used in #startStackDumpTimoutTimer(..) there. That method starts a Timer that should expire shortly before the VM gets killed by Ant.

To shorten the timeout for all test.xml that just go with the default timeout, all you have to do is to decrease the timeout in library.xml to e.g. 10 min.:
<property name="timeout" value="600000" />

If the timeout mechanism works correctly, then the EclipseTestRunner will always print "EclipseTestRunner almost reached timeout ...", followed by a thread dump. After that, it will try to take a screenshot.

Note that this whole thing can't ever work when the VM crashes -- it's only to catch problem such as a dialog that blocks the UI thread indefinitely.


> http://download.eclipse.org/eclipse/downloads/drops4/N20120819-0600/
> testresults/html/FindReplaceDialogTest.testFocusNotChangedWhenEnterPressed.
> png

That's not from this bug. It's from bug 379026. I'm taking care of that and I'll stop generating this file when I'm done there.

Comment 7 David Williams

2012-08-20 09:38:42 EDT

(In reply to comment #6)
> David, please revert all changes to runtests* files that you made for this
> bug.
> 
> The default timeout is controlled by the Ant property "timeout" that is set
> in library.xml's target "init":
> 47: <property name="timeout" value="7200000" />
> 

I do not understand why the default in library.xml is a better place to change the 'timeout' value, rather than on the command line in 'runtests*'. 

Can you try again? (to explain to me). If it is a matter that the same variable is used for two things, essentially, then I'd suggest we change the name of one of them, since it is a common practice to override an properties on the command line. 

[But, I will be making the changes you suggest and trying another test run today]

Comment 8 Markus Keller

2012-08-20 10:33:54 EDT

(In reply to comment #7)
OK, I didn't know that setting a Java system property via -D will actually set an Ant property, so I suspected that this command line argument didn't actually work. But I still think it's less error-prone to change it in one location rather than in multiple files.

Anyway, it looks like the EclipseTestRunner's TimerTask is never reached, not even with 3-minute delay you set now. I can't explain this.

Comment 9 David Williams

2012-08-20 12:23:28 EDT

(In reply to comment #8)
> (In reply to comment #7)
> OK, I didn't know that setting a Java system property via -D will actually
> set an Ant property, so I suspected that this command line argument didn't
> actually work. But I still think it's less error-prone to change it in one
> location rather than in multiple files.
> 

I think it was working, once I set it to 10 minutes, instead of the 100 minutes I originally accidentally used, there were more "DNFs" (though not as many as I would have thought). The advantage of setting on command line, is then the bundle (whole build) does not have to be re-done. I can set on command line, and just rerun the tests on Hudson. Especially for something that is a "test run" (not a change that is desired to be persistent). 

> Anyway, it looks like the EclipseTestRunner's TimerTask is never reached,
> not even with 3-minute delay you set now. I can't explain this.

Yeah, I changed to 3 minutes fearing it just wasn't finishing in time. 
Guess the next step would be to add more "debugging" comments, or something. I may end up having to do more testing "locally" (was hoping a few quick tests on build.eclipse.org would suffice). 

After the current, 10:15 "N" build runs, I'll reset 'timeout' to normal value so tonight's N build is normal, in advance of tomorrows I-build. (But, I plan to leave the rest of the patch in place, unless you see some reason not too).

Comment 10 Markus Keller

2012-08-20 13:25:35 EDT

Created attachment 220062 [details]
log start time

OK, makes sense. Could you add this log statement to help explain why we don't even get the thread dump that we used to get before?

The VM crashes could explain the absence on GTK, but on Windows, the "almost reached timeout" stuff should really be there.

Comment 11 David Williams

2012-08-20 14:49:02 EDT

I will add the patch for tonight's Nightly. (and tomorrow I build). 

I did notice one thing odd. 

We have two ways to launch the tests "java-test" and "eclipse-test" targets. One using the launcher with Java exe, and the other using the eclipse.exe directly. 

The java-test version has 

<arg line="-timeout ${timeout}" />

but the eclipse-test version does not. 

I can find no evidence we ever use the eclipse-test target, but have added some extra <echo s there, to help make sure. 

 <echo message="DEBUG: Running tests using eclipse-test" /> vs
 <echo message="DEBUG: Running tests using java-test" />

We should see only the latter in logs. 

I suspect the -timeout should be in the eclipse exe target too ... but do not think that's cause our current observations (well, lack of observation :\

Comment 12 David Williams

2012-08-20 15:05:24 EDT

Actually, I can tell from current logs we use only "java-test:" as in 

java-test:
     [echo] Running org.eclipse.jdt.core.tests.compiler.parser.TestAll. Result file: /opt/users/hudsonbuild/workspace/eclipse-JUnit-Linux2/workarea/N20120819-2000/eclipse-testing/results/linux.gtk.x86_6.0/org.eclipse.jdt.core.tests.compiler.parser.TestAll.xml.
     [echo] timout property: 600000.

So, I've opened bug 387638 to track fixing eclipse-test someday ... once we get java-test working :/

Comment 13 Markus Keller

2012-08-21 07:09:21 EDT

In N20120820-2000, it's clear that EclipseTestRunner#startStackDumpTimoutTimer(String, File) is never called.

EclipseTestRunner has two modes of operation (one vs. multiple tests), and my patch picked the wrong one (the one that is never used).

Comment 14 Markus Keller

2012-08-21 08:35:32 EDT

Created attachment 220097 [details]
make startStackDumpTimoutTimer() work

This should do it. I also tried to put the screenshots into the "results" directory.

Comment 15 David Williams

2012-08-21 13:56:56 EDT

I've applied the patch and can schedule a quick N build once the I build tests finish. (I might set time limit to 5 minutes this time, since 10 minutes did not make much different :/ 

About the "results" directory. What are you trying to accomplish? 
Wouldn't we want them in their own directory, under results? 

Plus, our "results" directory is, I think, defined in library.xml 

<property name="junit-report-output" value="${eclipse-home}/results" />

so we'd be better off getting that property, and then creating new directory, perhaps "screendumps"? under that, right? Or, am I missing the point of what you are trying to do? 

FWIW, it's easy to add new directories to log.php pages just add 
"listLogs(directoryname)" (and, in this case, add ".png" to to acceptable file types to list).

Comment 16 Markus Keller

2012-08-22 07:19:10 EDT

At the moment, I'm just trying to make screenshots and save them somewhere where I think they could be found later.

Unfortunately, my guesses about the environment failed, and the screenshots ended here, where they were lost after the test run:
Screenshot saved to: /Users/hudsonbuild/workspace/eclipse-JUnit-mac2/workarea/N20120821-2000/eclipse-testing/test-eclipse/eclipse/resultsscreen1.png

TODOs:
- create paths like this: new org.eclipse.core.runtime.Path(dir.getAbsolutePath()).append(xxx).toOSString()
- make sure screenshots are uniquly identifiable (test suite, platform). This may require additional arguments to be passed to EclipseTestRunner.
- list screenshots along with logs

David, do you want to do this, or shall I continue?

Comment 17 David Williams

2012-08-22 08:47:52 EDT

(In reply to comment #16)
> At the moment, I'm just trying to make screenshots and save them somewhere
> where I think they could be found later.
> 
> Unfortunately, my guesses about the environment failed, and the screenshots
> ended here, where they were lost after the test run:
> Screenshot saved to:
> /Users/hudsonbuild/workspace/eclipse-JUnit-mac2/workarea/N20120821-2000/
> eclipse-testing/test-eclipse/eclipse/resultsscreen1.png
> 
> TODOs:
> - create paths like this: new
> org.eclipse.core.runtime.Path(dir.getAbsolutePath()).append(xxx).toOSString()
> - make sure screenshots are uniquly identifiable (test suite, platform).
> This may require additional arguments to be passed to EclipseTestRunner.
> - list screenshots along with logs
> 
> David, do you want to do this, or shall I continue?

Yep, I can fix up the paths and experiment with best place to save them. 
Will likely wait till after today's M builds, but don't think the current code will hurt those  ... we'll get this done soon. 

Thanks for this innovation!

Comment 18 David Williams

2012-08-24 14:01:42 EDT

I've almost got this to the point of committing, but one question. 

I did have to add a command line parameter, junitReportOutput, to get the best "output directory", and while doing that I noticed (well, to be honest, I was confused for 2 or 3 hours, why my changes were not working) that for most cases, the parameters are lowercased before comparison. For example, 

...
else if (args[i].toLowerCase().equals("-testpluginsnames")) {
                if (i < args.length - 1) 
                    testPluginsNames = args[i + 1];
                i++;
...

Which happened to be the one I copy/pasted and, being dense, could not figure out why this was not working:
...
else if (args[i].toLowerCase().equals("-junitReportOutput")) {
                if (i < args.length - 1) {
                    junitReportOutput = args[i + 1];
                }
              i++;
...

So, granted, I was dense in this case, but think it was partially because I always treat values/variables as if case matters. 

Is there any particular reason we accept any case in test arguments? Ant convention or something? Or, just originators personal preferences? 

I notice we are are not consistent, exactly, since we have 
...
else if (args[i].equals("-timeout")) {
                if (i < args.length - 1) {
                    timeoutString = args[i + 1];
                }
....

I can open separate bug(s) if there's any "cleanup" to do here, but just thought I'd ask before my initial commit. (I'd prefer to go with plain "equals", myself, and expect case to be correct ... but didn't want to violate some convention or expectation, if there are any). 

Advise welcome. 

Thanks,

Comment 19 Markus Keller

2012-08-24 14:16:16 EDT

I don't think there's much thought behind the usage of toLowerCase(). The initial commit in 2002 was already inconsistent. I agree that plain equals(..) is a better choice.

If you want to remove the existing calls, then you need to update all callers (at least library.xml, maybe more), the Javadoc of the main method, and testframework.html.

Comment 20 David Williams

2012-08-24 14:36:54 EDT

(In reply to comment #19)
> I don't think there's much thought behind the usage of toLowerCase(). The
> initial commit in 2002 was already inconsistent. I agree that plain
> equals(..) is a better choice.
> 
> If you want to remove the existing calls, then you need to update all
> callers (at least library.xml, maybe more), the Javadoc of the main method,
> and testframework.html.

Ok, thanks, I'll open separate bug. 

Now, you've been so helpful (this late on a Friday!) ... the project has no project specific "formating style". Is there any convenient rules I could "import"? Or, are the "Eclipse built-in" rules close enough? I added some comments, made some constants, and when formated turned all the tabs into spaces (my normal preference) ... so, just wondering if there's a clear "standard" used by the Platform? Or JDT? I've seen the wiki pages that talk about "coding conventions" ... but didn't see mention of a handy "cleanup profile" I could import? If there's not, that's fine, I'll just undo what I can so the patched code will be more similar. to original. Thanks.

Comment 21 David Williams

2012-08-24 16:23:41 EDT

I've opened bug 388029 to discuss/track various cleanup issues (such as parameter casing). 

And, have started (at 4 PM eastern) a test build with 5 minute timeouts. 

One change I made, and tell me if you find this more confusing that helpful. 

Previously the "dumps" were called with "1" and "2" (first, second), but since this is just a label, I decided to call it with "0" and "number of seconds between dumps" (currently "5") so that would provide a subtle reminder of how many seconds elapsed. Some day we might change it to 10? or 60? The stack traces themselves also print out this "number label", but they also print the exact time, so a little easier to see the time difference there. 

If you think that's more confusing than helpful, that's fine, I'll change it back to "1" and "2". 

Also, in the filename, I also, currently, use the "classname" that is passed into the test framework, which is basically the class that is running the tests, or tests suites, not literally THE test that is hung. I'm not sure how to get that, without doing some complicated "walk" of the stack (which, I would not want to do :) but if you know of some easy way, let me know. Maybe obvious from a human reading stack trace and looking at screen dumps ... and guess a suite can only timeout once (per run) so may not on a test-by-test basis. 

The files to into a directory named 'timeoutScreens' under the machine specific "results" directory, a peer of "crashReports", so there could be files, say, in 
.../results/linux.gtk.x86_64/timeoutScreens

I've updated the logs.php so that if that timeoutScreens directory exists, to list the files there. (forgot to add ".png" as allowable in filter, so will do that by hand for current test run).

Comment 22 David Williams

2012-08-24 23:59:06 EDT

Appears to be working, starting with

http://download.eclipse.org/eclipse/downloads/drops4/N20120824-1600/logs.php#console

(and, this, recall, with the very short time limit set, to encourage lots of cases).

One problem, on some of the linux ones, the "screen" is just black. Judging from the names of the tests, they may not involved UI so there is likely no "Display". Such as

http://download.eclipse.org/eclipse/downloads/drops4/N20120824-1600/testresults/linux.gtk.x86_6.0/timeoutScreens/org.eclipse.core.tests.resources.AutomatedTests_screen0.png

I wonder if we should call

Display.getCurrent();

instead of

Display.getDefault();

and if getCurrent() returns null, then we'd just log the fact there was no current display and not try to snag a screen shot. Though, guess it does no harm to have "blank" screen shots :)

The larger problem is on Windows. There, all the screen captures were "blank" (sort, of grey). I've no idea. From the names, they might all be "core" tests, except for one, pde.ui, sounds like it ought to have some UI. Perhaps its just the nature of the short timeout, and it had not gotten to the point of having a display yet?

http://download.eclipse.org/eclipse/downloads/drops4/N20120824-1600/testresults/win32.win32.x86_7.0/timeoutScreens/org.eclipse.pde.ui.tests.AllPDETests_screen0.png

I will change to use "getCurrent()" and at least get some sample runs with that over the weekend ... but, can always change it back if anyone think it best to create a Display, if it does not already exist.

Comment 23 David Williams

2012-08-25 00:26:58 EDT

 
> I will change to use "getCurrent()" and at least get some sample runs with
> that over the weekend ... but, can always change it back if anyone think it
> best to create a Display, if it does not already exist.

Changed my mind already. I do not know enough (i.e. do not know anything) about thread and unit tests so no idea if we are in a UI thread or not. Decided instead to log a little extra info, and just product blank screen shots if there really is no Display at all (until we create it). 


private static Display getDisplay() {
      Display display = Display.getCurrent();
      if (display == null) {
    	  System.err.println("Display.getCurrent() returned null so this is either a core test with no UI, or this test is running in a non-UI thread, so we will use Display.getDefault() to be safe.");
          display = Display.getDefault();
       }
      return display;		
}

Comment 24 David Williams

2012-08-26 03:29:15 EDT

I think this is working as well as it needs to. 

You can see "samples" from a run where I set the time limit to 5 minutes. 
http://download.eclipse.org/eclipse/downloads/drops4/N20120825-1030/logs.php

The windows ones did not turn out well, there, but I think there were few "UI tests" that ran to the point of involving the UI. Not sure why. Not sure how to force a test on windows? 

But the mac and linux ones look as expected.

Comment 25 Markus Keller

2012-08-27 07:12:43 EDT

Re formatting style: There's no consistent style across all Eclipse SDK projects. Projects that don't have project-specific styles yet often take the "Eclipse [built-in]" but increase the line length to at least 120 (bug 356851).

Of course, I like the org.eclipse.jdt.ui styles the best.

A very bad habit is to format a whole file along with making other changes. This makes the real changes hard to see. If you feel like you must format the whole file, then at least do this in a separate commit. But unless code is in a really horrible shape, it's usually best to leave it as is. There are things that an automatic formatter can never understand (meaningful line breaks, line comments, cases where a good developer deviates from the "standard" because it makes the code better readable in context).

Note that there's also a Save Actions preference page, where you can choose to automatically format edited lines only. Some devs like that (I don't).