Bug 423561 - DNFs and VM crash in mozilla.XPCOM._VtblCall in Browser2.test1 on xulrunner-10.0.4esr
Status: CLOSED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: SWT
Version: 4.4
Hardware: PC Linux-GTK
Importance: P3 major
Target Milestone: ---
Assignee: Platform-SWT-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: test
Depends on: 210792
Blocks:
Reported: 2013-12-09 04:01 EST by Dani Megert CLA
Modified: 2017-07-04 13:25 EDT
CC List: 11 users

See Also:


Description Dani Megert CLA 2013-12-09 04:01:05 EST
SWT tests consistently DNF since Friday.

The DNF happens because of a crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f666c9c0e2e, pid=1108, tid=140078465447680
#
# JRE version: 7.0_25-b15
# Java VM: Java HotSpot(TM) 64-Bit Server VM (23.25-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libnssutil3.so+0xee2e]  NSSRWLock_LockWrite_Util+0xe
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /opt/users/hudsonbuild/workspace/ep4-unit-lin64/workarea/N20131206-2000/eclipse-testing/test-eclipse/eclipse/hs_err_pid1108.log
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.sun.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
Comment 1 Dani Megert CLA 2013-12-09 05:53:43 EST
Reducing severity: the build seems usable, since the other tests passed.
Comment 2 Markus Keller CLA 2013-12-09 06:41:34 EST
The VM crash only occurred in Friday's N20131206-2000 build.

In N20131207-1500 and I20131208-2000, the SWT Tests did not crash, but were killed by the Ant task before any results were produced (test log is empty):

    [java] Timeout: killed the sub-process

The timeouts look like a separate problem. This already happened a few times before in older N-builds.

Let's use this bug for the VM crash. http://download.eclipse.org/eclipse/downloads/drops4/N20131206-2000/testresults/linux.gtk.x86_6.0/crashlogs/org.eclipse.swt.tests.junit.AllGtkTests.hs_err_pid1108.log continues with:

Stack: [0x00007f668f1c7000,0x00007f668f2c8000],  sp=0x00007f668f2c4450,  free space=1013k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libnssutil3.so+0xee2e]  NSSRWLock_LockWrite_Util+0xe

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.eclipse.swt.internal.mozilla.XPCOM._VtblCall(IJJ[B[C)I+0
j  org.eclipse.swt.internal.mozilla.XPCOM.VtblCall(IJJ[B[C)I+14
j  org.eclipse.swt.internal.mozilla.nsIObserverService.NotifyObservers(J[B[C)I+13
j  org.eclipse.swt.browser.Mozilla.initProfile(Lorg/eclipse/swt/internal/mozilla/nsIServiceManager;Z)V+163
j  org.eclipse.swt.browser.Mozilla.create(Lorg/eclipse/swt/widgets/Composite;I)V+429
j  org.eclipse.swt.browser.Browser.<init>(Lorg/eclipse/swt/widgets/Composite;I)V+81
j  org.eclipse.swt.tests.junit.browser.Browser2.test1(Ljava/lang/String;)Z+83
j  org.eclipse.swt.tests.junit.browser.Browser2.test()Z+21
j  org.eclipse.swt.tests.junit.browser.Test_BrowserSuite.Browser2()V+14
j  org.eclipse.swt.tests.junit.browser.Test_BrowserSuite.runTest()V+41
...
Comment 3 Dani Megert CLA 2013-12-09 17:26:55 EST
Also DNFed in I20131209-0800.

Whatever the reason is, it needs to be fixed for M4.

See also bug 400527 and bug 414293.
Comment 4 Arun Thondapu CLA 2013-12-10 09:19:12 EST
I'm not sure why the VM crash occurred in Browser2 but I figured while running the tests on my Linux setup that the browser suite hangs when it is run without a network connection. This could possibly explain the inconsistent nature of the DNFs. The hang seems to be happening while running the tests in Browser3. I tried disabling this test and found that the test suite doesn't hang anymore even without a network connection. I have temporarily disabled this test for now to confirm if that is the case with the I-builds too - http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=6a8e6a9f6f6100d8069f881a37585d1aef908728

Also, this is probably a known issue with some other Browser tests too which have been disabled similarly on Linux. Will try to figure out why they have been disabled.

David, just out of curiosity, are you aware of any recent internet connectivity issues with the Linux test box?
Comment 5 Markus Keller CLA 2013-12-10 12:43:26 EST
(In reply to Arun Thondapu from comment #4)
> Also, this is probably a known issue with some other Browser tests too which
> have been disabled similarly on Linux. Will try to figure out why they have
> been disabled.

See bug 420365 (probably nobody knows that any more).

Other recent network-related problems in tests I remember were on the Mac:
Bug 420365, bug 420260.
Comment 6 Markus Keller CLA 2013-12-10 12:44:36 EST
First should have been bug 420258.
Comment 7 Arun Thondapu CLA 2013-12-11 05:06:53 EST
(In reply to Arun Thondapu from comment #4)
> The hang seems to be happening while
> running the tests in Browser3. I tried disabling this test and found that
> the test suite doesn't hang anymore even without a network connection. I
> have temporarily disabled this test for now to confirm if that is the case
> with the I-builds too -

Disabling this test doesn't seem to have helped with the Linux tests. It still DNFed in I20131210-0800 and I20131210-2000 (Linux tests for all components have DNFed in this build, not sure if there is some infrastructure issue).

Dani/Markus, are there any logs that can help me figure out exactly which test is hanging or failing to finish on Linux? I can disable all the Browser tests and try if it helps. In my local tests, the Browser uses Webkit but on the actual test machine, it is using Mozilla Xulrunner, which may exhibit different behavior.
Comment 8 David Williams CLA 2013-12-11 09:21:34 EST
(In reply to Arun Thondapu from comment #4)

> 
> David, just out of curiosity, are you aware of any recent internet
> connectivity issues with the Linux test box?

No, not that I know of. 

(In reply to Arun Thondapu from comment #7)
> (In reply to Arun Thondapu from comment #4)
> ... are there any logs that can help me figure out exactly which
> test is hanging or failing to finish on Linux? I can disable all the Browser
> tests and try if it helps. In my local tests, the Browser uses Webkit but on
> the actual test machine, it is using Mozilla Xulrunner, which may exhibit
> different behavior.

No, no logs that I know of would help. In theory your tests could write
additional information to "console log" and it should be captured ... but, 
not sure that's feasible? 

Just FYI, occasionally some insight can be gained by browsing the Hudson
workspace, which you can get to from 
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-lin64/
and drilling down into "workspace" folder ... 
But I didn't see anything in this case. 

(And, it is wiped clean before each test run, which will happen soon for 
this morning's 8 AM run).
Comment 9 David Williams CLA 2013-12-11 09:33:48 EST
(In reply to David Williams from comment #8)

I have, just now, switched that test to run on "slave 2", instead of where it has been running, "slave 4". 

If that makes a difference, it might mean that there is a process "hung" on slave 4 (such as a unit test "left over" from some previous run?). 

While there are ways I can get a list of processes running, etc., it's not as simple as opening a shell on it, so I'm not sure there would be time before next test cycle starts. Also a few of us have the ability to "restart" the slave (not quite a reboot, but would free up some hung processes).
Comment 10 Markus Keller CLA 2013-12-11 11:37:46 EST
It's hard to get more information about the recent DNFs (not the VM crash from comment 0).

In http://download.eclipse.org/eclipse/downloads/drops4/I20131210-2000/testresults/consolelogs/linux.gtk.x86_64_6.0_consolelog.txt , when you search for "swt:" and then skip all the [echoproperties] lines, the actual test output to syserr is this:

java-test:
     [echo] Running org.eclipse.swt.tests.junit.AllGtkTests. Result file: /opt/users/hudsonbuild/workspace/ep4-unit-lin64/workarea/I20131210-2000/eclipse-testing/results/linux.gtk.x86_6.0/org.eclipse.swt.tests.junit.AllGtkTests.xml
     [echo] timout property: 7200000
[..]
     [java] starting EclipseTestRunnerTimer with timeout=7080000 at 2013-12-10 22:53:34 -0500
     [java] Xlib:  extension "RANDR" missing on display ":100.0".
     [java] Timeout: killed the sub-process


The "Timeout: killed the sub-process" is coming from the Ant <java> task. The org.apache.tools.ant.taskdefs.ExecuteWatchdog#timeoutOccured(Watchdog) implementation just calls java.lang.Process#destroy(), which immediately terminates the process without generating a stacktrace.

org.eclipse.test.EclipseTestRunner#startStackDumpTimoutTimer(..) should print stacktraces 2 minutes before the process gets killed, but the java process must be in such a bad state that it doesn't even execute a scheduled java.util.Timer any more. The first part of dump(..) doesn't refer to a Display, so even if the UI thread is blocked, it should print "EclipseTestRunner almost reached timeout" , followed by a full thread dump.
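
For readers following along, here is a minimal sketch of that kind of pre-timeout dump. It is not the actual EclipseTestRunner code: the class name TimeoutDumpSketch and both methods are hypothetical, and it only assumes a daemon java.util.Timer plus Thread.getAllStackTraces(). The log above shows a timer timeout of 7080000 ms against an Ant timeout of 7200000 ms, which is the 2-minute margin.

```java
import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;

// Hypothetical sketch; not the actual org.eclipse.test.EclipseTestRunner code.
final class TimeoutDumpSketch {

    /** Formats a dump of all live threads; does not touch any Display. */
    static String dumpAllThreads(String header) {
        StringBuilder sb = new StringBuilder(header).append('\n');
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Schedules one dump to System.err after delayMillis, on a daemon timer thread. */
    static Timer scheduleDump(long delayMillis) {
        Timer timer = new Timer("timeout-dump", true); // daemon: never keeps the JVM alive
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                System.err.println(dumpAllThreads("EclipseTestRunner almost reached timeout"));
            }
        }, delayMillis);
        return timer;
    }

    public static void main(String[] args) throws InterruptedException {
        // Demo value: dump after 100 ms. A real run would use (Ant timeout - 2 minutes).
        Timer timer = scheduleDump(100);
        Thread.sleep(300);
        timer.cancel();
    }
}
```

If the whole process is stuck in a state where no Java thread gets scheduled, a task like this never fires, which would match the empty output seen here.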

Since we cannot get more information from within the blocked JVM, my last idea would be to write an Ant task that uses jps and jstack to generate stacktraces from all running JVMs on that machine (a few minutes before the test gets killed).
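
A rough sketch of what such a helper could do, written as plain Java rather than as an Ant task. It assumes the JDK tools jps and jstack are on the PATH of the test machine; the class name JStackAllSketch and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; assumes the JDK tools jps and jstack are on the PATH.
final class JStackAllSketch {

    /** Extracts the process IDs from jps output (one "<pid> <name>" entry per line). */
    static List<String> parsePids(String jpsOutput) {
        List<String> pids = new ArrayList<>();
        for (String line : jpsOutput.split("\\R")) {
            String first = line.trim().split("\\s+")[0];
            if (first.matches("\\d+")) {
                pids.add(first);
            }
        }
        return pids;
    }

    /** Runs a command and returns its combined stdout/stderr. */
    static String run(String... command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The jps process lists itself too; jstack on that PID just prints an error.
        for (String pid : parsePids(run("jps"))) {
            System.out.println("===== jstack " + pid + " =====");
            System.out.println(run("jstack", pid));
        }
    }
}
```

Because it runs in a separate process, this does not depend on the blocked test JVM being able to execute any Java code itself.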

(In reply to David Williams from comment #8)
> No, no logs that I know of. would help. In theory your tests could write 
> additional information to "console log" and it should be captured ... but, 
> not sure that's feasible? 

I've added that: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=4eb7f8cf3e79735d171bf8ad6bbe5175223935a7

It writes to System.out, so the next linux.gtk.x86_6.0_org.eclipse.swt.tests.junit.AllGtkTests.txt should reveal whether any tests did actually run.
Comment 11 David Williams CLA 2013-12-12 14:05:56 EST
> I've added that:
> http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> ?id=4eb7f8cf3e79735d171bf8ad6bbe5175223935a7
> 
> It writes to System.out, so the next
> linux.gtk.x86_6.0_org.eclipse.swt.tests.junit.AllGtkTests.txt should reveal
> whether any tests did actually run.

Did this work as expected? 

I don't see anything in 
http://download.eclipse.org/eclipse/downloads/drops4/I20131211-2000/testresults/linux.gtk.x86_6.0/org.eclipse.swt.tests.junit.AllGtkTests.txt

Am I looking in wrong file? 

At any rate, now that M4 is (nearly) done, I've switched back to "slave 4" in case it is specific to that machine, might help collect the needed data to isolate the issue ... if it is related to the machine (or, state that it is in). 

The next two N-builds (Friday night and Saturday) should run in similar "environment", but Hudson is restarted every week (early Sunday AM) so by Sunday's N-build, the machine/process will have been "reset".
Comment 12 Markus Keller CLA 2013-12-12 15:19:33 EST
> Did this work as expected? 

No, SWT didn't do a build input, so this commit was not picked up. But it will be in the rebuild.

All browser tests are now disabled on Linux: http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=699ebc90cf45d54ea68214c55b73fe719a348c8a
If the DNF was due to the browser tests, then everything should be green now.

Moving this bug to M5 to enable the tests again and investigate the cause.
Comment 13 Arun Thondapu CLA 2013-12-13 09:31:47 EST
(In reply to David Williams from comment #11)
> At any rate, now that M4 is (nearly) done, I've switched back to "slave 4"
> in case it is specific to that machine, might help collect the needed data
> to isolate the issue ... if it is related to the machine (or, state that it
> is in). 
> 
> The next two N-builds (Friday night and Saturday) should run in similar
> "environment", but Hudson is restarted every week (early Sunday AM) so by
> Sunday's N-build, the machine/process will have been "reset".

We'll leave the Browser tests disabled for today's nightly build to check whether the DNF still occurs on "slave 4" (if it is a machine issue, the DNF might happen in spite of the disabled tests). We'll enable the tests again on Monday.
Comment 14 Arun Thondapu CLA 2013-12-16 09:44:31 EST
(In reply to Arun Thondapu from comment #13)
> We'll enable the tests again on Monday.

I've re-enabled the Browser tests that were disabled last week to avoid the test DNFs during M4. Will monitor the test results of tonight's nightly build to confirm whether the tests finish running on Linux.
Comment 15 Markus Keller CLA 2013-12-17 06:45:16 EST
OK, https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-lin64/ws/workarea/N20131216-2000/eclipse-testing/results/linux.gtk.x86_6.0/org.eclipse.swt.tests.junit.AllGtkTests.txt confirms that the timeout also happens in Browser2 (the first Browser* test that is actually executed).

XULRunner seems to be broken with GTK3. Since I don't see a way to globally force XULRunner, I modified Browser2 to use "new Browser(shell, SWT.MOZILLA);" and then launched with
-Dorg.eclipse.swt.browser.XULRunnerPath=</path/to>/xulrunner-10.0.2

This worked fine when SWT_GTK3 was set to 0, but crashed when GTK3 was enabled.
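
To compare the two configurations side by side, something like the following sketch could be used to start the modified test in a child JVM. The class name XulRunnerLaunchSketch, the classpath "swt.jar:." and the main class "BrowserSnippet" are placeholders, not real artifacts; only the system property and the SWT_GTK3 variable come from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two launch configurations compared above.
final class XulRunnerLaunchSketch {

    /**
     * Builds a child-JVM launch with the XULRunner path property set and
     * SWT_GTK3 forced to 0 or 1 in the child's environment.
     * classpath and mainClass are placeholders supplied by the caller.
     */
    static ProcessBuilder buildLaunch(String xulRunnerPath, boolean gtk3,
                                      String classpath, String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Dorg.eclipse.swt.browser.XULRunnerPath=" + xulRunnerPath);
        cmd.add("-cp");
        cmd.add(classpath);
        cmd.add(mainClass);
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.environment().put("SWT_GTK3", gtk3 ? "1" : "0");
        pb.inheritIO(); // show the child's output, including any hs_err message
        return pb;
    }

    public static void main(String[] args) {
        // Only prints the two command lines; call start() on the builder to actually launch.
        for (boolean gtk3 : new boolean[] {false, true}) {
            ProcessBuilder pb = buildLaunch("/path/to/xulrunner-10.0.2", gtk3, "swt.jar:.", "BrowserSnippet");
            System.out.println("SWT_GTK3=" + pb.environment().get("SWT_GTK3")
                    + " " + String.join(" ", pb.command()));
        }
    }
}
```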
Comment 16 Arun Thondapu CLA 2013-12-17 13:00:46 EST
(In reply to Markus Keller from comment #15)
> OK,
> https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-
> lin64/ws/workarea/N20131216-2000/eclipse-testing/results/linux.gtk.x86_6.0/
> org.eclipse.swt.tests.junit.AllGtkTests.txt confirms that the timeout also
> happens in Browser2 (the first Browser* test that is actually executed).
> 
> XULRunner seems to be broken with GTK3. Since I don't see a way to globally
> force XULRunner, I modified Browser2 to use "new Browser(shell,
> SWT.MOZILLA);" and then launched with
> -Dorg.eclipse.swt.browser.XULRunnerPath=</path/to>/xulrunner-10.0.2
> 
> This worked fine when SWT_GTK3 was set to 0, but crashed when GTK3 was
> enabled.

I noticed this recently, Markus, and will need to work on fixing it. However, I'm not sure this is responsible for the DNFs in any way. I don't think the Linux test machine has GTK 3 installed, but we'll need David to confirm that. David, can you please share details about the configuration of the Linux test machine (distro version, GTK 2 & 3 installed versions, versions of XULRunner and Webkit if installed)? Thanks in advance.
Comment 17 David Williams CLA 2013-12-17 13:33:58 EST
(In reply to Arun Thondapu from comment #16)
> (In reply to Markus Keller from comment #15)
> > OK,
> > https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-
> > lin64/ws/workarea/N20131216-2000/eclipse-testing/results/linux.gtk.x86_6.0/
> > org.eclipse.swt.tests.junit.AllGtkTests.txt confirms that the timeout also
> > happens in Browser2 (the first Browser* test that is actually executed).
> > 
> > XULRunner seems to be broken with GTK3. Since I don't see a way to globally
> > force XULRunner, I modified Browser2 to use "new Browser(shell,
> > SWT.MOZILLA);" and then launched with
> > -Dorg.eclipse.swt.browser.XULRunnerPath=</path/to>/xulrunner-10.0.2
> > 
> > This worked fine when SWT_GTK3 was set to 0, but crashed when GTK3 was
> > enabled.
> 
> I noticed this recently Markus and will need to work on fixing it. However,
> I'm not sure this is responsible for the DNFs in any way, I don't think the
> Linux test machine has GTK 3 installed but we'll need David to confirm that.
> David, can you please share details about the configuration of the Linux
> test machine (distro version, GTK 2 & 3 installed versions, versions of
> XULRunner and Webkit if installed) ? Thanks in advance.

I'm not sure GTK 3 is installed on either, but, I've sent Arun a complete list of installed packages via email. (I didn't want to attach/publish them here publicly, since, I've heard, if hackers "know what's installed" it gives them a little head start in trying to take advantage of vulnerabilities). And, plus ... I didn't know exactly what to query on, so thought I'd list everything.

I forgot to query "distro" exactly (missed that in your question), but pretty sure both are "Suse 11" ... let me know if you need exact data from uname -a. 

HTH
Comment 18 Dani Megert CLA 2013-12-19 04:06:57 EST
I don't think it's GTK3 related: the DNF now also happened on the M-build. The test is killed by a timeout and the following is on the screen:
http://download.eclipse.org/eclipse/downloads/drops4/M20131218-0800/testresults/linux.gtk.x86_6.0/timeoutScreens/org.eclipse.swt.tests.junit.AllGtkTests_screen0.png
Comment 19 Arun Thondapu CLA 2013-12-24 10:23:07 EST
(In reply to Markus Keller from comment #15)
> OK,
> https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep4-unit-
> lin64/ws/workarea/N20131216-2000/eclipse-testing/results/linux.gtk.x86_6.0/
> org.eclipse.swt.tests.junit.AllGtkTests.txt confirms that the timeout also
> happens in Browser2 (the first Browser* test that is actually executed).

For the last nightly build N20131223-2000, I disabled Browser2 and left the other Browser tests enabled. The tests finished running on Linux and did not DNF this time, which means the problem seems to be only with the Browser2 test and not the other Browser tests that are currently enabled.

> 
> XULRunner seems to be broken with GTK3.

This is bug 423870 and I'm working on fixing it.


Regarding the Browser2 test itself, as David and Dani mentioned, it is definitely not a GTK3 problem as both the Linux test machines do not have GTK3 installed. The only significant difference I found from the list of packages sent by David was that Webkit seems to be installed on hudson-slave2 but not on hudson-slave4. This means that the Browser tests run with Webkit on slave 2 (since that is the default on Linux) and with XULRunner on slave 4. I'll need to continue to investigate why Browser2 test was hanging on either or both of these machines.
Comment 20 Dani Megert CLA 2014-01-08 04:21:01 EST
Again DNFed in N20140107-2000.

Last output to console:
SwtTestCase#setUp(): org.eclipse.swt.tests.junit.browser.Test_BrowserSuite#Browser1
Comment 21 Arun Thondapu CLA 2014-01-13 06:19:06 EST
(In reply to Dani Megert from comment #20)
> Again DNFed in N20140107-2000.
> 
> Last output to console:
> SwtTestCase#setUp():
> org.eclipse.swt.tests.junit.browser.Test_BrowserSuite#Browser1

I suspect this is a problem in the test machine's environment, but I'm not sure how to verify that. In my local workspace the tests do not hang with either XULRunner or Webkit. Also, the Browser1 test is already disabled on Linux; it doesn't do anything at all.
Comment 22 Markus Keller CLA 2014-01-13 08:38:50 EST
Alex removed the logging from SwtTestCase#setUp() with
http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=5e8237fad62914f64e36b4aa815dc4b554b71053
with comment "It was doing nothing but doing exessive logging on setUp ...".

Alex: Before removing code you don't understand, you have to use "Show Annotations" on the file and at least look at the bug that contains infos about the most recent change to the affected lines. In this case, the logging was necessary to track down DNFs in the tests.

I think it makes sense to keep the logging until this bug is fixed.
Comment 23 Alexander Kurtakov CLA 2014-01-13 08:57:31 EST
(In reply to Markus Keller from comment #22)
> Alex removed the logging from SwtTestCase#setUp() with
> http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> ?id=5e8237fad62914f64e36b4aa815dc4b554b71053
> with comment "It was doing nothing but doing exessive logging on setUp ...".
> 
> Alex: Before removing code you don't understand, you have to use "Show
> Annotations" on the file and at least look at the bug that contains infos
> about the most recent change to the affected lines. In this case, the
> logging was necessary to track down DNFs in the tests.
> 
> I think it makes sense to keep the logging until this bug is fixed.

Sorry for that. SWT tests are such a jungle lately that I'm doing my best to get them cleaned to a state where it's obvious what's going on. 
Also Friday and Saturday failures should differ based on tries we made for fixing bug #419527. 
Also, the change you mentioned is from 2014-01-10, which is after the tests started to DNF, and I fail to see why the logging is needed. Would you please explain?
Comment 24 Alexander Kurtakov CLA 2014-01-13 09:01:02 EST
If there are potential problems and hidden rocks, I would really appreciate it if people who know about them replied to mails like http://dev.eclipse.org/mhonarc/lists/platform-swt-dev/msg07568.html .
Comment 25 David Williams CLA 2014-01-13 09:17:02 EST
(In reply to Arun Thondapu from comment #21)
> (In reply to Dani Megert from comment #20)
> > Again DNFed in N20140107-2000.
> > 
> > Last output to console:
> > SwtTestCase#setUp():
> > org.eclipse.swt.tests.junit.browser.Test_BrowserSuite#Browser1
> 
> I'm suspecting this to be a problem in the test machine's environment .... 

And the importance of this is that users might have the same "faulty setup"? Would be nice to narrow down the problem so we could tell users what they need, even if we can't fix it in our code. 

> ... but I'm not sure how to verify that. 

FWIW, some time when you have an afternoon or some block of time to focus on it, you can send email to 'webmaster@eclipse.org', and they can set things up so you can VNC to the machine to "take a closer look" at it. For security reasons, they like doing this for a limited time, for specific people and purposes, rather than "leave it open" all the time (that is, they'll password protect it, etc, but prefer to have that in effect for just a few days or week or so), so if that'd help you figure out what's going on, you can email to arrange the right time and details.

Let me know if there's anything I can do to help ... but no reason for me to be the "middle man" ... would be best to have the experts working directly with the machine and webmasters rather than try to have me try to relay things back and forth.
Comment 26 Markus Keller CLA 2014-01-13 10:03:57 EST
(In reply to Alexander Kurtakov from comment #23)
> Also the change you mentioned is from 2014-01-10 which is past the tests
> starting to DNF and I fail to find why is the logging needed, would you
> please explain?

Nope, the "System.out.println(..)" line was last touched in 2013 and the comment refers to bug 423561. The last 2 paragraphs of comment 10 refer to the actual commit and explain why the logging was added.

If you already have local changes, you may have to perform "Show Annotations" on an earlier commit to work around bug 388543.
Comment 27 Alexander Kurtakov CLA 2014-01-13 11:15:21 EST
Do we know the actual xulrunner version on the test machine? It is possible that tests DNF with that given xulrunner version. Experience I had with xulrunner is that changes there are unpredictable even in x.y.z+1 release.
Also shouldn't tests run with webkit (aka having webkit installed on the test machine) as it is the default browser backend on linux?
Comment 28 Markus Keller CLA 2014-01-13 14:17:11 EST
> Also shouldn't tests run with webkit (aka having webkit installed on the
> test machine) as it is the default browser backend on linux?

See bug 423836: Tests should run with WebKit *and* with XULRunner, since SWT supports both technologies.
Comment 29 Dani Megert CLA 2014-01-21 07:08:53 EST
(In reply to Arun Thondapu from comment #14)
> (In reply to Arun Thondapu from comment #13)
> > We'll enable the tests again on Monday.
> 
> I've re-enabled the Browser tests that were disabled last week to avoid the
> test DNFs during M4. Will monitor the test results of tonight's nightly
> build to confirm whether the tests finish running on Linux.

Do we have a bug for that?
Comment 30 Arun Thondapu CLA 2014-01-21 10:35:36 EST
(In reply to Dani Megert from comment #29)
> 
> Do we have a bug for that?

No, there is no separate bug.

(In reply to Alexander Kurtakov from comment #27)
> Do we know the actual xulrunner version on the test machine? It is possible
> that tests DNF with that given xulrunner version.

Alex, the machine the tests are currently running on (hudson-slave4) has XULRunner 1.9.0.19 as well as 1.9.1.19 installed. I'm not sure which of these is being used while running the tests. It could be a good idea to install a newer version of XULRunner like 10.0 or even 24.0 (supported only 4.4 M5 onwards) for running the tests.
Comment 31 Grant Gayed CLA 2014-01-21 10:50:40 EST
(In reply to Arun Thondapu from comment #30)
> these is being used while running the tests. It could be a good idea to
> install a newer version of XULRunner like 10.0 or even 24.0 (supported only
> 4.4 M5 onwards) for running the tests.

It would be better if WebKitGTK was on there since this is the default renderer on Linux whenever it's detected.  However I'm not sure if this is possible given the vintage of the test machine's SuSE version.
Comment 32 Alexander Kurtakov CLA 2014-01-21 10:58:32 EST
I had proposed and have internal approval to assign RHEL subscriptions to Eclipse Foundation so we can get VM for SWT needs. I'm still waiting for the Foundation RHN account for the assignment to be finalized. 
Having a dedicated RHEL VM for SWT on foundation would even allow compiling natives for linux as part of gerrit verification.
Comment 33 Dani Megert CLA 2014-01-28 11:20:53 EST
See bug 423836 comment 3 for more details about installed WebKit on build.eclipse.org.
Comment 34 Dani Megert CLA 2014-01-28 11:28:29 EST
http://download.eclipse.org/eclipse/downloads/drops4/M20140124-1600/testResults.php is green.

Are the tests still disabled, or did the problem somehow disappear?
Comment 35 Arun Thondapu CLA 2014-01-29 09:07:28 EST
(In reply to Dani Megert from comment #34)
> http://download.eclipse.org/eclipse/downloads/drops4/M20140124-1600/
> testResults.php is green.
> 
> Are the tests still disabled, or did the problem somehow disappear?

It's the latter. In fact, the tests were never disabled in the maintenance stream at all; they were disabled and later re-enabled in the master branch only.
Comment 36 Dani Megert CLA 2014-01-29 09:43:57 EST
(In reply to Arun Thondapu from comment #35)
> (In reply to Dani Megert from comment #34)
> > http://download.eclipse.org/eclipse/downloads/drops4/M20140124-1600/
> > testResults.php is green.
> > 
> > Are the tests still disabled, or did the problem somehow disappear?
> 
> Its the latter, in fact the tests were never disabled in the maintenance
> stream at all, they were disabled and later on re-enabled in the master
> branch only.

OK, thanks. The tests passed again in the last two M-builds, so I'm going to move this out of 4.3.2.
Comment 37 Arun Thondapu CLA 2014-02-12 23:48:54 EST
(In reply to Alexander Kurtakov from comment #27)
> Do we know the actual xulrunner version on the test machine? It is possible
> that tests DNF with that given xulrunner version. Experience I had with
> xulrunner is that changes there are unpredictable even in x.y.z+1 release.
> Also shouldn't tests run with webkit (aka having webkit installed on the
> test machine) as it is the default browser backend on linux?

Tests are running with Webkit 1.2.7 now and the DNF happened again in N20140211-2000. Alex, since the logging code added by Markus in SwtTestCase (SwtTestUtil now) is removed, we do not know exactly where the tests hang or stop running... can you bring the logging code back till we figure out the cause for these DNFs? I'm not sure if it is possible after the recent changes in the test infrastructure, WDYT?
Comment 38 Alexander Kurtakov CLA 2014-02-13 03:29:34 EST
(In reply to Arun Thondapu from comment #37)
> (In reply to Alexander Kurtakov from comment #27)
> > Do we know the actual xulrunner version on the test machine? It is possible
> > that tests DNF with that given xulrunner version. Experience I had with
> > xulrunner is that changes there are unpredictable even in x.y.z+1 release.
> > Also shouldn't tests run with webkit (aka having webkit installed on the
> > test machine) as it is the default browser backend on linux?
> 
> Tests are running with Webkit 1.2.7 now and the DNF happened again in
> N20140211-2000. Alex, since the logging code added by Markus in SwtTestCase
> (SwtTestUtil now) is removed, we do not know exactly where the tests hang or
> stop running... can you bring the logging code back till we figure out the
> cause for these DNFs? I'm not sure if it is possible after the recent
> changes in the test infrastructure, WDYT?

What logging do you want to have, Arun?
System.out.println("Browser#setUp(): " + getClass().getName() + "#" + getName());  only ?
We can add such logging to the setUp methods in exactly the test case we need which would make the output contain only relevant information. Let me know what you want to have logged and I'll add it.
Comment 39 Arun Thondapu CLA 2014-02-14 07:41:03 EST
(In reply to Alexander Kurtakov from comment #38)
> What logging do you want to have Arun ?
> System.out.println("Browser#setUp(): " + getClass().getName() + "#" +
> getName());  only ?
> We can add such logging to the setUp methods in exactly the test case we
> need which would make the output contain only relevant information. Let me
> know what you want to have logged and I'll add it.

The problem here is that when the tests DNF, we do not have any idea which particular test was running when the actual process running the tests goes into a hang and ultimately gets killed by a timeout. The logging was added to pinpoint which test ran last before the process gets timed out (this test would most probably be the one which caused the hang).

So, basically we cannot deterministically choose which tests to add the logging to for finding the cause of the DNF. However, based purely on past experience (from the logging data again), the DNFs were most likely to happen while one of the tests in the Test_BrowserSuite was running. Maybe we can add the logging code to these tests (Browser1 to Browser9) and see whether the pattern repeats whenever the DNF happens again.
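
To illustrate the idea, here is a small sketch of how such setUp() lines can be used after a DNF. It assumes the line format quoted in comment 20 and comment 38 ("<Something>#setUp(): <class>#<test>"); the class name LastStartedTestSketch and its methods are hypothetical and not part of the SWT test code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the log line format is taken from comments 20 and 38.
final class LastStartedTestSketch {

    // Matches e.g. "SwtTestCase#setUp(): org.example.Suite#testName"
    private static final Pattern SET_UP = Pattern.compile("\\w+#setUp\\(\\): (\\S+#\\S+)");

    /** The line a test's setUp() would print before doing anything else. */
    static String setUpLine(String className, String testName) {
        return "Browser#setUp(): " + className + "#" + testName;
    }

    /** Returns the last "class#test" announced in the log, or null if there is none. */
    static String lastStartedTest(String consoleLog) {
        String last = null;
        Matcher m = SET_UP.matcher(consoleLog);
        while (m.find()) {
            last = m.group(1);
        }
        return last;
    }

    public static void main(String[] args) {
        String log = setUpLine("Test_BrowserSuite", "Browser1") + "\n"
                + setUpLine("Test_BrowserSuite", "Browser2") + "\n"
                + "     [java] Timeout: killed the sub-process\n";
        // The last announced test is the most likely place where the run got stuck.
        System.out.println(lastStartedTest(log));
    }
}
```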
Comment 40 Alexander Kurtakov CLA 2014-02-18 02:22:24 EST
Logging added back to browser suite http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=88a405a3ee888d70d378ce7dfaf28d1dcb421051 .
Comment 41 Arun Thondapu CLA 2014-03-14 07:33:16 EDT
The DNF seems to have happened only once in the last month or even longer than that. Not sure if this has got something to do with changes in the hudson test infrastructure or installation of webkit on the test machine (which means that webkit is now being used while running the tests as opposed to xulrunner earlier).


(In reply to Alexander Kurtakov from comment #40)
> Logging added back to browser suite
> http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> ?id=88a405a3ee888d70d378ce7dfaf28d1dcb421051 .

As for the last DNF that happened in I20140306-1200, the browser suite logging has not helped, so the tests stopped running somewhere else.

Alex, can you think of a better way to enable logging for all tests other than bringing back SwtTestCase?
Comment 42 Arun Thondapu CLA 2014-05-05 13:39:40 EDT
There haven't been any DNFs in the recent builds, I'll continue to monitor the test results for any recurring occurrences. I'm keeping the bug open for now and changing the target release from 4.4 to 4.5.
Comment 43 Gerrit Volkenborn CLA 2014-05-19 06:43:01 EDT
Does this help in any way?

http://stackoverflow.com/questions/13960376/java-swt-browser-xulrunner-crash-java-vm

It seems that the crash only occurs if two Browser instances are created sequentially by the same Java VM.
Comment 44 Markus Keller CLA 2014-09-02 09:56:26 EDT
The SWT tests DNF'd again in the last few N-builds:
N20140901-2000: after testBrowser1
N20140831-2000: after testBrowser3
N20140831-2000: didn't reach browser tests

I've cleaned up some code duplication in the test suites and moved browser tests into a separate test run with timeout = 15 min. This allows us to see test results for the rest of SWT, and it's the first step for bug 423836. None of the existing test suites has been changed functionally.

http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=5c54974edbe7d779f3f9a8ae1153aea350594f28
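
As a rough illustration of why a separate test run with its own timeout helps, here is a plain-Java sketch of running a command in a sub-process and killing it if it does not finish in time. This is not the mechanism used by the Eclipse test framework; the class name TimedSubprocess and its method are hypothetical:

```java
import java.io.IOException;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: isolates a crash-prone run in its own process so a
// hang or native crash there cannot take down the caller.
public class TimedSubprocess {
    /**
     * Runs the command and waits up to timeoutMillis.
     * Returns the exit value, or an empty result if the process
     * had to be killed after the timeout.
     */
    public static OptionalInt run(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .inheritIO() // forward the child's output to our console
                .start();
        if (process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            return OptionalInt.of(process.exitValue());
        }
        process.destroyForcibly(); // hung: kill it
        process.waitFor();         // wait until the kill has taken effect
        return OptionalInt.empty();
    }
}
```

Because the caller survives in both cases, results of the other test runs can still be reported, and an empty result or an unexpected exit value points at the isolated run.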
Comment 45 Sravan Kumar Lakkimsetti CLA 2015-03-24 06:38:25 EDT
We have not had any DNFs in recent builds, so I am closing this bug. Please feel free to reopen it if any DNF is found in the SWT tests.
Comment 46 Markus Keller CLA 2015-03-25 10:57:17 EDT
We still have DNFs on Linux, but they're hard to see now due to bug 210792.

In comment 44, I've separated the crash-prone AllBrowserTests from the rest of the SWT tests (AllNonBrowserTests). While http://download.eclipse.org/eclipse/downloads/drops4/I20150324-0800/testResults.php looks like everything is OK, http://download.eclipse.org/eclipse/downloads/drops4/I20150324-0800/testresults/html/org.eclipse.swt.tests_linux.gtk.x86_64_8.0.html is actually missing the results from AllBrowserTests.

http://download.eclipse.org/eclipse/downloads/drops4/I20150324-0800/testresults/consolelogs/linux.gtk.x86_64_8.0_consolelog.txt says (search for "AllBrowserTests" and then scroll down a bit):

     [java] Timeout: killed the sub-process

Since this concrete failure is probably not due to mozilla.XPCOM, I've opened bug 463102.

I'm reopening this bug and I made it depend on bug 210792. We can only declare success here after that bug is fixed and we haven't seen DNFs in a while.
Comment 47 Markus Keller CLA 2015-03-30 11:04:11 EDT
(In reply to Markus Keller from comment #46)
> We can only declare success here after that bug is fixed and we haven't seen
> DNFs in a while.

And of course, the test needs to be enabled first. Enabled Browser2 and others with http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=61ac0970c1c501525ee9c1ee8c104c394020c61a
Comment 48 Markus Keller CLA 2015-04-08 09:48:28 EDT
OK, on a local Ubuntu 12.04 32-bit, I see this:

- xulrunner-3.6.28 passes

- xulrunner-10.0.4esr crashes like this (that's this bug):

Stack: [0xb771e000,0xb776f000],  sp=0xb776c6ac,  free space=313k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.eclipse.swt.internal.mozilla.XPCOM._VtblCall(II[CIIII)I+0
j  org.eclipse.swt.internal.mozilla.XPCOM.VtblCall(II[CIIII)I+17
j  org.eclipse.swt.internal.mozilla.nsIWebNavigation.LoadURI([CIIII)I+17
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;[B[Ljava/lang/String;)Z+622
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+20
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+19
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;)Z+8
j  org.eclipse.swt.tests.junit.browser.Browser1.test1(Ljava/lang/String;)Z+186
j  org.eclipse.swt.tests.junit.browser.Browser1.test()Z+20
j  org.eclipse.swt.tests.junit.browser.Test_BrowserSuite.testBrowser1()V+0

- xulrunner-24.0 fails testBrowser1, 2, and 3 with "XPCOM error 0x80040154", and then unexpectedly exits in testBrowser4 with exit value 139.

Test_BrowserSuite
org.eclipse.swt.tests.junit.browser.Test_BrowserSuite
testBrowser1(org.eclipse.swt.tests.junit.browser.Test_BrowserSuite)
org.eclipse.swt.SWTError: XPCOM error 0x80040154
	at org.eclipse.swt.browser.Mozilla.error(Mozilla.java:2938)
	at org.eclipse.swt.browser.Mozilla.execute(Mozilla.java:1533)
	at org.eclipse.swt.browser.Mozilla.onDispose(Mozilla.java:2947)
	at org.eclipse.swt.browser.Mozilla$5.handleEvent(Mozilla.java:954)
	at org.eclipse.swt.widgets.EventTable.sendEvent(EventTable.java:84)
	at org.eclipse.swt.widgets.Display.sendEvent(Display.java:4449)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1317)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1341)
	at org.eclipse.swt.widgets.Widget.sendEvent(Widget.java:1322)
	at org.eclipse.swt.widgets.Widget.release(Widget.java:1130)
	at org.eclipse.swt.widgets.Control.release(Control.java:3809)
	at org.eclipse.swt.widgets.Composite.releaseChildren(Composite.java:1363)
	at org.eclipse.swt.widgets.Canvas.releaseChildren(Canvas.java:227)
	at org.eclipse.swt.widgets.Decorations.releaseChildren(Decorations.java:480)
	at org.eclipse.swt.widgets.Shell.releaseChildren(Shell.java:2602)
	at org.eclipse.swt.widgets.Widget.release(Widget.java:1133)
	at org.eclipse.swt.widgets.Control.release(Control.java:3809)
	at org.eclipse.swt.widgets.Widget.dispose(Widget.java:472)
	at org.eclipse.swt.widgets.Shell.dispose(Shell.java:2533)
	at org.eclipse.swt.widgets.Shell.closeWidget(Shell.java:650)
	at org.eclipse.swt.widgets.Shell.close(Shell.java:645)
	at org.eclipse.swt.tests.junit.browser.Browser1$2.completed(Browser1.java:77)
	at org.eclipse.swt.browser.Mozilla$23.run(Mozilla.java:4164)
	at org.eclipse.swt.widgets.Synchronizer.syncExec(Synchronizer.java:186)
	at org.eclipse.swt.widgets.Display.syncExec(Display.java:4601)
	at org.eclipse.swt.browser.Mozilla.OnStateChange(Mozilla.java:4171)
	at org.eclipse.swt.browser.Mozilla$10.method3(Mozilla.java:1128)
	at org.eclipse.swt.internal.mozilla.XPCOMObject.callback3(XPCOMObject.java:271)
	at org.eclipse.swt.internal.gtk.OS._g_main_context_iteration(Native Method)
	at org.eclipse.swt.internal.gtk.OS.g_main_context_iteration(OS.java:2406)
	at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:3396)
	at org.eclipse.swt.tests.junit.browser.Browser1.runLoopTimer(Browser1.java:206)
	at org.eclipse.swt.tests.junit.browser.Browser1.test1(Browser1.java:103)
	at org.eclipse.swt.tests.junit.browser.Browser1.test(Browser1.java:217)
	at org.eclipse.swt.tests.junit.browser.Test_BrowserSuite.testBrowser1(Test_BrowserSuite.java:37)
Comment 49 Markus Keller CLA 2015-04-08 10:35:05 EDT
On hudson.eclipse.org, testBrowser1, 5, and 6 consistently fail on GTK. MozillaVersion.GetCurrentVersion() returns 2. I don't know what exact version of XULRunner is installed on that machine, but it looks like some kind of 1.9.

On my local test machine (comment 48), it returns 4 for xulrunner-3.6.28.
And all browser tests pass when I run them with the default setup (WebKit).

I've disabled the failing tests for now on GTK for MozillaVersion 2 (in Test_BrowserSuite): http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/?id=6b77de13aa619174b5dcb0305f4ddd19cdaf70b4


Let's keep this bug for the crash in XPCOM._VtblCall in xulrunner-10.0.4esr.
Comment 50 Lakshmi P Shanmugam CLA 2015-04-30 04:05:14 EDT
(In reply to Markus Keller from comment #49)
> On hudson.eclipse.org, testBrowser1, 5, and 6 consistently fail on GTK.
> MozillaVersion.GetCurrentVersion() returns 2. I don't know what exact
> version of XULRunner is installed on that machine, but it looks like some
> kind of 1.9.
> 
> On my local test machine (comment 48), it returns 4 for xulrunner-3.6.28.
> And all browser tests pass when I run them with the default setup (WebKit).
> 
> I've disabled the failing tests for now on GTK for MozillaVersion 2 (in
> Test_BrowserSuite):
> http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> ?id=6b77de13aa619174b5dcb0305f4ddd19cdaf70b4
> 
Hi Markus,
The Browser Tests 1,5,6,8 and 9 are currently failing with the last few I-builds (Bug 465721). These tests are disabled and are not supposed to run. Do you know what is the problem here, as they started failing only recently?
Comment 51 Lakshmi P Shanmugam CLA 2015-04-30 05:03:56 EDT
(In reply to Lakshmi Shanmugam from comment #50)
> (In reply to Markus Keller from comment #49)
> > On hudson.eclipse.org, testBrowser1, 5, and 6 consistently fail on GTK.
> > MozillaVersion.GetCurrentVersion() returns 2. I don't know what exact
> > version of XULRunner is installed on that machine, but it looks like some
> > kind of 1.9.
> > 
> > On my local test machine (comment 48), it returns 4 for xulrunner-3.6.28.
> > And all browser tests pass when I run them with the default setup (WebKit).
> > 
> > I've disabled the failing tests for now on GTK for MozillaVersion 2 (in
> > Test_BrowserSuite):
> > http://git.eclipse.org/c/platform/eclipse.platform.swt.git/commit/
> > ?id=6b77de13aa619174b5dcb0305f4ddd19cdaf70b4
> > 
> Hi Markus,
> The Browser Tests 1,5,6,8 and 9 are currently failing with the last few
> I-builds (Bug 465721). These tests are disabled and are not supposed to run.
> Do you know what is the problem here, as they started failing only recently?

The console log shows that MozillaVersion.GetCurrentVersion() now returns 4 instead of 2 (http://download.eclipse.org/eclipse/downloads/drops4/I20150429-2000/testresults/linux.gtk.x86_64_8.0/org.eclipse.swt.tests.junit.AllBrowserTests.txt). It looks like XULRunner on the test machine has been updated to 1.9.2, so the version checks need to be updated.
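
To illustrate the effect described above, here is a hypothetical sketch (not the actual Test_BrowserSuite code) of a guard that skips tests only when the reported value is exactly 2. Once the value becomes 4, the guard no longer fires and the tests run again. The class BrowserTestGuard and both methods are assumptions made for illustration, and which values should really be skipped is left open here:

```java
// Hypothetical guard logic; the int argument stands in for the value
// returned by MozillaVersion.GetCurrentVersion().
public class BrowserTestGuard {
    /** Exact-match guard: skips only when the reported value is 2. */
    public static boolean skipExact(int mozillaVersion) {
        return mozillaVersion == 2;
    }

    /** Widened guard: skips for every value listed as known-failing. */
    public static boolean skipKnownFailing(int mozillaVersion, int... failingVersions) {
        for (int failing : failingVersions) {
            if (failing == mozillaVersion) {
                return true;
            }
        }
        return false;
    }
}
```

With these definitions, skipExact(4) is false, which matches the observation that the disabled tests started running and failing after the update, while skipKnownFailing(4, 2, 4) is true.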
Comment 52 Kamil Fejfar CLA 2016-08-24 04:02:40 EDT
JVM crash on CentOS 7, xulrunner-10.0.2.

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.eclipse.swt.internal.mozilla.XPCOM._VtblCall(IJ[CIJJJ)I+0
j  org.eclipse.swt.internal.mozilla.XPCOM.VtblCall(IJ[CIJJJ)I+18
j  org.eclipse.swt.internal.mozilla.nsIWebNavigation.LoadURI([CIJJJ)I+17
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;[B[Ljava/lang/String;)Z+628
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+20
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+19
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;)Z+8
Comment 53 Kamil Fejfar CLA 2016-08-24 04:05:33 EDT
JVM crash on CentOS 7, xulrunner-10.0.2, Eclipse 4.4.0.v20140925-0400.

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.eclipse.swt.internal.mozilla.XPCOM._VtblCall(IJ[CIJJJ)I+0
j  org.eclipse.swt.internal.mozilla.XPCOM.VtblCall(IJ[CIJJJ)I+18
j  org.eclipse.swt.internal.mozilla.nsIWebNavigation.LoadURI([CIJJJ)I+17
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;[B[Ljava/lang/String;)Z+628
j  org.eclipse.swt.browser.Mozilla.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+20
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;Ljava/lang/String;[Ljava/lang/String;)Z+19
j  org.eclipse.swt.browser.Browser.setUrl(Ljava/lang/String;)Z+8
Comment 54 Alexander Kurtakov CLA 2017-07-04 13:25:37 EDT
Mozilla support is removed for 4.8.