Bug 461797 - Run platform linux test on a server with GTK3 (in addition to GTK2).
Summary: Run platform linux test on a server with GTK3 (in addition to GTK2).
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.2.1
Hardware: PC Linux
Importance: P3 normal
Target Milestone: 4.6 M7
Assignee: David Williams CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 490232 490440 490554
Blocks: 458120
Reported: 2015-03-10 06:49 EDT by Lars Vogel CLA
Modified: 2016-04-11 12:29 EDT

See Also:


Description Lars Vogel CLA 2015-03-10 06:49:54 EDT
I see several platform tests failing locally when I run them with SWT on GTK3; they pass with SWT on GTK2.

I assume that GTK3 is the main target for the SWT development in Eclipse 4.5. 

Alexander, is my assumption correct? If so, I suggest updating our test server to use GTK3.
Comment 1 Alexander Kurtakov CLA 2015-03-10 07:18:28 EDT
I fully agree that this change is long overdue. 
There is a CentOS machine available to HIPP instances, but I don't know how it can be hooked up to run the tests. 
Adding Mikael (for infrastructure work if needed).
David, would you please describe what is needed to do this?
Comment 2 Lars Vogel CLA 2015-03-10 07:24:36 EDT
If we do this move I suggest:
1.) we should ensure that we have a (reasonably) failure-free build in platform
2.) call a short commit freeze and move to GTK3

This way we can be sure that any test error is caused by this SWT change.
Comment 3 David Williams CLA 2015-03-24 16:29:33 EDT
(In reply to Alexander Kurtakov from comment #1)

> David, would you please describe what is needed to do this?

Running our standard unit tests on CentOS should be pretty easy. (Famous last words. :) 

I think you can "copy/paste" the "shell script" from our shared instance, at 
https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/ep45I-unit-lin64/
and it would do nearly everything, as-is, automatically. 

Well, you'll also have to add the "parameters" at the top of that job: buildId, stream, and "builder hash". 

Once that is in place, I suspect you could put in a build id, say for our I build this morning, I20150324-0800; the stream parameter needs to be 4.5.0, and for "hash" I think you can just use "master". Or, if you really wanted to "duplicate" the test, you could look up the "parameters" of a job that has already run, and use those. For example, from this morning, the hash was 
e84d8790feb63fd72445b18e06ccaa56ad499f17

You should name the job very similar to what it currently is. I'm not sure it would matter "just to run them", but subsequent "automated processing" is sometimes keyed off parts of the job name, such as the concluding "lin64", so I'd name it something like ep45I-unit-centos-lin64. 
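
To make the naming point concrete, here is a minimal sketch that checks a build id and derives a job name from the parameters. The pattern "ep<major><minor><type>-unit-<platform>" is only inferred from the job names mentioned in this bug, and the helper names (is_build_id, job_name) are hypothetical, not part of any releng script.

```shell
#!/bin/sh
# Hypothetical helpers: the naming pattern is inferred from job names in
# this bug (for example ep45I-unit-centos-lin64), not a documented rule.

# Succeeds if $1 looks like a build id such as I20150324-0800.
is_build_id() {
    case "$1" in
        [A-Z][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage: job_name <stream> <buildId> <platform>
job_name() {
    stream=$1 build_id=$2 platform=$3
    is_build_id "$build_id" || { echo "bad build id: $build_id" >&2; return 1; }
    major=${stream%%.*}                          # 4.5.0 -> 4
    rest=${stream#*.}                            # 4.5.0 -> 5.0
    minor=${rest%%.*}                            # 5.0   -> 5
    kind=$(printf '%s' "$build_id" | cut -c1)    # I20150324-0800 -> I
    printf 'ep%s%s%s-unit-%s\n' "$major" "$minor" "$kind" "$platform"
}

job_name 4.5.0 I20150324-0800 centos-lin64   # prints ep45I-unit-centos-lin64
```

Keeping the trailing platform token intact is the part that matters if later processing really does key off the end of the job name.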

What I've described so far, is to simply get an "existing build" to test there, and "leave" the results there on Hudson, which would be a really good start, if that works! 

If/when that works, we could talk more about changes to "trigger" it automatically, and changes to get the results "summarized" with the rest of our tests ... if you wanted to go that far. (Oh, and there are some tests, like CVS, that need "per machine" setup ... but we could start by just letting those 5 or 6 fail.) 

Make sense? 

and, hmmm, where is that centos machine? I mean ... what's the URL for it? 
Do you already have some jobs running there? 

I personally think it would be great to test both GTK3 and GTK2, as I think we are still supporting both, right? I guess we wouldn't need separate machines for that (we could run them twice, with parameters set differently), but ... it is good to test on many machines, IMHO. 
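
As a rough sketch of "run them twice, with parameters set differently": SWT reads the SWT_GTK3 environment variable (0 selects GTK2, 1 selects GTK3), so one job could loop over both values. The function name run_for_each_gtk and the stand-in command are assumptions for illustration; the real test launcher would take the place of the command passed in.

```shell
#!/bin/sh
# Sketch: run the same command once per GTK version.
# SWT_GTK3 is SWT's environment switch (0 = GTK2, 1 = GTK3).
# run_for_each_gtk is a hypothetical name, not an existing releng script.

run_for_each_gtk() {
    for gtk3 in 0 1; do
        echo "== SWT_GTK3=$gtk3 =="
        # The assignment applies only to this one invocation.
        SWT_GTK3=$gtk3 "$@" || echo "run with SWT_GTK3=$gtk3 failed"
    done
}

# Demonstration with a stand-in command that only prints the variable.
run_for_each_gtk sh -c 'echo "tests would run with SWT_GTK3=$SWT_GTK3"'
```

Running both passes on one machine would not replace testing on a second machine, but it would at least show whether a failure depends on the GTK version.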

Thanks, hope this helps,
Comment 4 Mikaël Barbero CLA 2015-03-25 05:32:59 EDT
(In reply to David Williams from comment #3)
> and, hmmm, where is that centos machine? I mean ... what's the URL for it? 
> Do you already have some jobs running there? 

We set up some builds on the CBI HIPP (AFAIK the only one with access to the CentOS machine) as a POC to build the platform on CentOS. 

You can see it here https://hudson.eclipse.org/cbi/job/cbi-swt-natives-linux-x86_64/ and https://hudson.eclipse.org/cbi/job/cbi-platform-aggregator-linux-x86_64/.
Comment 5 Lars Vogel CLA 2015-08-13 10:46:28 EDT
Alexander, I think in the last architecture call you mentioned that you are working on this. Can you give an update?
Comment 6 Andrey Loskutov CLA 2016-01-05 05:59:51 EST
Ping...
Comment 7 Alexander Kurtakov CLA 2016-01-05 06:07:53 EST
Having a RHEL/CentOS 7.x VM to run the tests is probably the first step. Next step should be to stabilize/fix the tests to run properly. I don't have the time to start this now so feel free to step in.
Comment 8 Lars Vogel CLA 2016-01-05 06:15:11 EST
Mikael, which GTK versions are available on the Gerrit build server? If GTK3 is available there we could configure our platform UI tests to use GTK3.
Comment 9 David Williams CLA 2016-01-09 13:06:50 EST
(In reply to Mikael Barbero from comment #4)
> (In reply to David Williams from comment #3)
> > and, hmmm, where is that centos machine? I mean ... what's the URL for it? 
> > Do you already have some jobs running there? 
> 
> We setup some builds on the CBI HIPP (AFAIK the only one with access to the
> centos) as a POC to build the platform on Centos. 
> 
> You can see it here
> https://hudson.eclipse.org/cbi/job/cbi-swt-natives-linux-x86_64/ and
> https://hudson.eclipse.org/cbi/job/cbi-platform-aggregator-linux-x86_64/.

It seems those "aggregator" builds have been failing for a while. Failing at the point of "assembling the unit tests". Anyone know why? Is that because this machine builds just one version of SWT? 

= = = = = = 

But independent of the build errors, our "full" unit tests should be easy to run on this machine, as I have outlined in comment 3, since it is just a matter of copying a few 'config.xml' files. 

If no one else wants to do that, I could "do the copy" if given 'admin' rights (at least to create jobs). 

I would be willing to copy the configs, and "see if it runs". If it turns out there are a lot of differences between CentOS and solaris, I am NOT volunteering to track all those down, but I would be surprised if there were. 

I am suggesting (and offering) to do enough to "run the tests of an existing build manually". 

Integration of the tests with the rest of the build is a different matter and would take more effort than I have time for right now. (By "integration", I mean that they would be triggered automatically whenever there was a new build, and the results collected and summarized for our download pages, as we do for the other three test platforms). 

But it seems that just to "run them manually" would be a small step forward?
Comment 10 Mikaël Barbero CLA 2016-01-22 10:18:05 EST
(In reply to Lars Vogel from comment #8)
> Mikael, which GTK versions are available on the Gerrit build server? If GTK3
> is available there we could configure our platform UI tests to use GTK3.

What do you mean by "gerrit build server"? Please give me a link to the job you're referring to so I can give you an answer.

(In reply to David Williams from comment #9)
> It seems those "aggregator" builds have been failing for a while. Failing at
> the point of "assembling the unit tests". Anyone know why? Is that because
> this machine builds just one version of SWT? 

I don't know. It's been a long time since I had the time to check on these jobs.

> 
> = = = = = = 
> 
> But independent of the build errors, our "full" unit tests should be easy to
> run on this machine, as I have outlined in comment 3 since it is just a
> matter of copying a few 'config.xml' files. 
> 
> If no one else wants to do that, I could "do the copy" if given 'admin'
> rights (at least to create jobs). 
> 
> I would be willing to copy the configs, and "see if it runs". If it turns
> out there are a lot of differences between CentOS and solaris, I am NOT
> volunteering to track all those down, but I would be surprised if there
> were. 
> 
> I am suggesting (and offering) to do enough to "run the tests of an existing
> build manually". 
> 
> Integration of the tests with the rest of the build is a different matter
> and would take more effort than I have time for right now. (By
> "integration", I mean that they would be triggered automatically whenever
> there was a new build, and the results collected and summarized for our
> download pages, as we do for the other three test platforms). 
> 
> But it seems that just to "run them manually" would be a small step forward?

You're already a CBI committer so you should be able to modify/create jobs on this HIPP. Let me know if you face issues.
Comment 11 David Williams CLA 2016-03-16 13:21:38 EDT
It appears the Platform Hipp has a "centros" slave. 

See 
https://hudson.eclipse.org/platform/

While defining a "testAccess" job there, I was given choices of where to restrict the build (besides master): one named "hippcentros", the other named just "centros". 

I assume these are the same machine. 

uname -a 
replies with 
Linux centos 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

no matter which slave I pick. 

So I assume I can do the copy/paste. 

I assume it should be all Unit tests? Not just "swt"?
Comment 12 David Williams CLA 2016-03-16 14:13:59 EDT
CCing webmasters, as I think some help/setup is needed from them. 

I set up a job to run our normal platform unit tests on the "contos" slave on the Platform HIPP. 

https://hudson.eclipse.org/platform/view/Unit%20Tests/job/ep46I-unit-cen64/

Two immediate problems. 

First, I checked "use XVNC" as I always would, and that caused the following errors. That _might_ be an indication something is wrong? The job is running as "genie.platform". And, remember, this is "pure Hudson" at this point. I am not trying to kill XVNC sessions or remove /tmp/*locks myself, but I think Hudson does if it finds a need to "clean up XVnc"? That might be configurable, and this may not even be a "real" problem, since eventually it did say 

Xvnc TigerVNC 1.3.1 - built Nov 20 2015 20:53:05
Copyright (C) 1999-2011 TigerVNC Team and many others (see README.txt)
See http://www.tigervnc.org for information on TigerVNC.
Underlying X server release 11702000, The X.Org Foundation

= = = = = = = = = = 


$ pkill Xvnc
pkill: killing pid 2568 failed: Operation not permitted
pkill: killing pid 4224 failed: Operation not permitted
pkill: killing pid 5196 failed: Operation not permitted
pkill: killing pid 5711 failed: Operation not permitted
pkill: killing pid 6931 failed: Operation not permitted
pkill: killing pid 13659 failed: Operation not permitted
pkill: killing pid 19825 failed: Operation not permitted
pkill: killing pid 21632 failed: Operation not permitted
pkill: killing pid 30372 failed: Operation not permitted
pkill: killing pid 31266 failed: Operation not permitted
$ pkill Xrealvnc
$ sh -c "rm -f /tmp/.X*-lock /tmp/.X11-unix/X*"
rm: cannot remove '/tmp/.X1260-lock': Operation not permitted
rm: cannot remove '/tmp/.X1261-lock': Operation not permitted
rm: cannot remove '/tmp/.X1262-lock': Operation not permitted
rm: cannot remove '/tmp/.X1263-lock': Operation not permitted
rm: cannot remove '/tmp/.X1264-lock': Operation not permitted
rm: cannot remove '/tmp/.X1265-lock': Operation not permitted
rm: cannot remove '/tmp/.X1266-lock': Operation not permitted
rm: cannot remove '/tmp/.X1267-lock': Operation not permitted
rm: cannot remove '/tmp/.X1268-lock': Operation not permitted
rm: cannot remove '/tmp/.X1269-lock': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1260': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1261': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1262': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1263': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1264': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1265': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1266': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1267': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1268': Operation not permitted
rm: cannot remove '/tmp/.X11-unix/X1269': Operation not permitted

= = = = = = = = = = = = = = = =  =


The second, more obvious problem is that the machine apparently does not have "unzip" installed on it? Or is it on a "special path" that I need to make sure to use? 

The error message that indicates to me "unzip" is not installed (or, not on path) is: 

     [exec] /opt/public/hipp/homes/genie.platform/workspace/ep46I-unit-cen64/workarea/I20160316-0800/eclipse-testing/test.xml:104: Execute failed: java.io.IOException: Cannot run program "unzip": error=2, No such file or directory
     [exec] 	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)

From "echo's" in my script, it says the PATH is 

PATH: /shared/common/jdk1.8.0_x64-latest/bin:/shared/common/apache-ant-1.9.6/bin:/shared/common/jdk1.8.0_x64-latest/bin:/usr/local/bin:/usr/bin

Is there someplace besides /usr/local/bin:/usr/bin that is needed for "unzip"?
Comment 13 David Williams CLA 2016-03-16 14:29:25 EDT
(In reply to David Williams from comment #12)

> Is there someplace besides /usr/local/bin:/usr/bin that is needed for
> "unzip"?

Using a small "probe" script running on the Platform HIPP, I used the "which" command to see where zip and unzip were, and I confirmed that on the "SUSE"? box, the path and location are as expected: 

= = = = = 

	whoami: genie.platform

	uname -a: 
Linux hipp7 3.0.101-0.47.71-default #1 SMP Thu Nov 12 12:22:22 UTC 2015 (b5b212e) x86_64 x86_64 x86_64 GNU/Linux

	HOME: /home/hudson/genie.platform

	PATH: 	/shared/common/jdk1.8.0_x64-latest/bin:/usr/bin:/bin:/usr/sbin:/sbin
/usr/bin/zip
/usr/bin/unzip

= = = = = 
But there is no zip or unzip on the CentOS machine:

= = = = = 

	whoami: genie.platform

	uname -a: 
Linux centos 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

	HOME: /opt/public/hipp/homes/genie.platform

	PATH: 	/shared/common/jdk1.8.0_x64-latest/bin:/usr/local/bin:/usr/bin
which: no zip in (/shared/common/jdk1.8.0_x64-latest/bin:/usr/local/bin:/usr/bin)
which: no unzip in (/shared/common/jdk1.8.0_x64-latest/bin:/usr/local/bin:/usr/bin)

= = = = =
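
For anyone who wants to repeat this check on another slave, a probe along these lines would produce similar output. This is a reconstruction under assumptions, not the script that was actually run: the function names are hypothetical, and it uses "command -v" and "id -un" rather than "which" and "whoami" only because they are the POSIX equivalents.

```shell
#!/bin/sh
# Sketch of a probe job step; function names are hypothetical.

# Print the name of each tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Report who and where we are, then check the tools given as arguments.
probe() {
    printf 'whoami: %s\n' "$(id -un)"
    printf 'uname -a: %s\n' "$(uname -a)"
    printf 'HOME: %s\n' "$HOME"
    printf 'PATH: %s\n' "$PATH"
    missing=$(missing_tools "$@")
    if [ -n "$missing" ]; then
        # Left unquoted on purpose: prints one "missing:" line per tool.
        printf 'missing: %s\n' $missing
    else
        echo "all tools found"
    fi
}

probe zip unzip
```

Running the same step on each slave makes differences in PATH and installed packages visible before a full test run fails on them.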
Comment 14 Eclipse Webmaster CLA 2016-03-16 16:44:43 EDT
I've installed the unzip package so that should now be working.

As for VNC, those lock files etc. that you're trying to clean up are owned by another HIPP user, so you (or Hudson) can't remove them.

One thing that may also bite you is that TigerVNC requires a password to connect to the VNC service.  You can override that by specifying '-SecurityTypes none' as an argument when it starts.    

-M.
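
One way to live with lock files owned by other users is to skip them rather than remove them. The sketch below picks the first display number that has neither a lock file nor a socket, and prints the Xvnc command with the option mentioned above. The helper name first_free_display is hypothetical, and whether Hudson's Xvnc support can be pointed at a display chosen this way is an assumption that would need checking.

```shell
#!/bin/sh
# Sketch: choose an X display whose lock file and socket do not exist,
# so stale locks owned by other users never need to be removed.

# Usage: first_free_display <start> [lock_dir]
first_free_display() {
    n=$1
    dir=${2:-/tmp}
    while [ -e "$dir/.X$n-lock" ] || [ -e "$dir/.X11-unix/X$n" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

display=$(first_free_display 1)
# Printed rather than executed; the option is the one quoted above.
echo "Xvnc :$display -SecurityTypes none"
```

There is still a small race between checking and starting the server, so a real job would need to retry if Xvnc reports that the display is already in use.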
Comment 15 Alexander Kurtakov CLA 2016-03-17 05:27:18 EDT
(In reply to David Williams from comment #11)
> It appears the Platform Hipp has a "centros" slave. 
> 
> See 
> https://hudson.eclipse.org/platform/
> 
> While defining a "testAccess" job there, I was given choices of where to
> restrict the build (besides master): one named "hippcentros", the other named
> just "centros". 
> 
> I assume these are the same machine. 
> 
> uname -a 
> replies with 
> Linux centos 3.10.0-123.13.2.el7.x86_64 #1 SMP Thu Dec 18 14:09:13 UTC 2014
> x86_64 x86_64 x86_64 GNU/Linux
> 
> no matter which slave I pick. 
> 
> So I assume I can do the copy/paste. 
> 
> I assume it should be all Unit tests? Not just "swt"?

Yeah, having all tests run would be perfect.
Comment 16 David Williams CLA 2016-03-17 08:08:05 EDT
(In reply to Alexander Kurtakov from comment #15)
 
> Yeah, having all tests run would be perfect.

And the (first) results are in! 

See 
https://hudson.eclipse.org/platform/job/ep46I-unit-cen64/2/

Let me know if anything looks "off" from a setup or infrastructure point of view. 

Those results are from the I20160316-0800 build. 

I will start another based on the M6 candidate. 

= = = = = = = = = = =

I think I mentioned before, but will repeat, "getting the tests to run" is easy. 

But anything more, such as automatically getting them to run and especially summarizing the results along with the rest of the machines will be harder (i.e. take longer) so I hope in the meantime, simply looking at the results directly on Hudson is helpful.
Comment 17 Alexander Kurtakov CLA 2016-03-17 08:27:02 EDT
(In reply to David Williams from comment #16)
> (In reply to Alexander Kurtakov from comment #15)
>  
> > Yeah, having all tests run would be perfect.
> 
> And the (first) results are in! 
> 
> See 
> https://hudson.eclipse.org/platform/job/ep46I-unit-cen64/2/
> 
> Let me know if anything looks "off" from a setup or infrastructure point of
> view. 
> 
> Those results are from the I20160316-0800 build. 
> 
> I will start another based on the M6 candidate. 
> 
> = = = = = = = = = = =
> 
> I think I mentioned before, but will repeat, "getting the tests to run" is
> easy. 
> 
> But anything more, such as automatically getting them to run and especially
> summarizing the results along with the rest of the machines will be harder
> (i.e. take longer) so I hope in the meantime, simply looking at the results
> directly on Hudson is helpful.

Thanks for that David. 
I'll keep an eye on it and try to get test failures fixed. 
Btw, at the PMC meeting we discussed some of the issues we face with tests in the nightly/I builds running on GTK 2.x while users use GTK 3.x. It would be good if we managed to get full integration (summarized etc.) though.
Comment 18 David Williams CLA 2016-03-22 23:12:52 EDT
Status: Just tonight, with 
http://download.eclipse.org/eclipse/downloads/drops4/N20160322-2000/

I have things correct to automatically kick off a test pass on 
https://hudson.eclipse.org/platform/view/Unit%20Tests/job/ep46N-unit-cen64/
or 
https://hudson.eclipse.org/platform/view/Unit%20Tests/job/ep46I-unit-cen64/

But it is kind of hard to see, since that machine went "offline" today for some reason (bug 490232). 

Automatically summarizing will take longer -- I've not looked at that yet. I only know it did not work automatically :)
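
For illustration only, one way a build could kick off a parameterized test job is through the server's remote API. The sketch below just builds and prints such a URL. It assumes the server exposes a buildWithParameters endpoint, that the job takes the buildId and stream parameters described in comment 3, and that the 4.6 stream value is 4.6.0; it is not a description of how the trigger was actually wired up here.

```shell
#!/bin/sh
# Sketch: construct the URL that would start a parameterized job.
# Assumptions: a buildWithParameters endpoint, and job parameters
# named buildId and stream. trigger_url is a hypothetical helper.

# Usage: trigger_url <base_url> <job> <buildId> <stream>
trigger_url() {
    base=$1 job=$2 build_id=$3 stream=$4
    printf '%s/job/%s/buildWithParameters?buildId=%s&stream=%s\n' \
        "$base" "$job" "$build_id" "$stream"
}

url=$(trigger_url https://hudson.eclipse.org/platform ep46N-unit-cen64 N20160322-2000 4.6.0)
echo "would POST to: $url"
# A real trigger might then run: curl -X POST "$url"   (authentication omitted)
```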
Comment 19 David Williams CLA 2016-04-09 23:15:46 EDT
I think all this is done -- as long as the machine stays online. :) 

Automatically triggered, automatically summarized. 

I suspect we need a lot more swt/ui tests to really confirm. (Which is hard to do, in a headless build?) 

But, I assume it helps.
Comment 20 Dani Megert CLA 2016-04-11 08:42:34 EDT
Good achievement!
Comment 21 Lars Vogel CLA 2016-04-11 12:29:51 EDT
Thanks David.