I20080827-0935. I need the perf_34x branch and baselines from it in order to fix some broken tests.
We are waiting for a new UPS to arrive so that we can turn on the new machines for 3.5 performance tests in the lab.
Still waiting for UPS to arrive. Maybe it was sent by pony express.....
Architecture council meeting October 1, 2008 decided on the following configurations:
- XP with 1.5 vm
- Vista with 1.6 vm
- RHEL 5 with 1.6 vm
- SLED 10 with 1.5 vm
Created attachment 114133 [details] patches so far for performance build changes also need to update machine.cfg once machine names are available
Also, the patch should reflect that the vms should change as noted in bug 248458.
Here is the update on the perf machines.

Jenn, our sysadmin, has spent many hours trying to get rshd working on the Vista box. We use rshd as the protocol for copying and invoking tests on windows. It seems to be impossible to invoke rshd on Vista in a way that lets you interact with the desktop. You can start something on the command line via ssh, but you can't interact with the desktop, which means that we can't run our tests. Jenn thinks that the root cause of the issue is that MS closed the sockets that were used for this in the Vista release. We are going to set aside the performance tests on Vista and resolve this issue as part of bug 247320 once we are able to test the machines.

In other news, the UPSes have arrived in our lab but they require electrical changes to accommodate them. This was unexpected, as they arrived with a different specification from the order. An electrician has been called to make the changes to our lab. In the interim, they are plugged into the wall. We also need an additional switch and KVM box to accommodate all the new hardware. The switch has been ordered and an old KVM box has been rescued from salvage. In the interim, we have moved machines around on the switch so that the new windows performance machines can be used temporarily to run the JUnit tests on windows.

I'm working through some issues with the performance baselines right now. The sdk.tests feature isn't being built because of an OOME and I'm trying to discern the root cause. Once this is resolved, I will run the baselines and then run the performance tests in the 3.5 stream builds. I'll also ask IT to image the machines on DVD so they can be easily re-imaged each week for the baseline run.
Created attachment 116107 [details] patch The test feature wasn't building because the tag for the osgi tests from /cvsroot/eclipse was missing the bundle_tests directory for the v20080427-0830 tag. The same tag of the project in /cvsroot/rt did have the full content. This probably happened when the content moved from the eclipse to rt project. I'll open a bug against equinox to notify them of this missing content.
The osgi tests issue was bogus. I'm attaching the stack traces for the build and the ant verbose output. It looks like the test features are being zipped up multiple times, which is causing an OOM error. I'm investigating.
Created attachment 116295 [details] stack trace that isn't very useful
Created attachment 116296 [details] ant verbose output
Created attachment 116344 [details] patch
Andrew fired up his debugger and discovered the source of the problem. Because of a misconfiguration in the build.properties for the sdk.tests, the scripts that pde build generated for the sdk.tests were using ant zip instead of executing the zip included with Linux. This caused an OOM when building the junit zip for the performance baselines. Thank you Andrew!
Created attachment 116360 [details] patch to overcome pde build bug 127747
Notes from Andrew:
If you want the rootfiles that were collected (epl-v10.html, notice.html), they are in tmp/eclipse/ANY.ANY.ANY. You could use a customAssembly.xml pre.archive target to copy them from ${eclipse.base}/ANY.ANY.ANY/${collectingFolder} to ${rootFolder}. ${rootFolder} is just ${eclipse.base}/group.group.group/${collectingFolder}; it is defined by the ant that calls the custom assembly.
Ran the baselines last night. Two problems remain:
1) The SLED 10 perf machine is not accessible on the network. I opened a bug with IT yesterday. It was on the network before; I'm not sure what happened. I have rebooted it and restarted the network service to no avail.
2) The database is not accessible to the new perf machines. It's running on a new port on the database machine, and that port is blocked by the firewall. I've opened a bug with IT to open this port on the firewall. I thought the database machine was isolated too, so all its ports would be available to other isolated machines, but apparently this is not the case.
If this change takes a long time, I can
-Stop the database for the 3.4.x M builds once this build is complete
-Start the new db on the open port
-Change the 3.5 scripts to point to the open port
-Run the 3.4 baselines
-etc
In the longer term, I should merge all the data into the newer database.
Jenn has fixed eplnx1 network issue
I changed the port of the database server temporarily to get around the firewall issue. I'll run the baselines once the 3.4.x and 3.5 M3 builds are finished with the machines.
The firewall is still causing problems. I've installed the database on a perf machine itself to circumvent the firewall.
Created attachment 116875 [details] patch to test.xml
Created attachment 116879 [details] patch to org.eclipse.test.performance
New driver needed for new database derby-10.4.2.0: org.apache.derby.jdbc.ClientDriver
Also, I have escalated the firewall bug to the IT manager to try to get it resolved more quickly.
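For reference, a minimal sketch of what using the network client driver looks like on the JDBC side. Only the driver class name comes from the comment above; the class name DerbyClientSketch, the host, the port, and the database name are placeholders rather than the values used in the lab, and the sketch assumes derbyclient.jar is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper, not part of org.eclipse.test.performance.
class DerbyClientSketch {
    // Driver class named in the comment above (shipped in derbyclient.jar).
    static final String CLIENT_DRIVER = "org.apache.derby.jdbc.ClientDriver";

    // Builds a Derby network-client URL of the form jdbc:derby://host:port/database.
    static String clientUrl(String host, int port, String database) {
        return "jdbc:derby://" + host + ":" + port + "/" + database;
    }

    // Returns true only if the client driver class can be loaded.
    static boolean driverAvailable() {
        try {
            Class.forName(CLIENT_DRIVER);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Placeholder connection details (assumptions, not the lab's settings).
        String url = clientUrl("localhost", 1527, "perfDb");
        System.out.println("URL: " + url);
        if (!driverAvailable()) {
            System.out.println("Client driver not found; is derbyclient.jar on the classpath?");
            return;
        }
        try (Connection connection = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + connection.getMetaData().getURL());
        } catch (SQLException e) {
            System.out.println("Could not connect: " + e.getMessage());
        }
    }
}
```

If the driver check fails on a test machine but passes in a workspace, that points at the classpath seen by the test framework rather than at the database itself.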
We have an electrician in our lab today installing new circuits so that we can plug the perf machines into the UPS. Last night our IT team was able to see the blocked ports on the firewall that prevent the performance machines from contacting the database machine. The rules on the firewall look correct, but they are still blocking access. IT is following up with the team in India who actually administers the firewall for further assistance.
The baselines are running; however, they aren't writing to the temporary database on the perf machine. There was a problem finding the apache derby libraries while running the tests. I had to upgrade the derby libraries to communicate with the new version of the database. I've released a fix for this and am running the baselines again. As an aside, Jenn, our sysadmin, received electric shocks while installing a new yet defective UPS in our lab. Earlier in the year, she was cut by a falling server rack. This is the dedication that we have...
The firewall rules were fixed yesterday. I was having problems writing to the database earlier this week and asked Frederic for help. Frederic was able to troubleshoot the source of the problem and now I am about to start another test run of the baselines.
Created attachment 117229 [details] patch
The baseline run is still having problems loading the derby libraries to talk to the database. Frederic, Sonia and I looked at this problem today. I released some more patches and am running another test run. The libraries work in my workspace but not on the test machine. If this baseline run doesn't work, I'm going to talk to a Core team member on Monday to work through why the libraries aren't being loaded by the test framework.
The performance baselines are working and writing data to the new database. Yayayayyay! I will start another performance baseline tonight. With the current baseline, I didn't run perf tests on the windows machine because this machine is also currently used for the JUnit tests for regular builds. I didn't want my performance testing to cause JUnit test delays. I have also released changes to the HEAD stream of the builder to run performance tests once the next baseline run is complete.
The perf baselines look like they completed successfully last night. I'll release a new builder and enable performance tests for tonight's nightly build and see how it goes.
There was a problem with the performance results last night. The version of org.eclipse.test.performance that was released to the maps didn't load the new drivers. I've fixed this and will run the performance tests with tonight's build again.
The performance tests didn't run on the weekend because compile errors in the build prevented them from running. Looking forward to tonight's results.
I20081118-1720 has performance results in the database. Frederic is investigating why results weren't generated automatically on the build page. He will generate them manually for today, we will patch the builder for next time.
(In reply to comment #29)
> I20081118-1720 has performance results in the database. Frederic is
> investigating why results weren't generated automatically on the build page.
> He will generate them manually for today, we will patch the builder for next
> time.
>
I've opened bug 255785 to track this issue...
perf results for the N20081120-2000 build were generated automatically. Thanks Frederic for all your help.
(In reply to comment #31)
> perf results for the N20081120-2000 build were generated automatically. Thanks
> Frederic for all your help.
>
You're welcome :-)
Note that I opened bug 256156 for the invalid machine names displayed above fingerprints...
Note also that eplnx2 baseline results look definitely odd as we got unexpected regression on some tests (e.g. JDT/Core search tests) which are not reproduced on other boxes... Would it be possible to run a new baseline before next I-build?
Frederic, I reran the baselines on Friday.