Bug 245693 - Need perf_34x branch and baselines from it
Summary: Need perf_34x branch and baselines from it
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 3.5
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: 3.5 M4
Assignee: Kim Moir CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 209611 233955 255785
Reported: 2008-08-29 09:20 EDT by Dani Megert CLA
Modified: 2008-11-24 10:08 EST
CC List: 3 users

See Also:


Attachments
patches so far for performance build changes (31.63 KB, patch)
2008-10-02 15:59 EDT, Kim Moir CLA
patch (997 bytes, text/plain)
2008-10-24 16:08 EDT, Kim Moir CLA
stack trace that isn't very useful (1.08 KB, application/octet-stream)
2008-10-28 10:23 EDT, Kim Moir CLA
ant verbose output (981.52 KB, application/octet-stream)
2008-10-28 10:24 EDT, Kim Moir CLA
patch (748 bytes, text/plain)
2008-10-28 15:43 EDT, Kim Moir CLA
patch to overcome pde build bug 127747 (1.00 KB, patch)
2008-10-28 17:14 EDT, Kim Moir CLA
patch to test.xml (4.01 KB, patch)
2008-11-03 17:15 EST, Kim Moir CLA
patch to org.eclipse.test.performance (2.37 KB, patch)
2008-11-03 17:18 EST, Kim Moir CLA
patch (4.85 KB, patch)
2008-11-06 12:23 EST, Kim Moir CLA

Description Dani Megert CLA 2008-08-29 09:20:53 EDT
I20080827-0935.

I need the perf_34x branch and baselines from it in order to fix some broken tests.
Comment 1 Kim Moir CLA 2008-09-08 14:40:20 EDT
We are waiting for a new UPS to arrive so that we can turn on the new machines for 3.5 performance tests in the lab.
Comment 2 Kim Moir CLA 2008-09-16 16:59:50 EDT
Still waiting for UPS to arrive. Maybe it was sent by pony express.....
Comment 3 Kim Moir CLA 2008-10-01 12:07:28 EDT
The Architecture Council meeting on October 1, 2008 decided on the following configurations:

XP with 1.5 vm
Vista with 1.6 vm
RHEL 5 with 1.6 vm
SLED 10 with 1.5 vm
Comment 4 Kim Moir CLA 2008-10-02 15:59:17 EDT
Created attachment 114133 [details]
patches so far for performance build changes

Also need to update machine.cfg once the machine names are available.
Comment 5 Kim Moir CLA 2008-10-02 17:05:11 EDT
Also, the patch should reflect the VM changes noted in bug 248458.
Comment 6 Kim Moir CLA 2008-10-21 11:15:22 EDT
Here is the update on the perf machines.

Jenn, our sysadmin, has spent many hours trying to get rshd working on the Vista box.  We use rshd as the protocol for copying and invoking tests on Windows.  It seems to be impossible to invoke rshd on Vista in a way that lets you interact with the desktop.  You can start something on the command line via ssh, but you can't interact with the desktop, which means that we can't run our tests.  Jenn thinks that the root cause of the issue is that MS closed the sockets that were used for this in the Vista release.

We are going to set aside the performance tests on Vista and resolve this issue as part of bug 247320 once we are able to test the machines.  

In other news, the UPSes have arrived in our lab, but they require electrical changes to the lab before they can be installed.  This was unexpected, as they arrived with a different specification from the one ordered. An electrician has been called to make the changes. In the interim, they are plugged into the wall.

We also need an additional switch and KVM box to accommodate all the new hardware. The switch has been ordered and an old KVM box has been rescued from salvage.  In the interim, we have moved machines around on the switch so that the new Windows performance machines can be used temporarily to run the JUnit tests on Windows.

I'm working through some issues with the performance baselines right now.  The sdk.tests feature isn't being built because of an OOME and I'm trying to discern the root cause. Once this is resolved, I will run the baselines and then run the performance tests in the 3.5 stream builds.  I'll also ask IT to image the machines on DVD so they can be easily re-imaged each week for the baseline run.
Comment 7 Kim Moir CLA 2008-10-24 16:08:30 EDT
Created attachment 116107 [details]
patch

The test feature wasn't building because the tag for the osgi tests from /cvsroot/eclipse was missing the bundle_tests directory for the v20080427-0830 tag.  The same tag of the project in /cvsroot/rt did have the full content.  This probably happened when the content moved from the eclipse to rt project. I'll open a bug against equinox to notify them of this missing content.
Comment 8 Kim Moir CLA 2008-10-28 10:22:20 EDT
The osgi tests issue was bogus.  I'm attaching the stack traces for the build and the ant verbose output.  It looks like the test features are being zipped up multiple times, which is causing an OOM error.  I'm investigating.
Comment 9 Kim Moir CLA 2008-10-28 10:23:14 EDT
Created attachment 116295 [details]
stack trace that isn't very useful
Comment 10 Kim Moir CLA 2008-10-28 10:24:36 EDT
Created attachment 116296 [details]
ant verbose output
Comment 11 Kim Moir CLA 2008-10-28 15:43:32 EDT
Created attachment 116344 [details]
patch

Andrew fired up his debugger and discovered the source of the problem.  Because of a misconfiguration in the build.properties for the sdk.tests, the scripts that PDE build generated for the sdk.tests were using Ant zip instead of executing the zip included with Linux.  This caused an OOM when building the JUnit zip for the performance baselines.  Thank you Andrew!
Comment 12 Kim Moir CLA 2008-10-28 17:14:15 EDT
Created attachment 116360 [details]
patch to overcome pde build bug 127747
Comment 13 Kim Moir CLA 2008-10-28 17:15:17 EDT
Notes from Andrew

If you want the root files that were collected (epl-v10.html, notice.html), they are in tmp/eclipse/ANY.ANY.ANY.
You could use a customAssembly.xml pre.archive target to copy them from ${eclipse.base}/ANY.ANY.ANY/${collectingFolder} to ${rootFolder}.
${rootFolder} is just ${eclipse.base}/group.group.group/${collectingFolder}; it is defined by the Ant script that calls the custom assembly.
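
To make the source and target paths in Andrew's notes concrete, here is a minimal Java sketch of the same copy. It is not the customAssembly.xml pre.archive target itself; it only mirrors what such a target would do with a plain file copy. The class name RootFileCopy, the method copyRootFiles, and the command-line arguments are hypothetical, and the sketch assumes the root files sit directly in the collecting folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper; illustrates the copy only, not PDE build's own mechanism.
class RootFileCopy {

    /** Copies the regular files directly under sourceDir into targetDir and returns how many were copied. */
    static int copyRootFiles(Path sourceDir, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        int copied = 0;
        try (Stream<Path> entries = Files.list(sourceDir)) {
            for (Path entry : (Iterable<Path>) entries::iterator) {
                // Subdirectories are skipped; only files such as epl-v10.html are copied.
                if (Files.isRegularFile(entry)) {
                    Files.copy(entry, targetDir.resolve(entry.getFileName()),
                            StandardCopyOption.REPLACE_EXISTING);
                    copied++;
                }
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: RootFileCopy <eclipse.base> <collectingFolder>");
            return;
        }
        Path base = Paths.get(args[0]);
        // Same layout as in the notes: ANY.ANY.ANY is the source, group.group.group the target.
        Path source = base.resolve("ANY.ANY.ANY").resolve(args[1]);
        Path target = base.resolve("group.group.group").resolve(args[1]);
        System.out.println("Copied " + copyRootFiles(source, target) + " root file(s) to " + target);
    }
}
```

In the actual build, the equivalent step would live in the pre.archive target so that it runs before the archive is created.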
Comment 14 Kim Moir CLA 2008-10-29 10:04:56 EDT
Ran the baselines last night.

Two problems remain:
1) The SLED 10 perf machine is not accessible on the network. I opened a bug with IT yesterday. It was on the network before, not sure what happened.  I have rebooted it and restarted the network service to no avail.
2) The database is not accessible to the new perf machines.  It's running on a new port on the database machine, which is blocked by the firewall.  I've opened a bug with IT to open this port on the firewall.  I thought the database machine was isolated too, so all its ports would be available to other isolated machines, but apparently this is not the case.  If this change takes a long time, I can:
-Stop the database for the 3.4.x M builds once this build is complete
-Start the new db on the open port
-Change the 3.5 scripts to point to the open port
-Run the 3.4 baselines
-etc

In the longer term, I should merge all the data into the newer database.
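
A quick way to confirm from a perf machine whether the database port is reachable through the firewall is a plain TCP connect with a timeout. The sketch below is only an illustration: the class name PortCheck and the default host and port are assumptions (1527 is Derby's default network server port; the port actually used in the lab is not stated in this bug).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper; not part of the releng scripts.
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: all count as "not reachable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed defaults for illustration; pass the real host and port as arguments.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1527;
        System.out.println(host + ":" + port + " reachable: " + isReachable(host, port, 2000));
    }
}
```

A timeout rather than an immediate refusal usually points at a firewall silently dropping packets, though that is a rule of thumb and not a certainty.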
Comment 15 Kim Moir CLA 2008-10-29 12:05:33 EDT
Jenn has fixed the eplnx1 network issue.
Comment 16 Kim Moir CLA 2008-10-30 16:49:16 EDT
I changed the port of the database server temporarily to get around the firewall issue. I'll run the baselines once the 3.4.x and 3.5 M3 builds are finished with the machines.
Comment 17 Kim Moir CLA 2008-10-31 17:21:39 EDT
The firewall is still causing problems.  I've installed the database on a perf machine itself to circumvent the firewall.

Comment 18 Kim Moir CLA 2008-11-03 17:15:17 EST
Created attachment 116875 [details]
patch to test.xml
Comment 19 Kim Moir CLA 2008-11-03 17:18:40 EST
Created attachment 116879 [details]
patch to org.eclipse.test.performance 

New driver needed for new database derby-10.4.2.0

org.apache.derby.jdbc.ClientDriver

Also, I have escalated the firewall bug to the IT manager to try to get it resolved more quickly.
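
For reference, a minimal Java sketch of checking that the client driver named above is loadable and of building a Derby network-client URL. The class name DerbyClientCheck, the host, the port, and the database name perfDb are assumptions for illustration; only the driver class name comes from this comment. The URL follows Derby's documented client form jdbc:derby://host:port/databaseName.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical helper; not the code in org.eclipse.test.performance.
class DerbyClientCheck {

    static final String CLIENT_DRIVER = "org.apache.derby.jdbc.ClientDriver";

    /** Builds a Derby network-client JDBC URL. */
    static String buildUrl(String host, int port, String database) {
        return "jdbc:derby://" + host + ":" + port + "/" + database;
    }

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean isClassLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Host, port, and database name are placeholders.
        String url = buildUrl("localhost", 1527, "perfDb");
        if (!isClassLoadable(CLIENT_DRIVER)) {
            System.out.println("Driver not on classpath: " + CLIENT_DRIVER);
            return;
        }
        try (Connection connection = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + url);
        } catch (SQLException e) {
            System.out.println("Driver loaded but connection failed: " + e.getMessage());
        }
    }
}
```

Running the loadability check on the test machine itself, not only in a workspace, helps separate "driver jar missing or not visible" from "database unreachable".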
Comment 20 Kim Moir CLA 2008-11-04 10:24:48 EST
We have an electrician in our lab today installing new circuits so that we can connect the perf machines to the UPS.

Last night our IT team was able to see the blocked ports on the firewall that prevent the performance machines from contacting the database machine.  The firewall rules look correct; however, they are still blocking access. IT is following up with the team in India who actually administers the firewall for further assistance.
Comment 21 Kim Moir CLA 2008-11-04 15:01:28 EST
The baselines are running; however, they aren't writing to the temporary database on the perf machine. There was a problem finding the Apache Derby libraries while running the tests.  I had to upgrade the Derby libraries to communicate with the new version of the database.  I've released a fix for this and am running the baselines again.

As an aside, Jenn, our sysadmin, received electric shocks while installing a new yet defective UPS in our lab.  Earlier in the year, she was cut by a falling server rack.  This is the dedication that we have...
Comment 22 Kim Moir CLA 2008-11-06 11:41:59 EST
The firewall rules were fixed yesterday.

I was having problems writing to the database earlier this week and asked Frederic for help.  Frederic was able to troubleshoot the source of the problem and now I am about to start another test run of the baselines. 

Comment 23 Kim Moir CLA 2008-11-06 12:23:07 EST
Created attachment 117229 [details]
patch
Comment 24 Kim Moir CLA 2008-11-07 18:15:26 EST
The baseline run is still having problems loading the derby libraries to talk to the database.

Frederic, Sonia and I looked at this problem today. I released some more patches and am running another test run.  The libraries work in my workspace but not on the test machine.  If this baseline run doesn't work, I'm going to talk to a Core team member on Monday to work through why the libraries aren't being loaded by the test framework.
Comment 25 Kim Moir CLA 2008-11-11 15:23:25 EST
The performance baselines are working and writing data to the new database.  Yayayayyay!

I will start another performance baseline tonight. With the current baseline, I didn't run perf tests on the Windows machine because this machine is also currently used for the JUnit tests for regular builds. I didn't want my performance testing to cause JUnit test delays. I have also released changes to the HEAD stream of the builder to run performance tests once the next baseline run is complete.
Comment 26 Kim Moir CLA 2008-11-12 09:57:11 EST
The perf baselines look like they completed successfully last night.  I'll release a new builder and enable performance tests for tonight's nightly build and see how it goes.
Comment 27 Kim Moir CLA 2008-11-14 10:04:16 EST
There was a problem with the performance results last night. The version of org.eclipse.test.performance that was released to the maps didn't load the new drivers.  I've fixed this and will run the performance tests with tonight's build again.
Comment 28 Kim Moir CLA 2008-11-17 18:20:50 EST
The performance tests didn't run on the weekend because compile errors in the build prevented them from running.  Looking forward to tonight's results.
Comment 29 Kim Moir CLA 2008-11-19 08:38:44 EST
I20081118-1720 has performance results in the database.  Frederic is investigating why results weren't generated automatically on the build page.  He will generate them manually for today, we will patch the builder for next time.
Comment 30 Frederic Fusier CLA 2008-11-19 08:52:14 EST
(In reply to comment #29)
> I20081118-1720 has performance results in the database.  Frederic is
> investigating why results weren't generated automatically on the build page. 
> He will generate them manually for today, we will patch the builder for next
> time.
> 
I've opened bug 255785 to track this issue...
Comment 31 Kim Moir CLA 2008-11-21 11:54:38 EST
perf results for the N20081120-2000 build were generated automatically.  Thanks Frederic for all your help.
Comment 32 Frederic Fusier CLA 2008-11-21 13:37:01 EST
(In reply to comment #31)
> perf results for the N20081120-2000 build were generated automatically.  Thanks
> Frederic for all your help.
> 
You're welcome :-)

Note that I opened bug 256156 for the invalid machine names displayed above fingerprints...

Note also that the eplnx2 baseline results look definitely odd, as we got unexpected regressions on some tests (e.g. JDT/Core search tests) which are not reproduced on other boxes... Would it be possible to run a new baseline before the next I-build?
Comment 33 Kim Moir CLA 2008-11-24 10:08:37 EST
Frederic, I reran the baselines on Friday.