Bug 374441 - run performance tests on eclipse.org hardware
Summary: run performance tests on eclipse.org hardware
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Releng
Version: 4.2
Hardware: PC Windows 7
Importance: P3 normal with 3 votes
Target Milestone: 4.5 M7
Assignee: David Williams CLA
QA Contact:
URL:
Whiteboard:
Keywords: plan
Depends on: 387638 389369 389371 389857 390494 390820 390821 390986 441888 441889 442453 442455 442633 443038 443971 444243 450422 451072 451923 453958 454147 454159 460929
Blocks: 346088 362718 365841
 
Reported: 2012-03-15 16:19 EDT by Kim Moir CLA
Modified: 2020-11-24 08:45 EST (History)
27 users

See Also:


Attachments

Description Kim Moir CLA 2012-03-15 16:19:48 EDT
This bug will capture the process to determine if it's possible to run performance tests on eclipse.org hardware. Requirements are:

1. Stable, dedicated hardware to run the performance tests. Previously, Denis had set up a Hudson slave to do this; see bug 296290. Only one executor on the machine so we can measure performance accurately.

2. If the hardware is stable, an evaluation will have to be performed to determine whether the current performance framework can run on the Hudson servers or a new one should be tried. Also, we'll need a backend database to store the results.

Here's some old documentation on running performance tests which is linked from the platform releng wiki.

http://dev.eclipse.org/viewcvs/viewvc.cgi/org.eclipse.test.performance/doc/Performance%20Tests%20HowTo.html?view=co

Here's some additional documentation on setting up performance tests etc.
http://wiki.eclipse.org/Platform-releng-faq#How_do_I_set_up_performance_tests.3F

I'm writing some additional documentation for the FAQ on how the performance tests are invoked in the build.
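
For reference, a minimal sketch of how a test typically drives the performance framework mentioned above (org.eclipse.test.performance); the scenario id, loop count, and measured operation below are invented for illustration:

import junit.framework.TestCase;

import org.eclipse.test.performance.Performance;
import org.eclipse.test.performance.PerformanceMeter;

public class ExamplePerformanceTest extends TestCase {

	public void testScenario() {
		Performance perf = Performance.getDefault();
		// The scenario id is made up; real tests usually derive it from the test class and method.
		PerformanceMeter meter = perf.createPerformanceMeter("org.example.perf#testScenario");
		try {
			for (int i = 0; i < 10; i++) { // repeat to collect several samples
				meter.start();
				doWorkBeingMeasured();
				meter.stop();
			}
			meter.commit();                // record the samples (console, or database if configured)
			perf.assertPerformance(meter); // compare against the reference build, when one is configured
		} finally {
			meter.dispose();
		}
	}

	private void doWorkBeingMeasured() {
		// placeholder for the operation whose performance is being tracked
	}
}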
Comment 1 John Arthorne CLA 2012-03-15 16:45:42 EDT
Some initial thoughts on this:

I think our best bet is having a new Hudson job that just runs the performance tests. This will be needed because the performance tests would need to run on a specially designated Hudson performance test slave. This will have other benefits, such as making it easy to run or re-run only the performance tests for a given build, and we can configure various triggers to chain the builds together later if desired.

Our build still produces the test harness, and today anyone can download that test harness, download the corresponding SDK build, and invoke the performance tests on their own machine. I think the starting point is a new Hudson job that simply does this. By default performance results will be dumped to the console, but this would be enough to validate whether performance results are stable across multiple runs. 

If the hardware is stable enough, we could take the next step of exploring a backend for storing the performance results. From conversation with Kim and David it sounds like our current backend is very hard to use and maintain, and no other project has successfully been able to use it. There might be other performance test harnesses we could reuse rather than bring our current one back to life.

Just to be clear, none of the current releng committers have any time to work on this, I'm just dumping some thoughts here in case we find someone in the future to look into it.
Comment 2 Kim Moir CLA 2012-03-15 17:51:47 EDT
Added some documentation on how performance tests are invoked in the build.
Comment 3 Kim Moir CLA 2012-03-15 17:52:02 EDT
Added some documentation on how performance tests are invoked in the build.

http://wiki.eclipse.org/Platform-releng-faq#How_performance_tests_are_invoked_in_the_build
Comment 4 John Arthorne CLA 2012-03-16 10:06:50 EDT
Thanks Kim!
Comment 5 Denis Roy CLA 2012-03-16 16:05:38 EDT
I guess I should keep this slave kicking around for a while, then?

https://hudson.eclipse.org/hudson/computer/hudson-perf1-tests/
Comment 6 Dani Megert CLA 2012-03-19 08:05:51 EDT
Satyam, please start looking into this as discussed. If needed, talk to Kim directly to get up to speed.
Comment 7 Satyam Kandula CLA 2012-03-20 10:09:38 EDT
(In reply to comment #6)
> Satyam, please start looking into this as discussed. If needed, talk to Kim
> directly to get up to speed.
I started trying to understand the setup with the help of Kim.
Comment 8 Satyam Kandula CLA 2012-03-21 10:48:56 EDT
Started playing around with the job https://hudson.eclipse.org/hudson/view/Eclipse%20and%20Equinox/job/eclipse-sdk-perf-test/
Comment 9 Satyam Kandula CLA 2012-03-22 02:42:38 EDT
(In reply to comment #5)
> I guess I should keep this slave kicking around for a while, then?
> 
> https://hudson.eclipse.org/hudson/computer/hudson-perf1-tests/
Denis, I am having problems running the 'cvs' command on this machine. Can you please install it or set it up? Thanks in advance.
Comment 10 Denis Roy CLA 2012-03-22 14:38:41 EDT
I've installed cvs and git on the perf1-tests slave, and did OS patches at the same time.
Comment 11 Satyam Kandula CLA 2012-03-27 08:29:15 EDT
(In reply to comment #10)
> I've installed cvs and git on the perf1-tests slave, and did OS patches at the
> same time.
Thanks for this. 

I can see only a 64-bit JVM on this machine. I want to use a 32-bit VM but couldn't find one. Can you tell me the path if one is already installed, or otherwise please set one up? I need an Oracle Java 6 32-bit VM. Thanks in advance.
Comment 12 David Williams CLA 2012-04-24 15:24:57 EDT
It's my understanding "we" may not "get back" to getting the performance tests running for a few months, after Juno? Unless someone, Satyam? Dani?, says otherwise, I think that "dedicated slave" 
hudson-perf1-tests
could be put to better use as a "normal" slave. 

I know we are coming up on "the busy season" for builds, and as I've been trying to get Eclipse unit tests running on Hudson, I've noticed there often seems to be a "shortage" of available x86_64 machines?
Comment 13 Satyam Kandula CLA 2012-04-25 08:21:18 EDT
(In reply to comment #12)
> Its my understanding "we" may not "get back" to getting the performance tests
> running for a few months, after Juno? Unless someone, Satyam? Dani?, says
> otherwise, I think that "dedicated slave" 
> hudson-perf1-tests
> could be put to better use as a "normal" slave. 
> 
> I know we are coming up on "the busy season" for builds, and as I've been
> trying to get Eclipse unit tests running on hudson, I've noticed there seems to
> often be a "shortage" of available x86_64 bit machines?
Though I am not actively using this as of now, I plan to get started on it next week. At least initially I don't need a dedicated machine, but eventually we might. We can probably time-share it for now. It would help if I can get some time during your night time (daytime for India).
Comment 14 Denis Roy CLA 2012-04-25 15:54:02 EDT
I'll leave it as-is.  I've added a hudson-slave2 with 2 executor threads to soak up the additional load we've been seeing lately.
Comment 15 Mike Milinkovich CLA 2012-09-06 11:38:43 EDT
(In reply to comment #12)
> I know we are coming up on "the busy season" for builds, and as I've been
> trying to get Eclipse unit tests running on hudson, I've noticed there seems
> to often be a "shortage" of available x86_64 bit machines?

If hardware is at issue here, can you please be specific about what machines the Eclipse Foundation could acquire to alleviate the problem? How many machines do we need, what architectures, what operating systems?

In the meantime, would it not make sense to move forward getting everything working on the vserver already available? That way if dedicated machines arrive, we can start using them as soon as they are provisioned.
Comment 16 John Arthorne CLA 2012-09-06 13:52:45 EDT
(In reply to comment #15)
> If hardware is at issue here, can you please be specific about what machines
> the Eclipse Foundation could acquire to alleviate the problem? How many
> machines do we need, what architectures, what operating systems?

I don't think we got to the point where we verified whether hardware was an issue. The top todos are well captured in comment #1.
Comment 17 John Arthorne CLA 2012-09-13 09:49:23 EDT
Denis, is virtualization a possibility for Windows, or would we need dedicated machines there?
Comment 18 Denis Roy CLA 2012-09-13 09:56:38 EDT
(In reply to comment #17)
> Denis, is virtualization a possibility for Windows, or would we need
> dedicated machines there?

The current Windows slave is virtualized.
Comment 19 John Arthorne CLA 2012-09-13 10:08:39 EDT
I was unclear; I meant the dedicated CPU thing you did for hudson-perf1-tests. If so, can you create an equivalent Windows perf test slave so we can try running the performance tests there too?
Comment 20 David Williams CLA 2012-09-13 11:22:46 EDT
Denis, can you tell us which other "virtual slaves" share hardware with the current Linux "perf1" machine, i.e. run on the same hardware? As we run some sample tests (bug 389369), I'd like to make sure the other virtual machines are sometimes in use, and sometimes not (e.g. we might re-run some of our N-build tests there, or something, at varying intervals).
Comment 21 Denis Roy CLA 2012-09-13 11:31:05 EDT
> I was unclear, I meant the dedicated CPU thing you did for
> hudson-perf1-tests. 

Yep, I can pin specific CPU cores to any VM.


(In reply to comment #20)
> Denis, can you tell us which other "virtual slaves" are in common with the
> current linux "perf1" machine? 

I'd really like to not tell you, since I'd really like for it to not matter.  You will see variances in disk performance, but for CPU (and memory) what's happening elsewhere shouldn't really matter.

In other words, if you feel the need to stop other jobs in order to run performance tests, then we've failed with virtualized environments.
Comment 22 David Williams CLA 2012-09-13 11:52:23 EDT
(In reply to comment #21)
> 
> I'd really like to not tell you, since I'd really like for it to not matter.
> You will see variances in disk performance, but for CPU (and memory) what's
> happening elsewhere shouldn't really matter.
> 
> In other words, if you feel the need to stop other jobs in order to run
> performance tests, than we've failed with virtualized environments.

Well, I just wanted to make sure we got some good "test runs" (out tests of the tests process), such as to see if the "variances in disk performance" caused our results to vary. I'm a little concerned we are doing the tests now during a relative "quiet period", so all might look great, but then in "real runs", over months, some performance tests might show degraded performance, not because of anything we did, but just because those particular tests were disk intensive and some other jobs running at the same time were also disk intensive. I've not even thought about stopping other jobs. Not sure what you mean. Are we talking about the same thing? 
Maybe it's no big deal, or maybe just "untestable" micro-management ... but I was just trying to make sure we had representative conditions. Not just a few days of "ideal conditions".
Comment 23 David Williams CLA 2012-09-13 11:53:52 EDT
(In reply to comment #22)
> (In reply to comment #21)
> > 
> out tests of the tests process ==> 
  our tests of the tests process
Comment 24 John Arthorne CLA 2012-09-17 10:18:27 EDT
My conclusion from bug 389369 is that the linux perf slave gives stable enough results. It will need more RAM to be able to run all our tests, and might need a second slave if it was running our full suite every day, especially if other projects start using the hudson perf infrastructure (there was interest in this at the last Arch Council meeting). I think the next step is running the same experiment on a Windows slave node to make sure that is also stable.
Comment 25 Denis Roy CLA 2012-09-17 14:16:26 EDT
> It will need more RAM to be able to run all our tests

I've trimmed 2G from the master (16G to 14G) and 1G from the Windows slave (5G to 4G), so on the next restart perf1 will have 4G of RAM.


> might need a second slave

We were waiting for the outcome of these experiments to determine what type of hardware we'd get.  Now that we know virtualized Linux instances are adequate, we can acquire more.


> next step is running the same experiment on a Windows slave node to make
> sure that is also stable.

If you ever want to experiment with Windows perf tests, please coordinate with me.  I'll pin one (or two) CPU cores to the Windows slave and reduce Hudson executors to 1, which will give you a suitably isolated environment for testing.
Comment 26 David Williams CLA 2012-09-17 23:03:28 EDT
> 
> If you ever want to experiment with Windows perf tests, please coordinate
> with me.  I'll pin one (or two) CPU cores to the Windows slave and reduce
> Hudson executors to 1, which will give you a suitably isolated environment
> for testing.

Can we tentatively plan on Friday 9/21 to Wednesday 9/26? That should give me time to try it, see if I can get it working at all, and assuming so, get in 3 to 9 runs. If that's acceptable to you, I'd like to send a note to the cross-project list to make sure we won't interfere with anyone else's Kepler M2. (I picked those dates because we in the Eclipse Project should be done with our Kepler M2, but everyone else will still be working on it for 2 weeks, until 10/5.)

Thanks,
Comment 27 Denis Roy CLA 2012-09-18 09:56:49 EDT
No problems for either of those dates.
Comment 28 David Williams CLA 2012-09-18 11:57:50 EDT
(In reply to comment #27)
> No problems for either of those dates.

Thanks Denis. I've sent a note to the cross-project list, but as I looked at it, there were only 4 jobs that ever used it, and none of them seemed to take too long ... well ... of course except for our Eclipse project unit tests ... and I'll just disable those tests for the platform during that time ... doubt anyone will miss them ... or, at least they'd agree with the trade-off. 

Over the next day or three, I'll try some small perf jobs there ... just to make sure I can set it up and there are no "prereqs" missing. Eventually, we might need that persistent RDC session mentioned in bug 369873, but not sure ... and willing to try without it.
Comment 29 John Arthorne CLA 2012-10-01 09:18:00 EDT
I'm splitting the performance testing work into more small bugzillas, with the hope that people with a small amount of time to help can chip in. 

Updating documentation: bug 390820

Setting up back end to store results: bug 390821
Comment 30 Matthias Mailänder CLA 2014-03-15 18:40:07 EDT
Hi, I want to help out here. This is also part of my ongoing quest to get started with https://wiki.eclipse.org/Google_Summer_of_Code_2014_Ideas#Performance_work_in_Eclipse

Already took on https://bugs.eclipse.org/bugs/show_bug.cgi?id=390820
Comment 31 Matthias Mailänder CLA 2014-05-15 13:44:28 EDT
See also https://bugs.eclipse.org/bugs/show_bug.cgi?id=389834 for the plan to run this on a dedicated box to avoid interference by other busy VMs. https://hudson.eclipse.org/perftests/ has just been rebooted.
Comment 32 Dani Megert CLA 2014-08-11 09:48:07 EDT
This plan item got deferred from 4.4 to 4.5.
Comment 33 Mike Milinkovich CLA 2014-08-11 10:34:33 EDT
(In reply to Dani Megert from comment #32)
> This plan item got deferred from 4.4 to 4.5.

4.5 is the Mars release, right? (I.e. the current release cycle.)

What milestone is targeted? Is there anything we can do to help from our end at the EMO?
Comment 34 David Williams CLA 2014-08-11 10:43:18 EDT
(In reply to Mike Milinkovich from comment #33)
> (In reply to Dani Megert from comment #32)
> > This plan item got deferred from 4.4 to 4.5.
> 
> 4.5 is the Mars release, right? (E.g. the current release cycle.(
> 
> What milestone is targeted? Is there anything we can do to help from our end
> at the EMO?

I don't want to set expectations yet since I'm still investigating. 

Eventually we'll need some help hooking into a database (or, "having our own", on build.eclipse.org ... which I'd want help/review with too) but, I am not at the point of knowing exactly what is needed so best I can say is "stay tuned".
Comment 35 Mike Milinkovich CLA 2014-08-11 10:49:13 EDT
(In reply to David Williams from comment #34)
> Eventually we'll need some help hooking into a database (or, "having our
> own", on build.eclipse.org ... which I'd want help/review with too) but, I
> am not at the point of knowing exactly what is needed so best I can say is
> "stay tuned".

Thanks David. I consider this a very high priority item for the community, so please don't be shy in asking us for help. 

It does take time to buy and install dedicated hardware, so if that is going to be required, the earlier we can get started the better.
Comment 36 Denis Roy CLA 2014-08-11 11:20:16 EDT
> Thanks David. I consider this a very high priority item for the community,
> so please don't be shy in asking us for help. 

+1  We're here to help.
Comment 37 Dani Megert CLA 2014-08-11 13:31:21 EDT
(In reply to Mike Milinkovich from comment #35)
> (In reply to David Williams from comment #34)
> > Eventually we'll need some help hooking into a database (or, "having our
> > own", on build.eclipse.org ... which I'd want help/review with too) but, I
> > am not at the point of knowing exactly what is needed so best I can say is
> > "stay tuned".
> 
> Thanks David. I consider this a very high priority item for the community,
> so please don't be shy in asking us for help. 
> 
> It does take time to buy and install dedicated hardware, so if that is going
> to be required, the earlier we can get started the better.

(In reply to Denis Roy from comment #36)
> > Thanks David. I consider this a very high priority item for the community,
> > so please don't be shy in asking us for help. 
> 
> +1  We're here to help.

Thanks guys!
Comment 38 Matthias Mailänder CLA 2014-08-12 01:25:35 EDT
I am currently so annoyed by an Eclipse RCP application with terrible performance that I am forced to use at work that my motivation to help you get this going in my free time has risen above the pain threshold.

The plan is to port my existing https://build.vogella.com/ci/view/Performance/job/C-MASTER-TESTING-org.eclipse.test.performance/ to the Eclipse.org infrastructure. Where can I find the following files (or equivalents)?

* eclipse.platform.releng.aggregator/workspace/eclipse.platform.releng.tychoeclipsebuilder/eclipse-junit-tests/target/eclipse-junit-tests-bundle.zip

* eclipse.platform.releng.aggregator/workspace/eclipse.platform.releng.tychoeclipsebuilder/sdk/target/products/org.eclipse.sdk.ide-linux.gtk.x86_64.tar.gz eclipse-testing/

As you can see I copied them from another freshly built Jenkins job. In this case the vogella version of the eclipse.platform.releng.aggregator master build used to create voclipse. Can I grab those from https://hudson.eclipse.org/hudson/ or is it not possible to access the workspaces?
Comment 39 David Williams CLA 2014-08-12 03:27:46 EDT
(In reply to Matthias Mailänder from comment #38)
> I am currently so annoyed by an Eclipse RCP applications with terrible
> performance I am forced to use at work that my motivation to help you get
> this going in my free time has risen above the pain threshold.
> 

Sorry you're frustrated, and always glad to have help. But, a word of advice: 

If you have a specific application that has performance problems, my guess is you would solve that problem much faster if you profiled your application, found where the performance problems were, and fixed those, or opened bugs where you found problems in platform code. 

Perhaps it's a common misconception, so I'll explain: our "performance tests" are "regression tests" ... by themselves, they do not tell you how to fix performance problems ... they are meant to spot cases where performance changes from build to build (or milestone to milestone, or release to release). They work best when looking at results from build to build. The longer the time between one measurement-build and another (such as comparing one release to another), the harder it is to find the actual reason for any regression, since so much has changed; the test simply provides a "use case" to begin looking, profiling, etc. Whereas if done "week to week", there is a better idea of what changed from one week to the next. So, it sounds like you already have some "use cases" where performance is slow ... you do not need a test to find those, and now you are ready to start finding the reason. (I know, I'm over-simplifying what you said, and how the regression tests might help you narrow it down ... but I hope that helps get my point across.) 
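
(To make that concrete, a rough sketch using the same framework API: a scenario can be made to fail when the current build falls outside a band relative to the configured reference build. The dimension and percentages here are only an example.)

import org.eclipse.test.performance.Dimension;
import org.eclipse.test.performance.Performance;
import org.eclipse.test.performance.PerformanceMeter;

public class RegressionBandSketch {
	// Sketch: after measuring, fail if elapsed time regresses by more than 10%
	// against the reference build (while tolerating runs up to 50% faster).
	void check(PerformanceMeter meter) {
		meter.commit();
		Performance.getDefault().assertPerformanceInRelativeBand(
				meter, Dimension.ELAPSED_PROCESS, -50, 10);
	}
}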

I'll make another point that will at first seem rude ... but I mean it constructively. The files you ask about are really quite basic. If you need to ask where they are ... I fear "educating you" will be a very time-consuming task. And this is already following several weeks of working with you earlier in the summer ... so while you obviously "got something running", I would question whether it "fits in" with the platform's plans. So my advice is to try to make more progress on your own, for those basic things, and you'll be in a better position to contribute something substantial, rather than a prototype that won't be that useful in the long run. Again, I mean all that constructively ... itself a way of helping you :) I hope you find it constructive.
Comment 40 Matthias Mailänder CLA 2014-08-12 14:44:54 EDT
Sadly I don't have source code access to the said application. It has multiple performance problems, including database searches that can't be cancelled and can take several minutes, which could be caused by too few server resources on the company side and weird design choices by the vendor. On the other hand, switching perspectives, which is required all the time, takes quite long. My guess is that could be RCP related, but even that is not certain, as a strange Oracle Forms to RCP 3.7 wrapper is used. The workaround of my colleagues is to simply open up several instances at once, so there is always the right perspective open in another window, which kind of makes things worse when the PC hits the RAM limit at the end of the day. I was also astonished when an error in a form created an uncaught NullPointerException that took down the whole application. Flaws in the individual RCP-based application instead of the framework may also be highly likely. While I highlight the problems of the software here, I have to say that the overall usability and stability is excellent in most everyday operations. I have used several alternative solutions from other vendors and this Java-based one is by far the best, although there is always room for improvement, and I still think I should give back here as I don't donate money to the Eclipse project.

I get your point that walking me through everything is more time consuming than doing it yourself. A small hint on how to resolve the actual problem would have been more constructive though. ;) I assume you can get the tarballs I need via the download site instead of directly via Hudson. A cp would have been faster than a wget though, so that is why I asked where to get them from the internal build server system which is a little more complex than the small vogella one.
Comment 41 David Williams CLA 2014-08-21 02:51:41 EDT
[Note: I have removed the 'helpwanted' tag from this bug, for the simple reason that 'helpwanted' means "we won't work on it, but would appreciate someone taking ownership of the bug" ... which hasn't happened yet, with this bug :) ... but help is always appreciated if someone sees a "piece" of something they would be willing to own, just say so.] 

I have now investigated doing this work enough to provide this rough outline of my view of our objectives and a brief, simplified series of steps to get there. Much of this is covered by the "dependent" bugs, but I think a little prose helps communication and understanding. Comments and feedback welcome, especially of the form "how to do less work" :) ... and anyone involved with the initial "performance work" (started 8 years ago?) might clear up any misconceptions I have. 

Objectives/outline

Over-all: 

Have performance tests that can run "in a few hours" ... at least not longer than regular unit tests. 
 
  This is so results are available quickly and can be run with "every build" ... at least on one machine. 
  If there is a need, we can eventually have longer-running tests that run, say, in between I-builds or milestones, but it is probably best to treat those as small, "separate jobs" that run tests in pieces (just for the practical reason that if something goes wrong with one suite, the other suites still provide data). [And, these long-running tests are a long-term goal, after everything else on this list.]
  
Run performance tests on (at least) three platforms, Windows (dedicated), Linux, Mac OSX. [This is partially because sometimes it is interesting, and important, to see differences between platforms, but also to see if we really need dedicated machines, or if statistics can do what they are supposed to do with variance.]

Make it (relatively) easy to run the tests "on any machine", not just "eclipse.org" hardware. 

Steps: 

1: Just get the tests running, producing the "raw output" to console. 
  Main purpose here is to see which tests still run, which don't. Will ruthlessly cut tests with any sign of trouble.  
  Currently there are 41 test suites listed as having a "performance target" (see list below). From early prototyping (a couple of years ago) about a third of them ran, a third ran but took more than 2 hours each, and a third did not seem to run any longer at all (hang or crash) -- never investigated the reason why, and I myself won't now (component committers will need to investigate, if they desire and are able).  
  See the list below for current suites that are marked with a "performance target". If anyone knows a priori of any that should be removed as performance tests, please say. 
  
2: (In parallel, ongoing) Provide ability to run (one) performance test suite easily "on committers machine". 

   This is partially so committers can debug the tests themselves, especially if not working, or if there appears to be a regression. 
   May also prove handy to explore the effects of a proposed change, before it's committed, in a few, relatively rare cases. (i.e. could lead to a false sense of security, unless a test is specifically designed for the change being made).   
  
3: (In parallel, ongoing) Fix up current documentation on running performance tests, so it's accurate, and can be useful to others. 
  
4: Collect data to a database. 

  Initial plan is to set up "Derby" on the build machine to collect results from production runs (from all machines running performance tests). (A rough sketch of what writing a result into Derby could look like follows these steps.) 
  Longer term we may change to another, more "enterprise level" database that is backed up, etc. by the Eclipse Foundation.  
  
5: Get "performance.ui" bundle building and working, to produce "finger print" graphs [and other statistical summaries?], from the database data.

6: Understand the requirements and update the PHP code to get "results" displaying on build-download pages. 
  
7. Set up tests so our week-to-week comparisons are against the "most recent release" as reference (probably 4.4.1). [As I understand it, if set up correctly, "current build" and "reference build" are both run automatically each time.]

   This is for sole purpose of spotting regressions introduced from one week to the next. 
   If there are regressions, should be relatively easy to narrow down reasons (and fix), since not that much code changes, week to week. 
   [FWIW, this week to week regression testing is the original, primary design goal of current performance tests and framework, as I understand it.]
  
8. Once above is all working well, try some occasional tests (such with milestones only?) that compare current performance with long past releases, such as 3.8. 

  This will be interesting, but will have limited "debug" value since if performance regressions are seen, there have been so many code and architectural changes, it would be normal "performance debugging" that would be required to improve, at this point in time. 
  But, such comparisons may provide some priorities, such as which areas to focus on -- for anyone who has ability and time to work on performance related bugs. 
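
  As a rough illustration of step 4 (the table and column names are invented; the real performance framework defines its own schema), writing one measurement into an embedded Derby database could look like this, assuming derby.jar is on the classpath:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Statement;

public class DerbyResultStoreSketch {
	public static void main(String[] args) throws Exception {
		// ";create=true" creates the database directory on first use
		try (Connection con = DriverManager.getConnection("jdbc:derby:perfresults;create=true")) {
			try (Statement st = con.createStatement()) {
				// would fail on a second run because the table already exists; a real setup checks first
				st.execute("CREATE TABLE SAMPLE (BUILD_ID VARCHAR(64), SCENARIO VARCHAR(256), ELAPSED_MS BIGINT)");
			}
			try (PreparedStatement ps = con.prepareStatement(
					"INSERT INTO SAMPLE (BUILD_ID, SCENARIO, ELAPSED_MS) VALUES (?, ?, ?)")) {
				ps.setString(1, "I20140826-0800");
				ps.setString(2, "org.example.perf#testScenario");
				ps.setLong(3, 1234L);
				ps.executeUpdate();
			}
		}
	}
}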
  
  
  = = = = = = = = = = = = = 
  
  Full list of test suites with performance target: 
  [Please leave a note, if any component lead already knows that a test 
   should be removed.] 
  
org.eclipse.ant.tests.core
org.eclipse.ant.tests.ui
org.eclipse.compare.tests
org.eclipse.core.expressions.tests
org.eclipse.core.filebuffers.tests
org.eclipse.core.tests.net
org.eclipse.core.tests.resources
org.eclipse.core.tests.runtime
org.eclipse.equinox.p2.tests.ui
org.eclipse.jdt.apt.pluggable.tests
org.eclipse.jdt.apt.tests
org.eclipse.jdt.compiler.apt.tests
org.eclipse.jdt.compiler.tool.tests
org.eclipse.jdt.core.tests.builder
org.eclipse.jdt.core.tests.compiler
org.eclipse.jdt.core.tests.model
org.eclipse.jdt.core.tests.performance
org.eclipse.jdt.debug.tests
org.eclipse.jdt.text.tests
org.eclipse.jdt.ui.tests
org.eclipse.jdt.ui.tests.refactoring
org.eclipse.jface.text.tests
org.eclipse.ltk.core.refactoring.tests
org.eclipse.ltk.ui.refactoring.tests
org.eclipse.osgi.tests
org.eclipse.pde.api.tools.tests
org.eclipse.pde.build.tests
org.eclipse.pde.ds.tests
org.eclipse.pde.ui.tests
org.eclipse.search.tests
org.eclipse.swt.tests
org.eclipse.team.tests.core
org.eclipse.team.tests.cvs.core
org.eclipse.text.tests
org.eclipse.ua.tests
org.eclipse.ua.tests.doc
org.eclipse.ui.editors.tests
org.eclipse.ui.tests.forms
org.eclipse.ui.tests.performance
org.eclipse.ui.tests.rcp
org.eclipse.ui.workbench.texteditor.tests
Comment 42 David Williams CLA 2014-08-21 03:51:25 EDT
(In reply to David Williams from comment #41)

> Run performance tests on (at least) three platforms, Windows (dedicated), Linux, Mac OSX

I see a little mistake already. I guess it is a Linux machine, that is dedicated at 

https://hudson.eclipse.org/perftests/

Which is fine (as long as it is really dedicated, which means "no other jobs running" and "no changes made"). (I know changes do have to be made occasionally to fix issues, etc., but please keep us informed, as you are able, just so we don't waste time investigating something that might be traceable to a new version of Java, or something.) 

The current Windows machine is sort of dedicated, in that it's configured so that only one job can run at a time .... you may have to tell us if we end up hogging it too much, but our initial goal is to keep these performance tests short and sweet.
Comment 43 David Williams CLA 2014-08-21 12:30:06 EDT
For the record, I've changed the Hudson configuration of "the performance machine" -- I removed all the old versions of Ant, and defined Ant for 1.8.4 and 1.9.2 (1.9.2 being the default). 

I also removed, from the Hudson configuration, all the old versions of Java, and defined Java 7 latest (which is hardly the latest any longer, but is still at u51). 
I also defined Java 8 there, again as "latest" as defined on "shared". 

Which reminds me, it appears /shared/ is not defined as an alias to /opt/public on that machine. Would be better if it was, for consistency of my scripts. 

Can a webmaster do that, please? 
(Pretty sure it is not possible from Hudson, or with my privileges.)

Also noticed it is still at the Hudson 3.1.2 level, though a Hudson 3.2.0 is now available. I know of no reason to update, but will want to eventually, unless anyone knows of a reason not to.
Comment 44 David Williams CLA 2014-08-21 12:44:18 EDT
(In reply to David Williams from comment #43)

> Which reminds me, it appears /shared/ is not defined as an alias to
> /opt/public on that machine. Would be better if it was, for consistency of
> my scripts. 
> 
> Can a webmaster do that, please? 
> (pretty sure it is not possible from Hudson, or with my privileges.
> 

I see now, there is a /shared defined, but under that, only 

common is defined to point to /opt/public/common

It would suffice for my purposes if 

eclipse was included, to point to /opt/public/eclipse 

As far as I know, the security on 'eclipse' directories is correct and appropriate, but let me know if that's not the case. 

But, I won't be able to run my jobs without this symbolic link ... and I don't want to have two versions of them. Guess I could change them all to /opt/public ... but then that'd no longer match some of my test machines! 

Advice welcome, if you can't change it, safely.
Comment 45 David Williams CLA 2014-08-28 11:11:56 EDT
(In reply to David Williams from comment #41)
 
I've updated the objectives a little (below, inline), partially from discussions with Dani, partially from what I've learned in some early runs, and partially from giving it more thought ... no radical changes, just fine-tuning.  

> 
> Objectives/outline
> 
> Over-all: 
> 
> Have performance tests that can run "in a few hours" ... at least not longer
> than regular unit tests. 
>  
>   This is so results are available quickly, can be ran with "every build"
> ... at least on one machine. 

= = = = = = = 
The need to run with "every build" is debatable. My thinking was that "lots of short runs" would lead to better statistical "baselines" for trend analysis or multivariate analysis, but we are so far from being able to do "new" statistics that the priority is not as high as its being listed first in the objectives makes it appear. 
= = = = = = = 

>   If there is a need, we can eventually have longer running tests that run,
> say, in between I-builds, or milestones, but probably best to treat those as
> small, "separate jobs" that runs tests in pieces (just for the practical
> reason that if something goes wrong with one suite, the other suites still
> provide data). [And, these long-running tests is long term goal, after
> everything else on this list.]
>   
> Run performance tests on (at least) three platforms, Windows (dedicated),
> Linux, Mac OSX. [This is partially because sometimes it is interesting, and
> important, to see differences between platforms, but also to see if we
> really need dedicated machines, or if statistics can do what they are
> supposed to do with variance.]
> 
> Make it (relatively) easy to run the tests "on any machine", not just
> "eclipse.org" hardware. 
> 

= = = = = = = =
The focus will be on getting "the performance test system" working on one platform first -- emphasis on the "end-to-end" aspects, and then later running on other machines. This is mostly because "running the tests" is actually a pretty small part of having a useful "performance testing system" that can retain data, do statistics against baselines, produce meaningful summaries, etc. 

** The most substantial issue, in objectives, that might require some team discussion -- as I've seen some tests fail, I've begun to wonder how many of the original tests can literally run "as is" on both the 4.x stream and older ones, such as the 3.x streams. So one thing I'll ask the committers to do is: if they have -- or add -- a performance test that is good for spotting regressions on the 4.x stream as we move forward, but that is known not to run as-is on earlier streams, especially 3.x, then it should go into a separate "test suite" so it's known which tests to run on which streams and baselines. Put another way, this increases the priority of trying to run tests on multiple streams, just to make sure we know the "common base" of tests that run on both streams. I mean this in the sense of being sure we compare apples to apples in our test results and summaries. Fairly easy to do in week-to-week summaries, but pretty hard to do when comparing tests run on today's code versus tests run on code from years ago. 
= = = = = = = =

> Steps: 
> 
> 1: Just get the tests running, producing the "raw output" to console. 
>   Main purpose here is to see which tests still run, which don't. Will
> ruthlessly cut tests with any sign of trouble.  
>   Currently there are 41 test suites listed as having "performance target"
> (See list below). From early prototyping (a couple of years ago) about a
> third of them ran, a third ran but took more than 2 hours each, and a third
> did not seem to run any longer at all (hang or crash) -- never investigated
> reason why and I myself won't now (component/ committers will need to
> investigate, if they desire, and are able). 
>   See list below for current suites that are marked with a "performance
> target". If anyone knows apriori of any that should be removed as a
> "performance tests", please say. 
>   
> 2: (In parallel, ongoing) Provide ability to run (one) performance test
> suite easily "on committers machine". 
> 
>    This is partially so committers can debug the tests themselves,
> especially if not working, or if there appears to be a regression. 
>    May also prove handy to explore the effects of a proposed change, before
> it's committed, in a few, relatively rare cases. (i.e. could lead to a false
> sense of security, unless a test is specifically designed for the change
> being made).   
>   
> 3: (In parallel, ongoing) Fix up current documentation on running
> performance tests, so it's accurate, and can be useful to others. 
>   
> 4: Collect data to a database. 
> 
>   Initial plan is to set up "Derby" on build machine to collect results from
> production runs (from all machines running performance tests). 
>   Longer term may change to another, more "enterprise level" data base that
> is backed up, etc. by Eclipse Foundation.  
>   
> 5: Get "performance.ui" bundle building and working, to produce "finger
> print" graphs [and other statistical summaries?], from the database data.
> 
> 6: Understand the requirements and update the PHP code to get "results"
> displaying on build-download pages. 
>   
> 7. Set up tests so our week-to-week comparisons are against "most recent
> release" as reference (probably 4.4.1). [As I understand it, if set up
> correctly, "current build" and "reference build" are both ran automatically
> each time.]
> 
>    This is for sole purpose of spotting regressions introduced from one week
> to the next. 
>    If there are regressions, should be relatively easy to narrow down
> reasons (and fix), since not that much code changes, week to week. 
>    [FWIW, this week to week regression testing is the original, primary
> design goal of current performance tests and framework, as I understand it.]
>   
> 8. Once above is all working well, try some occasional tests (such with
> milestones only?) that compare current performance with long past releases,
> such as 3.8. 
> 
>   This will be interesting, but will have limited "debug" value since if
> performance regressions are seen, there have been so many code and
> architectural changes, it would be normal "performance debugging" that would
> be required to improve, at this point in time. 
>   But, such comparisons may provide some priorities, such as which areas to
> focus on -- for anyone who has ability and time to work on performance
> related bugs. 
>   
>   
>   = = = = = = = = = = = = = 
>   
>   Full list of test suites with performance target: 
>   [Please leave a note, if any component lead already knows that a test 
>    should be removed.] 
>   

= = = = = = = = = = = 
It turns out this list is about twice as long as it should be, since 
many of these test bundles had "empty" performance targets. I've asked 
committers to remove those empty targets (see bug 442455), and the current 
list of known performance bundles is in bug 442455 comment 4. 
= = = = = = = = = = =

> org.eclipse.ant.tests.core
> org.eclipse.ant.tests.ui
> org.eclipse.compare.tests
> org.eclipse.core.expressions.tests
> org.eclipse.core.filebuffers.tests
> org.eclipse.core.tests.net
> org.eclipse.core.tests.resources
> org.eclipse.core.tests.runtime
> org.eclipse.equinox.p2.tests.ui
> org.eclipse.jdt.apt.pluggable.tests
> org.eclipse.jdt.apt.tests
> org.eclipse.jdt.compiler.apt.tests
> org.eclipse.jdt.compiler.tool.tests
> org.eclipse.jdt.core.tests.builder
> org.eclipse.jdt.core.tests.compiler
> org.eclipse.jdt.core.tests.model
> org.eclipse.jdt.core.tests.performance
> org.eclipse.jdt.debug.tests
> org.eclipse.jdt.text.tests
> org.eclipse.jdt.ui.tests
> org.eclipse.jdt.ui.tests.refactoring
> org.eclipse.jface.text.tests
> org.eclipse.ltk.core.refactoring.tests
> org.eclipse.ltk.ui.refactoring.tests
> org.eclipse.osgi.tests
> org.eclipse.pde.api.tools.tests
> org.eclipse.pde.build.tests
> org.eclipse.pde.ds.tests
> org.eclipse.pde.ui.tests
> org.eclipse.search.tests
> org.eclipse.swt.tests
> org.eclipse.team.tests.core
> org.eclipse.team.tests.cvs.core
> org.eclipse.text.tests
> org.eclipse.ua.tests
> org.eclipse.ua.tests.doc
> org.eclipse.ui.editors.tests
> org.eclipse.ui.tests.forms
> org.eclipse.ui.tests.performance
> org.eclipse.ui.tests.rcp
> org.eclipse.ui.workbench.texteditor.tests
Comment 46 Matthias Mailänder CLA 2014-09-03 02:38:50 EDT
Sadly https://build.vogella.com/ci/ crashed and my login as well as my job configurations are gone. https://github.com/Mailaender/Eclipse-Performance-GSoC2014/issues/6#issuecomment-44574060 is my last backup of the quite hacky build scripts. I saw that https://hudson.eclipse.org/perftests/job/ep45I-perf-lin64/ is essentially running thanks to David Williams and probably doing things much more cleanly. I am withdrawing here. Sorry for not being much help.
Comment 47 David Williams CLA 2014-09-03 02:54:05 EDT
(In reply to Matthias Mailänder from comment #46)
> Sadly https://build.vogella.com/ci/ crashed and my login as well as my job
> configurations are gone.
> https://github.com/Mailaender/Eclipse-Performance-GSoC2014/issues/
> 6#issuecomment-44574060 is my last backup of the quite hacky build scripts.
> I saw that https://hudson.eclipse.org/perftests/job/ep45I-perf-lin64/ is
> essentially running thanks to David Williams and probably doing things much
> cleaner. I am withdrawing here. Sorry for being not much help.

Thanks for trying Matt. Just having you to work with was help by itself. 
Good luck in your new endeavors ... keep us posted of "use of Eclipse in Science" ... I found your blog/article very interesting.
Comment 48 David Williams CLA 2014-09-03 03:02:05 EDT
Caution: rough notes ahead that won't make sense in isolation: 

This is some "background and education" about our current status that I plan to discuss with committers in our status meeting; I thought I'd post it here to give a flavor of where we are at. 

[and, no one should get excited yet ... to say we have "performance tests running", in this sense, is only about 10%
of what needs to be done to have a "performance monitoring system" in place, one that is useful to committers and 
community. ]

= = = = = = = = = =  

This is link to our "dedicated performance machine": 
https://hudson.eclipse.org/perftests/view/Eclipse%20and%20Equinox/

Luckily, there are only 19 JUnit suites, not 41 (22 had empty targets).

To "see" anything, committers will, for now, need to poke around Hudson workspace and/or "build artifacts" ... that is, nothing is being summarized or "made pretty" at this point. 

= = = = = = = = job 15 = = = = =

This test took about 10 hours total -- may have included a couple of hangs or "2 hour timeouts"

https://hudson.eclipse.org/perftests/job/ep45I-perf-lin64/15/artifact/workarea/I20140826-0800/eclipse-testing/results/html/


org.eclipse.ant.tests.ui                7 tests. 3 "IllegalMonitor" errors. 10 minutes. 
                                        Can the failing tests be fixed in a way that is compatible 
                                        with 4.4 and 3.8? (If not, let's remove them? -- i.e. this is an "in general" question!)

org.eclipse.compare.tests               1 test, 1.5 seconds. Sound right? Worth it? Or remove it. [Throughout, when I say "remove", 
                                        I mean "remove for now" ... not necessarily for all time ... the goal being to achieve focus.]

org.eclipse.core.tests.resources        seems solid? 

org.eclipse.core.tests.runtime          seems solid? (a few errors)

org.eclipse.equinox.p2.tests.ui         "ran", but no results of any kind. Setup problem? -- will remove for now.

org.eclipse.jdt.core.tests.performance  66 tests, all ran. But took > 1 hour. Remove for now. 
                                        Is there a chance this "long test" can be broken up into, say, 15 minutes 
                                        of "the most important, tell-tale tests" ... and the others 
                                        put in a "long running bucket"? 

org.eclipse.jdt.debug.tests             18 tests. ~ 20 min.

org.eclipse.jdt.text.tests              118 tests > 1 hour several failures. Plan to remove, for now. 

org.eclipse.jdt.ui.tests                28 tests ~ 20 minutes

org.eclipse.jdt.ui.tests.refactoring    49 tests - 1 failure > 1 hour . Plan to remove, for now. 

org.eclipse.osgi.tests                  10 tests  1 minute? Can such short running tests be valid? Or should we remove for now? 

org.eclipse.pde.api.tools.tests         "ran" but no results of any kind. I assume setup problem? -- will remove for now. 

org.eclipse.pde.ui.tests                7 tests 5 minutes

org.eclipse.swt.tests                   7 tests ~ 10 minutes

org.eclipse.team.tests.cvs.core         2 tests  ~15 minutes --- ran ok, but, cvs? .... can we skip these for now? To improve focus? 

org.eclipse.ua.tests                    7 tests  ~ 2 minutes

org.eclipse.ui.tests.forms              1 test  30 seconds?  --- should we skip this one, at least for now, and focus elsewhere?

org.eclipse.ui.tests.performance       68 tests 2 errors --- ~ 30 minutes

org.eclipse.ui.tests.rcp                4 tests         ~ 2 minutes

= = = = = = = = = Job 17 = = = = 

Ran with "selectPerfomance tests" (leaves out the 6 or 8 problematic, or longest running ones)

https://hudson.eclipse.org/perftests/view/Eclipse%20and%20Equinox/job/ep45I-perf-lin64/17/

This job took about 2.5 hours. 

Besides the "HTML" results (as above), the "text results", where times, memory usage, etc. are dumped, are in the .txt files, in 
https://hudson.eclipse.org/perftests/view/Eclipse%20and%20Equinox/job/ep45I-perf-lin64/17/artifact/workarea/I20140826-0800/eclipse-testing/results/linux.gtk.x86_64_8.0/

= = = = = = = = = = = = = = = = = 

Fundamental question: 
- - - - - - -
Is next priority collecting data in database? 

Or, getting tests to run on multiple platforms? 

At first I thought the former, just to be sure the mechanics were in place, but after seeing some of the failures, I began to wonder if all tests are compatible with all our "target versions" of Eclipse? Some may have to be dropped? Or fundamentally changed? If true, it may help committers more to have the ability to run tests on multiple platforms, to confirm a fix works everywhere? 

But, I have started to "re-learn" databases. Looks easy to "get data in". Less easy to get the right data in, and to get the right data out. 

Another question will be important soon: 
- - - - - - - - - - - -
What memory settings to use? I saw several "out of memory" errors. 
So I "cranked it up" to Xms1G, Xmx1G, but then saw a warning from (the JDT tests?) that "results may be invalid because memory is set too high, should be Xms46, Xmx256" (which seems like it must be out of date? ... or, is it?)
Comment 49 Andrey Loskutov CLA 2014-09-03 03:15:12 EDT
> What memory settings to use? I saw several "out of memory" errors.
> So "cranked it up" to Xms1G, Xmx1G, but then saw warning from ?jdt tests?

Two things here:
* Should performance test results be "comparable" with the performance users observe in "default" configurations of Eclipse, or not? If they should be comparable, see the next point. If not, should we ask projects on the general mailing list about their memory expectations *for tests* (it seems JDT has special preferences here)?
* Should we generally increase the "max heap" limit?
Comment 50 David Williams CLA 2014-09-03 15:01:04 EDT
During the status meeting, it was stated that the "osgi.tests" are not needed, and not really valid any longer, since we use a different resolver than what it is testing. I opened bug 443233 to have that performance target removed from the test.xml. 

It was also asked, if we have a set of "short running" tests, whether we could still have the long-running tests, even if not done until much later (in theory, perhaps days later). And the answer was "of course", but initially, I'll focus on getting the clean "short running" tests working, end-to-end. 

No one had any immediate feedback about "how much memory" to specify (which wasn't expected ... something to think about), nor much opinion on the "next priority" (whether it should be on getting the database set up and running, or on making sure the tests "can run on all obvious targets", such as 4.4, 3.8) ... so, I'll focus on the latter. We've already seen one case that's known not to be valid on that wide range (the OSGi resolver tests) ... there might be others, once teams have a chance to look at and study the results. (I also picked up on a "feeling" ... as much as you can, over the phone ... that no one will look much until summaries and statistics are available ... I may try to provide "unit test tables" first, so teams can at least see the outright failures easily.)
Comment 51 Markus Keller CLA 2014-09-05 15:00:16 EDT
(In reply to David Williams from comment #48)
> org.eclipse.jdt.text.tests              118 tests > 1 hour several failures.
2 failures in WhitespaceCharacterPainterTest. Looks like we are just too slow and occasionally hit a timeout. Made this more resilient with http://git.eclipse.org/c/jdt/eclipse.jdt.ui.git/commit/?id=3858848db67c1b53f782484d1262a32d457dd762

> org.eclipse.jdt.ui.tests.refactoring    49 tests - 1 failure > 1 hour
Fixed with bug 443410.
Comment 52 David Williams CLA 2014-12-04 11:18:57 EST
The last time I gave an "executive summary" of where this work was, comment 48, I said we were only about 10% done. 

Well, I'm happy to report we are now about 90% done. Done with 'restoring' what was done previously, anyway ... "making improvements" is boundless! 

But, last night's N-build is the first with "finger print graphs" available. Such as, see the starting page at 

http://download.eclipse.org/eclipse/downloads/drops4/N20141203-2000/performance.php

Or, "drill down" into specific areas, such as 

http://download.eclipse.org/eclipse/downloads/drops4/N20141203-2000/performance/org.eclipse.core.php?fp_type=0

(There are no results for some areas, such as jdt.core, or jdt.text, because those are (very) long running tests we run only for I-builds, currently.) 

So of the 10% remaining, that's not really a percentage of time or effort needed, but more of "functionality needed" ... and, you know, the last 10% is the hardest. :) But, we are set up pretty well to claim victory by the end of the year. 

And that's victory at getting the tests running again, on one box, at Eclipse.org. Then we need someone to read through those thousands of numbers to make sense of it all :( ... as well as see what happens when we try it on Windows. 

Just wanted to share the good news.
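
For anyone wondering how a scenario ends up on those fingerprint graphs, a rough sketch of the tagging call in the performance framework (the scenario id and label are invented for illustration; the scenario is tagged before its measurements are committed):

import org.eclipse.test.performance.Dimension;
import org.eclipse.test.performance.Performance;
import org.eclipse.test.performance.PerformanceMeter;

public class FingerprintTagSketch {
	void measureOpenEditor() {
		Performance perf = Performance.getDefault();
		PerformanceMeter meter = perf.createPerformanceMeter("org.example.perf#openEditor");
		try {
			// Mark the scenario so it appears in the summary ("fingerprint") graph,
			// plotting elapsed process time.
			perf.tagAsSummary(meter, "Open editor", Dimension.ELAPSED_PROCESS);
			meter.start();
			// ... the operation being measured ...
			meter.stop();
			meter.commit();
			perf.assertPerformance(meter);
		} finally {
			meter.dispose();
		}
	}
}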
Comment 53 Alexander Kurtakov CLA 2014-12-04 11:21:53 EST
What do you think about running the tests on the CentOS machine Denis added recently? This would give us some more meaningful numbers in terms of the SWT tests.
Comment 54 David Williams CLA 2014-12-04 12:21:24 EST
(In reply to Alexander Kurtakov from comment #53)
> What do you think about running the tests on CentOS machine Dennis added
> recently? This would give us some more meaningful numbers in terms of swt
> tests.

I think it's reasonable to consider -- I hope we can eventually run on dozens of configurations! -- and I have opened bug 454159 to track it. We'll discuss specifics there.
Comment 55 Mike Milinkovich CLA 2014-12-04 12:39:21 EST
(In reply to David Williams from comment #52)
> The last time I gave an "executive summary" of where this work was, comment
> 48, I said we were only about 10% done. 
> 
> Well, I'm happy to report we are now about 90% done. 

David,

Thank you a ton for all of your hard work on this. It is really important, and really appreciated.
Comment 56 Jay Arthanareeswaran CLA 2014-12-11 21:46:13 EST
(In reply to David Williams from comment #52)
> (There are no results for some areas, such as jdt.core, or jdt.text, because
> those are (very) long running tests we run only for I-builds, currently.) 

David, it will be good to see the numbers for jdt.core, esp. for the M build. Would it be possible to enable them for the M builds?
Comment 57 David Williams CLA 2014-12-12 09:32:02 EST
(In reply to Jayaprakash Arthanareeswaran from comment #56)
> (In reply to David Williams from comment #52)
> > (There are no results for some areas, such as jdt.core, or jdt.text, because
> > those are (very) long running tests we run only for I-builds, currently.) 
> 
> David, it will be good to see the numbers for jdt.core, esp. for the M
> build. Would it be possible to enable them for the M builds?

Yes, that is the plan, and in fact they do appear to have "run" against M20141210-0900, but, I think, the "analysis program" is not working as I expected (caching too little or not enough data), so I am working through those issues in bug 455035.
Comment 58 David Williams CLA 2015-06-02 06:47:33 EDT
I am marking this bug fixed, since the "main thing" about it is fixed. 
That is, they do run on Eclipse hardware now. 

Many improvements are still needed, to be truly useful to committers and community, so I will use bug 454921 as the new "umbrella bug" to track on-going performance test work. 

Any unfixed "depends on" bug that was listed in this bug, I blindly copied over to the "depends on" list in bug 454921. 

I marked it as "fixed in M7" since no work has been done on it since then.