Something for 3.4 (likely)

As an enhancement, it would be good to render the standard error in fingerprints. Currently, beyond a threshold, the graph bar is drawn in yellow, which means a bad regression may actually be drawn in yellow rather than in red, weakening the perception of the regression. The same holds for a big improvement.

I would propose that instead of rendering the whole bar in green/yellow/red, we simply render the bar in green/red based on the current result, and colour the last portion of the bar in yellow for the area affected by the standard error.

Let's take an example:

Result = +70% (+/- 10%)

If so, render:

[0]--------------------[60]--------[70]--------[80]
<-------60% green------>[10% yellow][^^][10% yellow]

i.e. the bar is mostly green, even though the standard error is quite high, since the gain is big enough.

Result = -10% (+/- 30%)

[-40]------[-10]----------[20]
[30% yellow][^^][30% yellow]

i.e. the bar is entirely yellow, since the error is too high to tell for sure whether there is a regression.
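The segment layout implied by this proposal can be sketched as a small helper (a minimal illustration only; the class and method names are made up and this is not the actual fingerprint-rendering code):

```java
// Hypothetical sketch of the proposed coloring: the bar is green (gain) or
// red (regression) based on the result's sign, and the portion covered by
// the standard error is yellow. When the error swallows the whole value,
// the bar is entirely yellow.
public class BarSketch {
    // result and error are percentages, e.g. result = 70, error = 10
    static String[] segments(double result, double error) {
        double abs = Math.abs(result);
        String base = result >= 0 ? "green" : "red";
        if (abs <= error) {
            // cannot tell the sign for sure: everything is uncertain
            return new String[] { "yellow: 0.." + (abs + error) };
        }
        // certain part first, then the uncertainty zone around the value
        return new String[] {
            base + ": 0.." + (abs - error),
            "yellow: " + (abs - error) + ".." + (abs + error)
        };
    }
    public static void main(String[] args) {
        // Result = +70% (+/- 10%): mostly green, yellow only from 60 to 80
        for (String s : segments(70, 10)) System.out.println(s);
        // Result = -10% (+/- 30%): entirely yellow
        for (String s : segments(-10, 30)) System.out.println(s);
    }
}
```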
To make the visual effect even better, transparency could probably be used. In the latter example, drawing semi-transparent yellow over the underlying red bar would be best... it is unclear whether this can be achieved.
Transparency seems to be reserved for icons, but I haven't worked on that for a long time. I will simply implement the comment 0 requirement as described for the former example. For the latter case, the bar will simply be entirely yellow (10% in the example). We'll see how to draw fingerprints in an even better way in another bug...
Created attachment 77257 [details]
org.eclipse.core results html page with new version

Here's an example of the new fingerprints look for build I20070821-0800 with the bug 210469 patch (it also includes the fix for this bug)... You can compare it with the current version: http://download.eclipse.org/eclipse/downloads/drops/I20070821-0800/performance/org.eclipse.core.php?

Full yellow bars are those where abs(value) < error. Otherwise, the uncertainty zone is shown in yellow around the value only when the error is over the 3% threshold. It is also not displayed when the value is over 100%.
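The display rules just described can be summarized as a small decision function (hypothetical names; this is only a sketch of the rules stated above, not the actual generator code):

```java
// Sketch of the rules from this comment: full yellow when the error swallows
// the value; uncertainty zone drawn only when the error exceeds the 3%
// threshold; zone hidden when the value is over 100%.
public class UncertaintyRule {
    static String style(double valuePercent, double errorPercent) {
        if (Math.abs(valuePercent) < errorPercent) return "full-yellow"; // sign uncertain
        if (Math.abs(valuePercent) > 100) return "no-zone";              // zone not displayed
        if (errorPercent > 3) return "yellow-zone";                      // zone around the value
        return "plain";                                                  // error below threshold
    }
}
```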
Created attachment 77486 [details]
Proposed patch

This patch can generate any of the GIF files in the previously attached zip file just by changing one option value in the code... Note that it will need to be updated when the patch for bug 201469 is released.
(In reply to comment #4)
> Created an attachment (id=77486) [details]
> Proposed patch
>
> This patch can generate any of the gif files of previous attached zip file just
> by changing one option value in the code... Note that it will need to be
> updated when patch for bug 201469 is released.
>
I was talking about the zip file attached to bug 201920, as this patch also fixes that bug...
Please adjust the target milestone, so it does not point at a closed milestone in the past.
I find the concept of recording standard deviation across different builds confusing. If a test has a bad regression and is then fixed, the test ends up with a very high standard deviation, even if it is very consistent across multiple runs on the same build. On the other hand, measuring standard deviation makes sense for the baselines, because the same build is tested multiple times, so a high deviation means the test itself is not consistent.
(In reply to comment #7)
> I find the concept of recording standard deviation across different builds is
> confusing. If a test has a bad regression, and then it is fixed, the test ends
> up with a very high standard deviation, even if the test is very consistent
> across multiple runs on the same build. On the other hand measuring standard
> deviation makes sense for the baselines because it is the same build being
> tested multiple times, so a high deviation means the test itself is not
> consistent.
>
The standard deviation shown in the status table (and used for the yellow in fingerprints) is computed over the repeated iterations of the test *during the same run*. It is not the standard deviation computed across all existing builds. Performance tests are usually written using the following template:

public void test() {
	// Warm-up: run the code under test once or several times to warm the JIT on it

	// Perform several iterations of the test to get the average
	for (int i = 0; i < MAX; i++) {
		// Start the measure
		startMeasuring();
		// Run the code once, or several times if a single run is too short
		// (typically less than 100ms)
		// Stop the measure
		stopMeasuring();
	}
	// Commit the measures, i.e. compute the average and the standard deviation
	// of the MAX measures and put the numbers in the database
	commitMeasurements();
	// Verify whether the test fails or not (i.e. had a time regression over 10%)
	assertPerformance();
}

So this standard deviation is meaningful for each test of the current build. It is also computed for the baseline, and meaningful as well, but not currently shown in the status table... The standard deviation you are thinking about is computed too, but shown only in each test's data page (e.g. for one of the JDT/Core search performance tests: http://ganymede-mirror2.eclipse.org/eclipse/downloads/drops/R-3.4-200806172000/performance/eclipseperflnx3/org.eclipse.jdt.core.tests.performance.FullSourceWorkspaceSearchTests.testSearchAllTypeNames()_raw.html). There you can see the standard deviation computed over all builds in the 'STD DEV' line at the top of the table (the 'COEF VAR' line below = 'STD DEV' / MEAN). HTH
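The per-run statistics described above (the average and standard deviation over the MAX measures committed in one run, plus the 'COEF VAR' = 'STD DEV' / MEAN shown on the raw data page) can be sketched as follows. This is illustrative only; the real computation lives in the Eclipse performance test framework:

```java
// Sketch: statistics over the MAX measured iterations of a single run,
// not across builds.
public class RunStats {
    static double mean(double[] times) {
        double sum = 0;
        for (double t : times) sum += t;
        return sum / times.length;
    }
    static double stdDev(double[] times) {
        double m = mean(times), sq = 0;
        for (double t : times) sq += (t - m) * (t - m);
        return Math.sqrt(sq / (times.length - 1)); // sample standard deviation
    }
    // 'COEF VAR' line in the raw data page = 'STD DEV' / MEAN
    static double coefVar(double[] times) {
        return stdDev(times) / mean(times);
    }
}
```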
Ok, thanks for the info. I didn't realize the standard deviation used to mark the tests yellow was different from the standard deviation in the raw data table. As these graphs get more complicated, it would help to have a legend at the bottom, or a "How do I read these graphs" hyperlink that would take you to a page describing in more detail how to interpret the graphs. I.e., what yellow means, what grey means, etc.
(In reply to comment #9) > Ok, thanks for the info. I didn't realize the standard deviation used to mark > the tests yellow was different from the standard deviation in the raw data > table. > > As these graphs get more complicated, it would help to have a legend at the > bottom, or a "How do I read these graphs" hyperlink that would take you to a > page describing in more detail how to interpret the graphs. I.e., what yellow > means, what grey means, etc. > Please comment bug 202084. I'll use it to improve documentation in generated pages.
Created attachment 114544 [details]
Early preview of new fingerprints design

Here's an early preview of what the new fingerprints could be. The key points of this new design are:

1) Use time instead of percentage for the results (percentages would still be displayed...). This will make it easier to gauge the importance of a regression or an improvement: we should obviously first look at a 5% regression of a 10s test before looking at a 30% regression of a 100ms test...

2) Show the reference time. In this preview, the 3.4.0 reference bar is in blue above the current build time. The final goal is to print all previous references since 3.0! This will also help us better judge the importance of a regression: typically, if a great improvement was made between 3.3.0 and 3.4.0, then a flat result in 3.5 would be acceptable...

Not yet in this preview:

3) Show the uncertainty zone. Instead of painting the entire bar in yellow when the standard error is over 3%, always paint the uncertainty zone at the end of the bar (as described in comment 0).

4) Display the variation against previous build(s). This would help us see how the performance of a test evolves during the release cycle. The comparison may be against the previous build and/or the previous milestone(s) (with previous results smoothed or not). Typically, a 20% improvement may be made between two milestones and a 10% regression may happen afterwards. Currently we are not able to detect this regression, as the test would still be 10% better than the reference...
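For point 1), switching the bar lengths from percentages to times could be sketched like this (all names are illustrative assumptions, not the patch's code; the logarithmic variant corresponds to the log display type):

```java
// Hypothetical mapping from a test's elapsed time to a bar width in pixels.
// A log scale keeps short tests (e.g. 100ms) visible next to long ones (e.g. 10s).
public class BarScale {
    static final int MAX_WIDTH = 300; // pixels reserved for the longest bar

    static int linearWidth(double timeMs, double maxTimeMs) {
        return (int) Math.round(MAX_WIDTH * timeMs / maxTimeMs);
    }
    static int logWidth(double timeMs, double maxTimeMs) {
        return (int) Math.round(MAX_WIDTH * Math.log1p(timeMs) / Math.log1p(maxTimeMs));
    }
}
```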
Looks nice but could be harder to read for color blind people.
Re: comment 11
> 4) Display variation regarding previous build(s)
(just a thought) Would it help, for the duration of a release cycle, to render bars for each milestone? Once the release is over, the intermediate milestones would be discarded.
(In reply to comment #12)
> Looks nice but could be harder to read for color blind people.
>
This was just a rough draft preview to give an idea of how the fingerprints look when using times instead of percentages and with several bars for different builds' values... Colors definition and usage will be done afterwards, but I agree that it should be done carefully...
(In reply to comment #13)
> Re: comment 11
> > 4) Display variation regarding previous build(s
> (just a thought)
> Would it help for the duration of a release cycle to render bars for each
> milestones ? Once the release is over, then the intermediate milestones would
> be discarded ?
>
I like this idea, but that would mean a really big height for each fingerprint at the end of the release, as we need a minimal height for each bar... Would it be acceptable for a fingerprint to take an entire page height (or more)?
Created attachment 115170 [details]
New proposed patch

Here is a new proposed patch which addresses points 1) and 3), point 2) partially (there's only one baseline), but not 4). Note that a preview of these new fingerprints should be available in the M20081001-0800 performance results soon...
(In reply to comment #16)
>
> Note that a preview of this new fingerprints should be available on
> M20081001-0800 performance results soon...
>
It's now available, thanks Kim :-)

Note that it's only a preview; all feedback is welcome to improve this new functionality...

Tips:
1) text in italics in a fingerprint means there's a warning you can read in the tooltip by hovering over it
2) hover over the bars to see the build result time
I like the new graphs, good work! It's very good that I can choose between different display types and, especially, that there's a logarithmic view. Some details:
- that a bar can be partially yellow should be part of the legend and explained
- the global summary should already get a legend
- the italics are not very visible, and confusing because the test name (on the right) can also be italic but with a different meaning
- the chosen display type should be preserved (currently I have to set it each time I switch pages)
- I get no (or little) feedback when I select another display type
I also like the graphs. And as Dani said, it would be good if the mode I selected (e.g. time (linear)) was remembered when I go to another component page.
Created attachment 117390 [details]
Final proposed patch

This patch addresses all points of Dani's feedback in comment 18. Note that there was nothing I could do for the last one, as it is the browser that should provide feedback while downloading pages (i.e. at least the status bar and a busy mouse pointer...). Note that I tested this patch on both Firefox and IE, but only on Windows boxes. I would appreciate feedback on whether it also works on a Mac...
Results generated using the last patch should be available soon at: http://download.eclipse.org/eclipse/downloads/drops/M20081105-1550/performance/performance.php
Patch released for 3.5M4 in HEAD stream.