Bug 298078 - Comparison Features in MAT
Summary: Comparison Features in MAT
Status: CLOSED MOVED
Alias: None
Product: MAT
Classification: Tools
Component: Core
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Krum Tsvetkov CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on: 305150 305152 305154 347648 394222 541539 561460
Blocks: 271908 283778
 
Reported: 2009-12-17 09:30 EST by Krum Tsvetkov CLA
Modified: 2024-05-08 12:50 EDT

See Also:


Attachments
Comparison features proposal as PDF (19.22 KB, application/octet-stream)
2009-12-17 09:35 EST, Krum Tsvetkov CLA
Examples how to use the Compare Basket functionality (55.70 KB, application/octet-stream)
2010-02-25 10:59 EST, Krum Tsvetkov CLA

Description Krum Tsvetkov CLA 2009-12-17 09:30:39 EST
The existing comparison possibilities inside MAT are very limited. We would like to provide much more flexibility (and hopefully value) to the users when it comes to comparing different tables / results inside MAT. After some discussions we have some ideas what to provide. These ideas I have tried to summarize below.

These give just a high level overview of what we think would be useful. More detailed technical discussion how to implement them should follow. I will also attach a pdf version of the text below for easier reading.

-------------------------------------------------------------------------------------------------------------------
Comparison Features in MAT

1. Comparison of two or more arbitrary tables

1.1 Requirement

It should be possible to compare in MAT two or more arbitrary result tables which have similar columns, for example two histograms, two retained sets, two results from "group by value", etc. These result tables can be from one and the same heap dump, or from two or more different heap dumps.
Having this functionality one can do things like:
* Compare the retained heap of the session for userA and userB (both within the same heap dump) and see why one is bigger than the other
* Analyze how the retained set of userA's session is changing over time (comparing heap dumps from different tests)
* Compare several group_by_value results and find which Strings appear in one set and are missing in another one, how the number of occurrences changes, etc.
It should be possible to (re)order the results in an arbitrary way.

1.2 Comparing

1.2.1 Selecting the results to be compared

The first important question that has to be solved is how to select the tables which should be compared. 
Our current proposal is to use the "Navigation History" view which contains all currently open results + all results which could be recreated.

The selected results should go into something like a "compare container" where one can reorder them if needed.
We additionally thought about drag & drop into such a container, but as a beginning the navigation history seemed to be the better approach.

1.2.2 Finding the common "key"

The next thing to decide is which column should be used as a key for the comparison. We didn't come up with a good use case where the first column is NOT suitable. Therefore, for simplicity, we would assume that the first column is always the key (usually a class name, the string from "group_by_value", etc.).
The first column of the result should be the union of all keys. It should be possible to distinguish (based on the other columns) if some keys were not present in some of the compared tables.

1.2.3 Comparing the columns

Columns with the same name should be compared. If one of the compared tables has a column which is not present in the rest of the tables, then it could be either ignored, or we could make it visible that such a column didn't exist.
For each common column there should be N columns in the result, presenting the absolute values (or deltas) from each of the N compared tables, e.g.:

Class Name | Shallow #1 | Shallow #2 | Shallow #3 | Retained #1 | Retained #2 | Retained #3
ClassA     |            |            |            |             |             |
ClassB     |            |            |            |             |             |
...
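
The join described in 1.2.2 and 1.2.3 can be sketched roughly as follows. This is only an illustration, not MAT code: the class KeyUnionJoin and its join method are hypothetical names, and each table is reduced to a map from the key (first column) to the value of one common column. The result holds the union of all keys, and a null cell marks a key that was not present in that table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not MAT API: joins N tables on their first-column key.
final class KeyUnionJoin {
    // Each input map is one table: key (e.g. class name) -> value of ONE
    // common column (e.g. shallow heap). Keys are collected in the iteration
    // order of the input maps.
    static Map<String, List<Long>> join(List<Map<String, Long>> tables) {
        Map<String, List<Long>> result = new LinkedHashMap<>();
        // Pass 1: the first column of the result is the union of all keys.
        for (Map<String, Long> table : tables) {
            for (String key : table.keySet()) {
                result.computeIfAbsent(key, k -> new ArrayList<>());
            }
        }
        // Pass 2: one cell per compared table; null means "key not in this table".
        for (Map.Entry<String, List<Long>> row : result.entrySet()) {
            for (Map<String, Long> table : tables) {
                row.getValue().add(table.get(row.getKey()));
            }
        }
        return result;
    }
}
```

Keeping the missing cells as null (rather than 0) is a design choice of this sketch; it is what lets the display distinguish "not present" from "present with value 0".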

1.3 Displaying the result

1.3.1 What to display - delta or absolute numbers

The first displayed value for every compared column should be the absolute value from the first compared table, e.g. the absolute "shallow size #1". For the rest ("shallow size #2" and "shallow size #3") the user should be able to select whether to see the delta to the previous table, the delta to the first table, or the absolute value. A button in the toolbar should be fine for this.
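
As a rough illustration of the three display options, here is a small sketch. The names DisplayModes, Mode and display are made up for this example and are not part of MAT; the input is the value of one column in each of the compared tables, in comparison order.

```java
// Hypothetical sketch, not MAT API: the three display options for one compared column.
final class DisplayModes {
    enum Mode { ABSOLUTE, DELTA_TO_FIRST, DELTA_TO_PREVIOUS }

    // values[i] is the value of the column (e.g. shallow size) in table i.
    // The first cell always stays absolute; the remaining cells follow the mode.
    static long[] display(long[] values, Mode mode) {
        long[] out = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            if (i == 0 || mode == Mode.ABSOLUTE) {
                out[i] = values[i];
            } else if (mode == Mode.DELTA_TO_FIRST) {
                out[i] = values[i] - values[0];
            } else {
                out[i] = values[i] - values[i - 1];
            }
        }
        return out;
    }
}
```

For the values 100, 150, 120 this gives 100, +50, +20 as delta to the first table and 100, +50, -30 as delta to the previous table.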

1.3.2 Selecting columns to be compared / displayed (e.g. Shallow Heap, Retained Heap, etc.)

Another important question to decide is which columns (besides the key) should be compared. One option is to open a wizard / dialog before the comparison is executed. This however adds always one step, even if all columns are desired. 
Therefore our current suggestion is to compare and display all columns initially, and make it very easy for the user to show / hide them. Having one button per column in the toolbar could be one solution. The user will just have to press / release the corresponding button if the information in the compare result is too much.

1.4 Continue from the result (context menus)

Once the comparison result is displayed, the user should be able to interact with it, i.e. execute further queries on certain rows / cells of the result. Therefore the corresponding context menus have to be provided.

1.4.1 Execute queries with the objects behind a concrete table cell

It should be possible to open a context menu and say "show the retained set of the objects of classA, from the first table that was compared".

1.4.2 Execute queries on a row - next compare step

It should be possible to open a context menu on a certain row R of the comparison result and say "execute query X on row R of each of the underlying tables, compare the results, and show me a table with the compared results". For example if we have compared 3 histograms, each of them containing a row with String, I should be able to right click on String in the comparison result and execute "group by value" on all of them. Then the tool should run behind the scenes group by value on each of the 3 rows and show me at the end a comparison for the 3 group by value results.

2. Reports / queries over several heap dumps

Once the functionality to compare arbitrary results is present, we should: 
* Provide the possibility to execute an existing query on more than one heap dump and compare the results, e.g. replace "open the histogram in three dumps, select the results for compare and compare them" with something like "run histogram on ..." and select the dumps.
* Provide the mechanism to have queries which take several snapshots as a parameter, e.g. @Argument ISnapshot[]. The UIs for selecting and ordering the heap dumps should be outside the queries.
* Provide the possibility to execute a report over several heap dumps via the scripts, without UI.

Having these we should be able to try out and implement some leak searching algorithms based on comparison of multiple heap dumps.
-------------------------------- end of proposal -----------------------------------------------------------------

Feedback is appreciated!
Comment 1 Krum Tsvetkov CLA 2009-12-17 09:35:32 EST
Created attachment 154661
Comparison features proposal as PDF
Comment 2 Andrew Johnson CLA 2009-12-23 10:15:05 EST
I would like to see some differencing or intersection capabilities.

Given table A and table B generate table C where C contains all members of A which are matched in B.

Given table A and table B generate table C where C contains all members of A which are not matched in B.

The matching could be on object id or value, but I'm not sure how this works in the general case of any tables.

This definition can then work between different snapshots.

The union of two tables could be useful but this only works when they are from the same snapshot as MAT is not designed to have objects from different snapshots in the same table.
Comment 3 Andrew Johnson CLA 2010-01-14 03:40:21 EST
Perhaps the object view should have the value of the object as a separate column from the class name and address. This might be better as a comparison item between two different dumps.
Comment 4 Krum Tsvetkov CLA 2010-02-25 10:42:48 EST
Today I committed some changes which:
- introduce a "Compare Basket View" - one can add there table results, (re)order them, and say "Compare" to trigger the comparison of the added tables
- extend the "Navigation History View" so that one can select certain results from it and add them to the "Compare Basket"
- added a (probably temporary) query which does the comparison of the added tables and displays the results

I will attach a short document with a few screenshots showing how to use these and to compare several tables.

These changes enable us to compare not only two full histograms, but an arbitrary number of arbitrary tables. The comparison is done as described in point 1.2 in the PDF attached some time ago. 

The feature is far away from being ready, but I hoped to get some early feedback.

Some words about the displayed results:
Currently all columns are displayed (one column per compared property for each table). This information is often too much, and as described in the document we will offer some possibilities to hide/show certain columns.

Once the user hits compare, the query will open more than one tab:
- one with the absolute values displayed
- one with the absolute values of the first table, and the differences against the first table
- one with the absolute values of the first table and the differences against the previous table. This tab appears only if one compares more than two tables.
This behavior also should change. The user should be given the option to select which of the alternatives should be displayed. For the moment I added all, so that we can experiment a little bit and see which of them is good for which purposes.

If you are interested in trying this out, either sync the sources or use our nightly build (not there yet, but with the next nightly build it should be there). The link to our hudson is:
https://build.eclipse.org/hudson/job/cbi-mat-nightly/lastSuccessfulBuild/artifact/
Once there, take the zipped update site under /build/N2010xxxxxx/MAT-Update-N2010xxxxxx.zip
Comment 5 Krum Tsvetkov CLA 2010-02-25 10:59:36 EST
Created attachment 160198
Examples how to use the Compare Basket functionality
Comment 6 Andrew Johnson CLA 2010-02-28 15:45:59 EST
The Compare Basket works quite well and is similar to, but more flexible than, the previous histogram comparison.

- it would be nice to delete a selection from the compare basket
- should 'compare' only compare the selected items (if more than one selected)?
- should the menu items be left justified?
- the results items should also indicate the dump they are from
- there isn't yet a context menu

Does it make sense to add a histogram to the basket multiple times? Perhaps it does and is more consistent.

How do we make this programmable for use in complex queries or batch mode?
The CompareTablesQuery might be able to do this, though then it should be in the mat.api bundle. The ArgumentsWizard doesn't yet handle IResultTable[] or IResultTable.


How do we get a query that retains the actual objects?

E.g. with set A and set B of objects in the same dump, we may want:
1. A-B difference - like A&~B  
2. B-A difference - like B&~A
3. symmetric difference of A and B - like A^B
4. union of A and B - like A|B
5. intersection of A and B - like A&B
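
The five operations can be sketched with java.util.BitSet over object IDs. This is an illustration only: ObjectSetOps and its methods are hypothetical names, not MAT API, and the sketch assumes both sets hold non-negative int object IDs from the same dump.

```java
import java.util.BitSet;

// Hypothetical helper, not MAT API: set algebra over object IDs of ONE dump.
final class ObjectSetOps {
    // Builds a set from object IDs (assumed to be non-negative ints).
    static BitSet of(int... objectIds) {
        BitSet set = new BitSet();
        for (int id : objectIds) {
            set.set(id);
        }
        return set;
    }

    // A-B, i.e. A&~B. Call with swapped arguments for B-A.
    static BitSet difference(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.andNot(b);
        return r;
    }

    // A^B
    static BitSet symmetricDifference(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.xor(b);
        return r;
    }

    // A|B
    static BitSet union(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.or(b);
        return r;
    }

    // A&B
    static BitSet intersection(BitSet a, BitSet b) {
        BitSet r = (BitSet) a.clone();
        r.and(b);
        return r;
    }
}
```

Each method works on a copy, so the input sets stay unchanged and can be reused for the next operation.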

It is hard to have two independent selections of objects, though the panes do retain selections.
The query should be usable from other queries (fully programmable), from the tool bar, or from the context menu.

Perhaps it would be as useful to have one selection of objects via the context menu, and select the other set of objects from another query result pane too. This needs to be done for a standard query - should it be via IHeapObjectArgument parameters, or a separate Histogram one? If it was via IHeapObjectArgument then there would need to be an extra advanced option with a pull-down choice box for the histogram or pane. The other idea would be to use a histogram or IResultSet for the argument type. Using only the selected items might be confusing as the pane will be hidden. It may be safer if the user generates a new histogram from a selection.

@Argument
IHeapObjectArgument setA;
@Argument
IHeapObjectArgument setB;

or
@Argument
IHeapObjectArgument setA;
@Argument
Histogram histB;

or
@Argument
IHeapObjectArgument setA;
@Argument
IResultTable resB;
Comment 7 Krum Tsvetkov CLA 2010-03-09 10:36:31 EST
Thanks for the comments! Now I have created three other sub-tasks for improving the different areas involved in comparison:

- Bug 305150 for improvements to the Compare Basket View
- Bug 305152 for improvements to the way results are displayed
- Bug 305154 for providing the programming model for queries doing comparison
Comment 8 Andrew Johnson CLA 2011-01-16 14:32:35 EST
I have added context menus for the comparison tables as in section 1.4.1 of the description.
There is now a ContextProvider from getResultMetaData for each table.
The tables are named "Table 1", "Table 2" etc.

I don't know how to get a better description.

The results in the comparison basket could do with the snapshot name being given too, then perhaps we could use this.
Comment 9 Randall Theobald CLA 2011-02-25 16:26:13 EST
Like I mentioned in bug 271908, one of the most useful features would be to enable not only raw numbers and deltas, but also ratios. For example, let's say I have dump A and dump B. Dump B shows a big MB delta in char[]'s as compared to dump A, but finding where those extra char[]'s have been inserted can be difficult, since the raw numbers or deltas of the objects responsible perhaps only increased from 3 to 15 or something (whereas char[]'s increased by thousands). However, if you can enable ratios and then sort on the ratio column, the responsible objects tend to go straight to the top. Anyway, bad example, but I hope to see this in MAT someday.
Comment 10 Andrew Johnson CLA 2011-03-01 16:00:39 EST
As requested, I've added a percentage comparison too.
Comment 11 Krum Tsvetkov CLA 2011-03-07 09:55:34 EST
I played a bit with this. It is definitely useful to have the difference in %. However, what I was wondering after some time using it was the following:
Right now we offer the user to choose between 1) absolute values, 2) difference (absolute), and 3) difference (in %).
I got the feeling it would be nice to have the possibility to sort/see the difference in % AND at the same time see the absolute numbers, i.e. see at the same time 1) and 3)
This could have the somewhat negative effect of having too many columns, especially if one compares more than 2 tables and they have more than 2 columns. But maybe we could show the absolute values by default, and have the possibility to show/hide the different forms of comparison.
Does this sound reasonable also to the rest, or is it just me thinking this way? I'll be happy to get some comments.
Comment 12 Randall Theobald CLA 2011-03-07 10:20:34 EST
Yes. I hadn't gotten the chance to play with this yet, but the % is not nearly as useful without the raw numbers as well. I guess I didn't state that explicitly, but I didn't mean to only show the %. I think the different columns should ALL be selectable if the user desires. I don't think we should force the user to only see one at a time.
Comment 13 Andrew Johnson CLA 2011-04-06 11:43:38 EDT
The current code displays the percentages as shown by this table:
        Table 2       
        (blank)      0       1       5      10
Table 1
(blank) (blank) (blank) (blank) (blank) (blank)
     0  (blank)    NaN    +Inf    +Inf    +Inf
     1  (blank)  -100%     +0%   +400%   +900%
     5  (blank)  -100%    -80%     +0%   +100%
    10  (blank)  -100%    -90%    -50%     +0%

This can cause some problems as NaN sorts bigger than +Infinity and blank sorts less than anything, so clicking on a percent header can sometimes just show NaN or blank in the top-most rows.

We can filter out the blanks and NaN by typing '>=-100%' into the filter.

Should we convert the NaN to 0%? I.e. if there are 0 instances of a class in both tables is showing the change as +0% sensible, if not mathematically correct?
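
A small sketch that reproduces the non-blank cells of the table above with plain floating-point division, and shows why NaN ends up beyond +Inf when sorting. PercentChange and its methods are hypothetical names, not the MAT code; blank cells (key missing from a table) are not modelled, and changeOrZero only illustrates the option asked about, it is not a decision.

```java
import java.util.Arrays;

// Hypothetical helper, not MAT code: relative change between two table cells.
final class PercentChange {
    // Fraction of change from base to current (1.0 means +100%).
    // Floating-point division gives NaN for 0 -> 0 and +Infinity for 0 -> n (n > 0).
    static double change(long base, long current) {
        return (current - base) / (double) base;
    }

    // One possible answer to the question above (an assumption, not a decision):
    // report 0 -> 0 as +0% instead of NaN.
    static double changeOrZero(long base, long current) {
        return (base == 0 && current == 0) ? 0.0 : change(base, current);
    }

    // Arrays.sort orders doubles like Double.compareTo, so NaN is placed
    // after +Infinity in ascending order.
    static double[] sortedAscending(double... values) {
        double[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }
}
```

With a descending sort on such a column, the NaN rows (0 instances in both tables) therefore come first, ahead of the +Inf rows that are usually more interesting.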
Comment 14 Randall Theobald CLA 2011-04-06 11:51:09 EDT
I would argue to just divide by 1 if the denominator is ever 0 (even if the numerator is not 0). Definitely not mathematically correct, but so much more useful. This is what I ended up doing in a custom tool I used for the same thing years back because it was so much more useful for quick glances. Show the raw values too, though.
Comment 15 Andrew Johnson CLA 2011-04-07 12:17:22 EDT
I don't want to convert 0->5 to 400%. That's too strange for me. If you are interested in items going up from 0 then filter the base using '<= 0' then sort by number in the new table.

I can add an extra column for percentage, and it is easiest to keep the difference column too, rather than make the difference column optional when choosing the percentage.

I can also change the label on the columns to show whether it is 'Difference from base table' or 'Difference from preceding table'.

Class Name                        | Objects #0 | Objects #1-#0 | Objects #1-#0 % | Shallow Heap #0 | Shallow Heap #1-#0 | Shallow Heap #1-#0 %
-----------------------------------------------------------------------------------------------------------------------------------------------
java.lang.ref.Reference[]         |          0 |            +4 |             +∞% |               0 |             +2,112 |                  +∞%
long[][]                          |          0 |            +3 |             +∞% |               0 |                +72 |                  +∞%
java.util.Collection[]            |          0 |            +1 |             +∞% |               0 |                +64 |                  +∞%
java.lang.ApplicationShutdownHooks|          0 |            +1 |             +∞% |               0 |                +16 |                  +∞%
java.io.ExpiringCache$Entry       |          6 |           +45 |           +750% |             144 |             +1,080 |                +750%


Class Name                | Objects #0 | Objects #1-#0 | Objects #1-#0 % | Shallow Heap #0 | Shallow Heap #1-#0 | Shallow Heap #1-#0 %
---------------------------------------------------------------------------------------------------------------------------------------
java.lang.reflect.Modifier|          0 |            +0 |             +0% |               0 |                 +0 |                  +0%
---------------------------------------------------------------------------------------------------------------------------------------
Comment 16 Andrew Johnson CLA 2020-10-13 13:01:04 EDT
I think we have done everything except 1.4.2

1.1 everything except perhaps reordering the results in an arbitrary way
1.2.1 selecting done via compare basket
1.2.2 common key now found, there is an option to change this on the compare tables query
1.2.3 columns compared
1.3.1 delta or absolute available
1.3.2 choice of columns available
1.4.1 context menu available - but only from tables from the current snapshot
1.4.2 not done - no way to run a query on a selection on each of the underlying tables / snapshots
2. "execute an existing query on more than one heap dump" - done via the Java Basics > Simple Comparison query
argument mechanism done
report over several heapdumps done - see org.eclipse.mat.api:suspect2

It's hard to see how to run queries on other snapshots from the
context menu. IContextObject and IContextObjectSet return object IDs which
are implicitly connected to the current snapshot.

One way would be to have an extension / new interface for the returned IContextObject
which provides the snapshot / snapshot context.

Perhaps it would be better if that information were attached to the ContextProvider so the ContextProvider could
also give an IQueryContext - or was also an IQueryContext. We would then have to check all uses of a ContextProvider
to ensure that the right IQueryContext was used.
Comment 17 Eclipse Webmaster CLA 2024-05-08 12:50:59 EDT
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/7.