283778 – Diff Heap Dumps

Bug 283778 - Diff Heap Dumps

Summary: Diff Heap Dumps

Status:	CLOSED MOVED

Alias:	None

Product:	MAT
Classification:	Tools
Component:	Core (show other bugs)
Version:	0.8
Hardware:	PC Windows XP

Importance:	P3 enhancement (vote)
Target Milestone:	---
Assignee:	Krum Tsvetkov
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:	298078
Blocks:
	Show dependency tree

Reported:	2009-07-16 20:25 EDT by Nathan Reynolds
Modified:	2024-05-08 11:53 EDT (History)
CC List:	5 users (show)

See Also:	Gerrit Change Git Commit Gerrit Change Git Commit Gerrit Change Git Commit Gerrit Change Git Commit

Attachments
Prototype tree comparison (43.71 KB, image/png) 2020-03-02 16:17 EST, Andrew Johnson	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Nathan Reynolds

2009-07-16 20:25:33 EDT

Diff'ing heap dumps is an extremely valuable ability. However, not many know how to diff two heap dumps when there is no common way to link two different objects together. Here is an algorithm which works fairly well. Please implement it soon. This is a much needed feature.

1. In each heap dump, create a "Merge Shortest Paths to GC Roots" excluding weak/soft references for all objects.
2. For each root in heap dump 1, pair it with the matching root in heap dump 2.
3. For each pair of roots, pair the children using the parent's variable names.
4. For each pair of children, pair the grand-children using the children's variable names.
5. Repeat for each pair in the heap dumps.
6. If an object doesn't have a matching object in the other heap, pair it to nothing and don't process its children.
7. When finished pairing up objects, list all of the roots and add columns which show the difference in memory and object count between the two heaps. For those objects that are paired to nothing, use 0 for the memory usage of nothing.
8. The default should be to descend sort on the memory usage difference.

The user can then start expanding nodes into the tree to see where an object in one heap dump is holding onto more objects in another heap dump. This is where the leak is occurring. The user then has to figure out why the leaked objects are being left in their "parent" object... which is beyond the scope of a memory analyzer.

In the first phase, get the above working. In the second phase, we have to deal with collections. Collections are difficult because every child variable is the same.

When the algorithm hits a collection, it first has to use the type (class name) of the children objects for starters. For some uses of collections, this splits the children into several sets. For most uses of collections, all the children are of the exact same type so this doesn't help.

The next thing it should do depends upon the collection.

Lists - use the index of the element as a guide.
Maps - use the key to match like elements.
Sets - no ideas on this one. Good luck.

Some objects overload equals() and hashCode(). It would be helpful if the user could supply the jar file where the class is located. MAT could then load the class, create instances of the objects and force the member variables to be the values taken from the heap dump. The equals() and hashCode() operators will hopefully work and the objects can be paired.

A good heuristic would be to examine the member variables that are fundamental data types (e.g. char, boolean, byte, short, int, long) and Strings. The objects that are most similar should be paired.

In the GUI, allow the user to change the pairings as they see fit.

Comment 1 Nathan Reynolds

2009-07-17 14:05:05 EDT

For the algorithm, I believe these are the roots that should act as starting points as opposed to what is displayed in the GUI.

HPROF_GC_ROOT_THREAD_OBJ
HPROF_GC_ROOT_JNI_GLOBAL
HPROF_GC_ROOT_JNI_LOCAL
HPROF_GC_ROOT_JAVA_FRAME
HPROF_GC_ROOT_NATIVE_STACK
HPROF_GC_ROOT_STICKY_CLASS
HPROF_GC_ROOT_THREAD_BLOCK
HPROF_GC_ROOT_MONITOR_USED
HPROF_GC_ROOT_UNKNOWN

Comment 2 Andreas Buchen

2009-07-20 14:45:20 EDT

Hi Nathan,

thank you very much for your comments!


We have a very similar algorithm as a prototype running. The algorithm is comparing the length of the "shortest path to GC roots", the types (classes), the attribute names, the position in the array.

To differentiate the GC locals, we use
* the thread name
* the TID (thread ID)
* the class loader name (as extracted for some of the class loaders).

The results were sketchy (besides being slow) - about 1/5 of the objects could be found again.


Erwin is currently working on some pieces which allow to plug-in different compare functionalities to MAT. This could be one of them.


> Some objects overload equals() and hashCode().  It would be helpful if the user
> could supply the jar file where the class is located.  MAT could then load the
> class, create instances of the objects and force the member variables to be the
> values taken from the heap dump.  The equals() and hashCode() operators will
> hopefully work and the objects can be paired.

That is a great idea - which could possibly also be used for better #toString representation.

Comment 3 Krum Tsvetkov

2009-12-18 03:20:20 EST

I have created bug 298078: Comparison Features in MAT
https://bugs.eclipse.org/bugs/show_bug.cgi?id=298078
It describes what we intend to do in the area of comparison as next steps. The described improvements don't really implement the suggestions you made here.
But our current thinking is that they would be a good first step.
Once we have the infrastructure briefly described there, it would be possible to write different queries based on comparison, and try out new leak detecting techniques. At this later stage the object-matching proposals made here would be helpful I think.
Feel free to comment on the other bug also, this was just the initial proposal.

Comment 4 Andrew Johnson

2020-03-02 16:14:52 EST

I've just revisited this bug.
The current comparison query generates a table even if the inputs are trees.

I have a prototype which generates a comparison tree if one of the inputs is a tree. This wasn't as complicated as I thought - IResultTree and IResultTable are quite similar.

Matching of objects is more difficult - the current comparison query didn't work well when there were duplicate keys.
My current idea is to keep placeholders for the duplicate keys then later for each table reorder the rows with duplicate keys to try to match to preceding tables. The reordering gets more complicated when there are 3 or more trees or tables to match, and finding the best matching for a key with many duplicates could get expensive.

If the trees are from the same snapshot then matching by objectId is easy.
Otherwise it got harder, though I had some success by matching by class specific name (via name resolver).
I also needed a way of masking object addresses in the key field. A simple way is to have a RegEx pattern to remove 
from the key.
E.g. "\s@ 0x[0-9a-f]+".
class java.lang.System @ 0xffe33840
goes to
class java.lang.System

java.lang.Thread @ 0xe0c2ac98  main
goes to
java.lang.Thread  main

Is a simple RegEx to remove sufficient? I don't think having a replacement is important as the matching of the key
is then first done by what remains - a fixed replacement wouldn't add anything.

Comment 5 Andrew Johnson

2020-03-02 16:17:34 EST

Created attachment 281983 [details]
Prototype tree comparison

Comment 6 Eclipse Genie

2020-03-06 12:56:25 EST

New Gerrit change created: https://git.eclipse.org/r/158932

Comment 7 Eclipse Genie

2020-03-25 13:40:15 EDT

Gerrit change https://git.eclipse.org/r/158932 was merged to [master].
Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=5f7e344d50048335334c9b514d218c081e43915d

Comment 8 Eclipse Genie

2020-10-13 10:07:41 EDT

New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170710

Comment 9 Eclipse Genie

2020-10-13 10:08:53 EDT

Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170710 was merged to [master].
Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=6ed8a8697ecada630331a7b0b40a0ca2df6184ec

Comment 10 Eclipse Genie

2020-10-18 09:44:46 EDT

New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170909

Comment 11 Eclipse Genie

2020-10-18 09:45:58 EDT

Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170909 was merged to [master].
Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=362772399921f7987d44c52a2c8fa6a80ea229ec

Comment 12 Eclipse Genie

2020-10-22 02:32:01 EDT

New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/171099

Comment 13 Eclipse Genie

2020-10-22 02:34:12 EDT

Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/171099 was merged to [master].
Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=7fd03e482f81a6e7f4b582905a231e81d6f7ce57

Comment 14 Andrew Johnson

2023-10-01 13:02:22 EDT

https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/204704/1
https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=24d3c1578214b9f9c872f2fa65d783629d451e78

Snapshot leak suspect by comparison - have link for dominator tree diffs actually go to a dominator tree.

Comment 15 Eclipse Webmaster

2024-05-08 11:53:43 EDT

This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/4.