Community
Participate
Working Groups
Diff'ing heap dumps is an extremely valuable ability. However, not many know how to diff two heap dumps when there is no common way to link two different objects together. Here is an algorithm which works fairly well. Please implement it soon. This is a much needed feature. 1. In each heap dump, create a "Merge Shortest Paths to GC Roots" excluding weak/soft references for all objects. 2. For each root in heap dump 1, pair it with the matching root in heap dump 2. 3. For each pair of roots, pair the children using the parent's variable names. 4. For each pair of children, pair the grand-children using the children's variable names. 5. Repeat for each pair in the heap dumps. 6. If an object doesn't have a matching object in the other heap, pair it to nothing and don't process its children. 7. When finished pairing up objects, list all of the roots and add columns which show the difference in memory and object count between the two heaps. For those objects that are paired to nothing, use 0 for the memory usage of nothing. 8. The default should be to descend sort on the memory usage difference. The user can then start expanding nodes into the tree to see where an object in one heap dump is holding onto more objects in another heap dump. This is where the leak is occurring. The user then has to figure out why the leaked objects are being left in their "parent" object... which is beyond the scope of a memory analyzer. In the first phase, get the above working. In the second phase, we have to deal with collections. Collections are difficult because every child variable is the same. When the algorithm hits a collection, it first has to use the type (class name) of the children objects for starters. For some uses of collections, this splits the children into several sets. For most uses of collections, all the children are of the exact same type so this doesn't help. The next thing it should do depends upon the collection. Lists - use the index of the element as a guide. Maps - use the key to match like elements. Sets - no ideas on this one. Good luck. Some objects overload equals() and hashCode(). It would be helpful if the user could supply the jar file where the class is located. MAT could then load the class, create instances of the objects and force the member variables to be the values taken from the heap dump. The equals() and hashCode() operators will hopefully work and the objects can be paired. A good heuristic would be to examine the member variables that are fundamental data types (e.g. char, boolean, byte, short, int, long) and Strings. The objects that are most similar should be paired. In the GUI, allow the user to change the pairings as they see fit.
For the algorithm, I believe these are the roots that should act as starting points as opposed to what is displayed in the GUI. HPROF_GC_ROOT_THREAD_OBJ HPROF_GC_ROOT_JNI_GLOBAL HPROF_GC_ROOT_JNI_LOCAL HPROF_GC_ROOT_JAVA_FRAME HPROF_GC_ROOT_NATIVE_STACK HPROF_GC_ROOT_STICKY_CLASS HPROF_GC_ROOT_THREAD_BLOCK HPROF_GC_ROOT_MONITOR_USED HPROF_GC_ROOT_UNKNOWN
Hi Nathan, thank you very much for your comments! We have a very similar algorithm as a prototype running. The algorithm is comparing the length of the "shortest path to GC roots", the types (classes), the attribute names, the position in the array. To differentiate the GC locals, we use * the thread name * the TID (thread ID) * the class loader name (as extracted for some of the class loaders). The results were sketchy (besides being slow) - about 1/5 of the objects could be found again. Erwin is currently working on some pieces which allow to plug-in different compare functionalities to MAT. This could be one of them. > Some objects overload equals() and hashCode(). It would be helpful if the user > could supply the jar file where the class is located. MAT could then load the > class, create instances of the objects and force the member variables to be the > values taken from the heap dump. The equals() and hashCode() operators will > hopefully work and the objects can be paired. That is a great idea - which could possibly also be used for better #toString representation.
I have created bug 298078: Comparison Features in MAT https://bugs.eclipse.org/bugs/show_bug.cgi?id=298078 It describes what we intend to do in the area of comparison as next steps. The described improvements don't really implement the suggestions you made here. But our current thinking is that they would be a good first step. Once we have the infrastructure briefly described there, it would be possible to write different queries based on comparison, and try out new leak detecting techniques. At this later stage the object-matching proposals made here would be helpful I think. Feel free to comment on the other bug also, this was just the initial proposal.
I've just revisited this bug. The current comparison query generates a table even if the inputs are trees. I have a prototype which generates a comparison tree if one of the inputs is a tree. This wasn't as complicated as I thought - IResultTree and IResultTable are quite similar. Matching of objects is more difficult - the current comparison query didn't work well when there were duplicate keys. My current idea is to keep placeholders for the duplicate keys then later for each table reorder the rows with duplicate keys to try to match to preceding tables. The reordering gets more complicated when there are 3 or more trees or tables to match, and finding the best matching for a key with many duplicates could get expensive. If the trees are from the same snapshot then matching by objectId is easy. Otherwise it got harder, though I had some success by matching by class specific name (via name resolver). I also needed a way of masking object addresses in the key field. A simple way is to have a RegEx pattern to remove from the key. E.g. "\s@ 0x[0-9a-f]+". class java.lang.System @ 0xffe33840 goes to class java.lang.System java.lang.Thread @ 0xe0c2ac98 main goes to java.lang.Thread main Is a simple RegEx to remove sufficient? I don't think having a replacement is important as the matching of the key is then first done by what remains - a fixed replacement wouldn't add anything.
Created attachment 281983 [details] Prototype tree comparison
New Gerrit change created: https://git.eclipse.org/r/158932
Gerrit change https://git.eclipse.org/r/158932 was merged to [master]. Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=5f7e344d50048335334c9b514d218c081e43915d
New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170710
Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170710 was merged to [master]. Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=6ed8a8697ecada630331a7b0b40a0ca2df6184ec
New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170909
Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/170909 was merged to [master]. Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=362772399921f7987d44c52a2c8fa6a80ea229ec
New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/171099
Gerrit change https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/171099 was merged to [master]. Commit: http://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=7fd03e482f81a6e7f4b582905a231e81d6f7ce57
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/204704/1 https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=24d3c1578214b9f9c872f2fa65d783629d451e78 Snapshot leak suspect by comparison - have link for dominator tree diffs actually go to a dominator tree.
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/4.