Re: [mat-dev] Request for comments: MAT next steps
Hi to all MAT-interested users/developers,
At this year's JavaOne we had the chance to sit together with Andrew and discuss possible next steps for the Memory Analyzer. The discussion continued remotely with other people from the team. We put down a list of the things that came to mind and wanted to post it (which is what I am doing right now) on the dev mailing list to get some feedback.
The list contains very different things - concrete bugs, ideas for new features, open questions, some nice-to-have items, and some things we need to do. The list is not prioritized; the items are simply in the order they came up.
We would like to ask interested people on the mailing list to have a look at it and give us feedback, for example:
- which items in the list you like and think are important to implement
- which items you find confusing or unnecessary
- what other ideas you have that are missing from the list
At the end we should be able to identify the higher-priority items and create Bugzilla entries for some of the topics, but we thought the dev list is a better forum to start the discussion.
We are looking forward to your valuable feedback!
And here is the list:
- Performance - measure and improve
- double[Integer.MAX_VALUE] - could MAT cope with objects of this size? The problem is that the array-to-size mapping stores the size as an int (IntIndexCollectorUncompressed). This array would be approximately 0x18 + 0x8 * 0x7fffffff = 0x400000010 bytes, too big for an int. Expanding the size index to longs could be overkill. We could do some simple compression: values 0 - 0x7fffffff are stored as now, and stored int values 0x80000000 to 0xffffffff expand to (n & 0x7fffffffL)*8 + 0x80000000L.
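A minimal sketch of that compression idea (the class and method names are mine, not MAT's): sizes up to Integer.MAX_VALUE are stored directly, while larger sizes - which are multiples of 8 on typical VMs - are stored with the sign bit set as an offset in 8-byte units. This covers the double[Integer.MAX_VALUE] case while keeping the index at one int per object.

```java
public class SizeCompression {
    // Sizes 0..0x7fffffff are stored as-is; larger 8-byte-aligned sizes
    // are stored as 0x80000000 | ((size - 0x80000000) / 8), reusing the
    // sign bit of the int as a flag.
    static int compress(long size) {
        if (size <= Integer.MAX_VALUE)
            return (int) size;
        // assumes size is a multiple of 8, as large heap objects are
        return (int) (0x80000000L | ((size - 0x80000000L) >>> 3));
    }

    static long expand(int stored) {
        if (stored >= 0)
            return stored;
        return (stored & 0x7fffffffL) * 8 + 0x80000000L;
    }

    public static void main(String[] args) {
        // the double[Integer.MAX_VALUE] example from the list:
        long size = 0x18L + 0x8L * 0x7fffffffL; // = 0x400000010
        System.out.println(expand(compress(size)) == size);     // true
        System.out.println(expand(compress(12345L)) == 12345L); // true
    }
}
```

The largest representable size is then (0x7fffffffL * 8) + 0x80000000L, roughly 18 GB, which should be ample headroom for single objects.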
- How should MAT cope with simple (non-array) objects which vary in size from instance to instance? E.g. if an object is "hashed and moved" there may be an extra slot for the hash code. See http://www.ibm.com/developerworks/ibm/library/i-garbage1/ Ignoring the problem is probably fine for the moment - otherwise we would need a separate index for odd sized objects, or e.g. a byte array indexed by objectID for an object delta size.
- Reduce memory consumption of MAT - e.g. when building indexes pack the idToAddress array in ints, not longs if possible (and expand if not possible). Andrew has some code for this. This index is already compressed when on disk.
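Without having seen Andrew's code, one way this packing could work is to store each address as a 32-bit offset from a base address while everything fits, and promote to a long[] only when an out-of-range address appears. The following is a hypothetical sketch, not MAT's implementation:

```java
// Sketch: store 64-bit addresses as 32-bit offsets from a base while
// possible, expanding to a long[] the first time an address is out of range.
public class PackedAddressIndex {
    private final long base;
    private int[] packed;    // used while every offset fits in 32 bits
    private long[] expanded; // fallback after promotion

    public PackedAddressIndex(long base, int capacity) {
        this.base = base;
        this.packed = new int[capacity];
    }

    public void set(int id, long address) {
        long offset = address - base;
        if (expanded == null && offset >= 0 && offset <= 0xffffffffL) {
            packed[id] = (int) offset; // fits: keep the compact form
        } else {
            if (expanded == null) {    // promote int[] to long[] once
                expanded = new long[packed.length];
                for (int i = 0; i < packed.length; i++)
                    expanded[i] = base + (packed[i] & 0xffffffffL);
                packed = null;
            }
            expanded[id] = address;
        }
    }

    public long get(int id) {
        return expanded != null ? expanded[id] : base + (packed[id] & 0xffffffffL);
    }
}
```

For a typical heap, where all addresses fall within a 4 GB range of the lowest address, this halves the in-memory cost of the index during parsing.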
- 64-bit JVM with compressed pointers - are the sizes correct in HPROF, DTFJ, PHD? Simple object sizes are calculated/retrieved by the dump index builder; they should be correct for DTFJ. HPROF may be wrong, as it calculates instance sizes from the fields and so needs to know about padding and the actual size of object references. DTFJ array sizes are correct in the array-ID-to-size index; HPROF sizes may need to allow for object arrays with compressed pointers. The ObjectArrayImpl and PrimitiveArrayImpl objects attempt to calculate the shallow heap size of an array from the pointer size and the array length. This calculation is VM-specific, so it shouldn't really be done in MAT; as an example, the shallow heap size calculated by getUsedHeapSize() can differ from the retained heap size for a primitive array or an uninitialized object array. Would getSnapshot().getHeapSize(getObjectId()), which retrieves the array size from the array-ID-to-size index, be fast enough as an alternative?
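To illustrate why this calculation is VM-specific, here is a hedged sketch (the header size, reference size, and alignment values below are hypothetical layout parameters, not values MAT defines): the same object array occupies a different number of bytes depending on whether the VM uses compressed or full-width references.

```java
public class ArraySize {
    // Shallow size of an object array for a given layout: header plus
    // one reference slot per element, rounded up to the VM's alignment.
    static long objectArraySize(int length, int headerSize, int refSize, int alignment) {
        long raw = headerSize + (long) refSize * length;
        long rem = raw % alignment;
        return rem == 0 ? raw : raw + (alignment - rem); // align up
    }

    public static void main(String[] args) {
        // hypothetical 64-bit layouts: 16-byte array header, 8-byte alignment
        System.out.println(objectArraySize(1000, 16, 4, 8)); // compressed refs: 4016
        System.out.println(objectArraySize(1000, 16, 8, 8)); // full-width refs: 8016
    }
}
```

A calculation like this baked into MAT has to guess all three layout parameters, which is exactly why reading the size recorded by the dump provider is the safer option.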
- Continuous integration - build reports to a mailing list or to Eclipse.org
- Histogram compare heap view - how do we show that one dump declares a class that does not appear in the other dump at all, rather than showing it as present but with no instances?
- Heap comparison - by dominating parents, children etc.? This is a bigger topic which many people have already requested. We already had some ideas in the team (Erwin is working in this area), and there was recently a Bugzilla entry with some suggestions: https://bugs.eclipse.org/bugs/show_bug.cgi?id=283778
- The class histogram comparison compares classes by name. If there are duplicate names, perhaps it should attempt to match by ID, assuming the two dumps are from the same VM at different times?
- The permanent hashcode is sometimes available from DTFJ. This is the identity hashcode of an object, which is guaranteed not to change for the life of the object. It might not exist for an object in a dump if the format did not support it or if the VM had not been asked to calculate the hashcode. Would knowing it aid dump comparison? Would we display it as a hidden field or an attribute?
- WeakHashMap - find the path between value and key. This is a common cause of leaks which can easily be identified using a heap dump. Perhaps a more general way to search for paths between arbitrary objects would be useful.
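For readers less familiar with this leak pattern, a minimal illustration of what such a query would detect (class names here are invented for the example): when a value holds a strong reference back to its key, the weak reference to the key can never be cleared, so the entry is never expunged.

```java
import java.util.Map;
import java.util.WeakHashMap;

public class WeakHashMapLeak {
    static class Key {}

    // The value keeps a strong reference back to its key...
    static class Value {
        final Key key;
        Value(Key key) { this.key = key; }
    }

    public static void main(String[] args) {
        Map<Key, Value> cache = new WeakHashMap<>();
        Key k = new Key();
        cache.put(k, new Value(k)); // ...so this entry can never be collected:
        k = null;                   // even with no outside reference to the key,
        System.gc();                // the value -> key path keeps it strongly reachable
        System.out.println(cache.size()); // stays 1: a leak
    }
}
```

In the dominator tree this shows up as WeakHashMap entries that never go away; a query tracing the reference path from each value back to its own key would flag them directly.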
- Collection queries - show not only the size and percentage used, but also the "wasted" space in bytes.
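For an array-backed collection the metric could simply be the unused backing-array slots times the slot size - a hypothetical sketch (in the real query the slot size would come from the dump, not be hardcoded):

```java
public class CollectionWaste {
    // "Wasted" bytes of an array-backed collection: empty backing-array
    // slots times the per-slot size (e.g. the reference size for an
    // ArrayList, or the entry size for a HashMap table).
    static long wastedBytes(int capacity, int size, int slotSize) {
        return (long) (capacity - size) * slotSize;
    }

    public static void main(String[] args) {
        // an ArrayList holding 3 elements in a backing array of 10,
        // assuming 4-byte (compressed) references:
        System.out.println(wastedBytes(10, 3, 4)); // 28
    }
}
```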
- "What would happen if I removed these objects" query. Would we be able to reorder the heap to see how heap looks if a user selected set of objects were dropped. Would we need to rebuild indexes , recompute the dominator tree? This is rather a nice to have.
- Query plug-in - we need a guide/example for writing one. MAT can be extended very easily with additional queries, but there is no good documentation describing how this should be done.
- OutboundRefs - the first ref always seems to be the type. Is this checked, enforced, documented or required? Could we use the object2class index instead, or are the outbound refs the only way of doing it? The convention does seem to be required, but the object2class index might be more efficient.
- Add a method ArrayInt.copyTo(int[] dest, int off...). The only way to get the contents right now is .toArray(), which internally uses System.arraycopy(). There are places in the code where System.arraycopy is then called again afterwards; we should find such places and avoid the double copy by using the new method.
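A sketch of what the proposed method might look like; the backing fields here are invented for illustration and the real ArrayInt internals may differ:

```java
// Minimal stand-in for MAT's ArrayInt, showing the proposed copyTo:
// copy directly into a caller-supplied array instead of allocating an
// intermediate array via toArray() and copying a second time.
public class ArrayIntSketch {
    private int[] elements;
    private int size;

    public ArrayIntSketch(int capacity) { elements = new int[capacity]; }

    public void add(int value) { elements[size++] = value; }

    public void copyTo(int[] dest, int destPos) {
        System.arraycopy(elements, 0, dest, destPos, size);
    }
}
```

With this, an index builder could write several ArrayInt chunks into one large array with a single copy per chunk and no temporary allocations.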
- JVM info, e.g. parameters such as -Xmx - how do we display these? Perhaps with dump-specific queries, such as a DTFJ query taking an ImageProcess as @Argument.
- Go through / cleanup compiler warnings
- Documentation: MAT - create problem / scenario oriented help pages, e.g. "how can I reduce footprint".
- Documentation: DTFJ - are there any changes which should be made there after publishing the adapter at Eclipse? Do you see any need for improvements to the DTFJ documentation?
- Clean up Bugzilla entries - close some of the fixed ones, prioritize the rest, and reply to the NEW ones.
- Kick out the "classic installer" from the RCP and use p2 instead
- Change the DTFJ extension mechanism so that (a) the bundle starts correctly (no flags on the command line) and (b) the deprecated Eclipse coding is no longer used
- Use com.ibm.icu throughout - not java.text ?