[mat-dev] Export HPROF and java.lang.Class

I have merged some changes adding an 'export HPROF' query.

519274: Redacted Binary or PHD dump so as to protect privacy data
https://bugs.eclipse.org/bugs/show_bug.cgi?id=519274

This takes the current snapshot and exports it as a new HPROF file, whatever the original input format.

It allows some/all primitive fields and arrays to be redacted to help hide sensitive data.
Class and field names can also be changed to hide the nature of the application.

Please try it and add comments to the bug, or here if more general.


Quite a few extra things came up while coding this, so if you have some time please read on, as there are some design questions.

Instance fields of java.lang.Class

In Java, 'java.lang.Class' objects are instance objects and could have fields. For example from Java 8, using javap on java.lang.Class:

  private volatile transient java.lang.reflect.Constructor<T> cachedConstructor;
  private volatile transient java.lang.Class<?> newInstanceCallerCache;
  private transient java.lang.String name;
  private final java.lang.ClassLoader classLoader;
  private volatile transient java.lang.ref.SoftReference<java.lang.Class$ReflectionData<T>> reflectionData;
  private volatile transient int classRedefinedCount;
  private volatile transient sun.reflect.generics.repository.ClassRepository genericInfo;
  private volatile transient T[] enumConstants;
  private volatile transient java.util.Map<java.lang.String, T> enumConstantDirectory;
  private volatile transient java.lang.Class$AnnotationData annotationData;
  private volatile transient sun.reflect.annotation.AnnotationType annotationType;
  transient java.lang.ClassValue$ClassValueMap classValueMap;

These are not handled well by HPROF or Memory Analyzer.
In HPROF, the class dump record has constant pool and static field references, and field definitions for use with instance dump records.
The HPROF class dump for java.lang.Class has field definitions for those fields shown above by javap.

The Memory Analyzer HPROF parser sees those field definitions, and will use them if an instance_dump of java.lang.Class is found. These are the fields in an HPROF class_dump for java.lang.Class:

ref classValueMap
ref annotationType
ref annotationData
ref enumConstantDirectory
ref enumConstants
ref genericInfo
int classRedefinedCount
ref reflectionData
ref name
ref newInstanceCallerCache
ref cachedConstructor

Perhaps the classLoader field is omitted from the HPROF dump as that is available in each class dump record.

A few java.lang.Class objects have no instances (for example Integer.TYPE = 'int') and appear in the HPROF file as instance dump records. Those objects are IInstance, not IClass, and so have the above fields visible in the object inspector attributes tab, and the java.lang.Class statics in the statics tab.

For test dump oracle_jdk9_01_x64.hprof the fields are 0 / null.
For test dump oracle_jdk9_01_x64.hprof some fields are non-null.

The problem is that for regular classes the values of those fields are not present in the HPROF dump. This can hide some JVM memory leaks: https://www.eclipse.org/lists/mat-dev/msg00514.html

If they were present, then how would they appear in a HPROF dump, and how would Memory Analyzer handle them?

Unlike what I said in https://www.eclipse.org/lists/mat-dev/msg00515.html, ClassImpl does not have getFields(), and so isn't currently an IInstance, although it does extend AbstractObjectImpl.

The current DTFJ parser inserts java.lang.Class instance fields as '<name>' static fields in the ClassImpl.

Options:
  1. Support fields on java.lang.Class: e.g. if we changed ClassImpl to also implement IInstance:
    1. Check uses of IClass versus IInstance - i.e. does existing code get confused by IObject which is a class and also an instance
    2. Object inspector
      1. current classes display their static fields in the statics tab, and the attributes tab shows the pseudo-statics fields (marked with '<' )
      2. current 'int', 'long' etc. are actually IInstances, not IClass, as they are not types of other objects. They have statics as the statics of java.lang.Class and the attributes as their per instance fields as an instance of java.lang.Class.
      3. current objects (IInstance) display their fields in the attributes tab and the statics of the class in the statics tab (excluding the pseudo-statics marked with '<' ).
      4. so - for a ClassImpl/IInstance - the attribute tab should show the java.lang.Class per instance fields (with or without '<' ?)
    3. Serialization of any modified ClassImpl - extra fields for old code, new code deserializing old objects
  2. Just add Class fields as static fields to the ClassImpl as for DTFJ.
    1. This should work, although we lose the symmetry between the dummy classes and real ones.
      IInstance intclass. intclass.getField("jlc_field_name")
      ClassImpl regularclass; regularclass.getStaticField("<jlc_field_name>")
    2. Perhaps resolveValue should work on the base name. ClassImpl internalGetField could add <> so then the following works
      ClassImpl regularclass; regularclass.resolveValue("name")
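
A minimal sketch of the base-name lookup suggested in option 2.2, assuming the pseudo-statics are kept in a simple map. The class PseudoStaticLookupSketch and its methods are hypothetical illustrations, not ClassImpl or any Memory Analyzer API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a class object holding real statics and '<name>' pseudo-statics. */
final class PseudoStaticLookupSketch {
    private final Map<String, Object> staticFields = new HashMap<>();

    void addStatic(String name, Object value) {
        staticFields.put(name, value);
    }

    /** Looks up the exact name first, then falls back to the '<name>' pseudo-static form. */
    Object resolve(String name) {
        if (staticFields.containsKey(name)) {
            return staticFields.get(name);
        }
        return staticFields.get("<" + name + ">");
    }

    public static void main(String[] args) {
        PseudoStaticLookupSketch regularClass = new PseudoStaticLookupSketch();
        regularClass.addStatic("<name>", "com.example.Foo"); // per-instance field of java.lang.Class
        regularClass.addStatic("serialVersionUID", 1L);      // ordinary static field
        System.out.println(regularClass.resolve("name"));
        System.out.println(regularClass.resolve("serialVersionUID"));
    }
}
```

In this sketch a real static field takes precedence over a pseudo-static of the same base name; whether that is the right order is one of the open questions.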

Normally the type of every object is an instance of java.lang.Class. The HPROF format assumes this. With export HPROF and renaming of all classes, including java.lang.Class, we don't know the type of the type of each object. java.lang.Class might not even be in the dump.
  1. Support an instance_dump record for a class_dump record
    1. Same address ID for both - [could break compatibility with JHat if we generated dumps this way]
    2. How to read fields - on parsing, or on use?
    3. Change type of class_dump (ClassImpl) to match Class ID field of instance dump. In the HPROF parser, if we have already counted the class_dump object as of type java.lang.Class we need to decrement the count and sizes of instances of java.lang.Class and increment them of the actual type.
  2. Ignore - and let the HPROF parser insert references to a dummy java.lang.Class type
  3. Try to identify renamed 'java.lang.Class' type.
  4. Use a reserved field in class dump to hold the type of the class. This might not be compatible with JHat - or JHat might just ignore it.
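
The count adjustment described in option 1.3 could be sketched as follows. This is only an illustration under the assumption that per-type instance counts and sizes are kept in maps keyed by class ID; ClassTypeCountsSketch and its methods are hypothetical and do not reflect how the HPROF parser stores its totals.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-type totals, keyed by the class ID of the type. */
final class ClassTypeCountsSketch {
    private final Map<Long, Integer> instanceCount = new HashMap<>();
    private final Map<Long, Long> heapSize = new HashMap<>();

    /** Counts one object of the given size as an instance of typeId. */
    void addInstance(long typeId, long size) {
        instanceCount.merge(typeId, 1, Integer::sum);
        heapSize.merge(typeId, size, Long::sum);
    }

    /** Moves one already-counted object from oldTypeId to newTypeId. */
    void retype(long oldTypeId, long newTypeId, long size) {
        instanceCount.merge(oldTypeId, -1, Integer::sum);
        heapSize.merge(oldTypeId, -size, Long::sum);
        addInstance(newTypeId, size);
    }

    int count(long typeId) {
        return instanceCount.getOrDefault(typeId, 0);
    }

    long size(long typeId) {
        return heapSize.getOrDefault(typeId, 0L);
    }

    public static void main(String[] args) {
        long javaLangClass = 0x100L; // made-up class IDs
        long renamedType = 0x200L;
        ClassTypeCountsSketch totals = new ClassTypeCountsSketch();
        totals.addInstance(javaLangClass, 80); // class_dump first counted as java.lang.Class
        totals.retype(javaLangClass, renamedType, 80); // instance_dump names the actual type
        System.out.println(totals.count(javaLangClass) + " " + totals.count(renamedType));
    }
}
```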

Extra references

Currently for HPROF dumps:

  1. protection domain and signers are not added as references
    1. add them as dummy statics? or dummy fields on java.lang.Class
    2. + 2 reserved ID fields
  2. constant pool entries are not added as references or fields
  3. size of java.lang.Class does not include java.lang.Class per instance fields - these could add an extra 80 bytes or so to class sizes

These changes would change the reported sizes for dumps, so we may need updates to test cases.

HPROF parser GC root names

The parser incorrectly read roots of type 'ROOT JNI GLOBAL' and gave them type GCRootInfo.Type.NATIVE_STACK (used for 'ROOT NATIVE STACK') instead of GCRootInfo.Type.NATIVE_STATIC. They will now appear in the GC Roots query as 'JNI Global'.
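
As an illustration of the corrected mapping, here is a small sketch. The enum RootType is a local stand-in for the GCRootInfo.Type constants named above, and the tag values 0x01 and 0x04 are my understanding of the HPROF heap dump sub-record tags; the class is hypothetical and is not the parser code.

```java
/** Hypothetical mapping from HPROF root sub-record tags to root types. */
final class GcRootTagSketch {
    /** Stand-in for the root types named in the text; MAT's GCRootInfo.Type is not used here. */
    enum RootType { NATIVE_STATIC, NATIVE_STACK, UNKNOWN }

    // Sub-record tags, assumed from the HPROF format description
    static final int ROOT_JNI_GLOBAL = 0x01;
    static final int ROOT_NATIVE_STACK = 0x04;

    static RootType rootTypeFor(int tag) {
        switch (tag) {
            case ROOT_JNI_GLOBAL:
                return RootType.NATIVE_STATIC; // previously mapped to NATIVE_STACK by mistake
            case ROOT_NATIVE_STACK:
                return RootType.NATIVE_STACK;
            default:
                return RootType.UNKNOWN; // other root kinds are left out of this sketch
        }
    }

    public static void main(String[] args) {
        System.out.println(rootTypeFor(ROOT_JNI_GLOBAL));
        System.out.println(rootTypeFor(ROOT_NATIVE_STACK));
    }
}
```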

Batch mode export

380600: Resave Heap Dump without Unreachable Objects
https://bugs.eclipse.org/bugs/show_bug.cgi?id=380600
has a request: 'As a side feature, it would be nice if this could be run from the command line. This would help with automated heap dump collection and processing.'

This also had a request for improved command line
https://www.eclipse.org/lists/mat-dev/msg00449.html
483418: New Feature: Compare two dumps from command line
https://bugs.eclipse.org/bugs/show_bug.cgi?id=483418
had a possible way of doing it.

It would be nice if we had a way of batch processing two dumps.
MAT does have the concept of Argument.Advice.SECONDARY_SNAPSHOT
and compare tables queries with multiple snapshots.

Secondary snapshot can be done in a query as follows:
    @Argument(advice = Advice.SECONDARY_SNAPSHOT)
    public ISnapshot snapshot2;

From the GUI it brings up a file dialog or a selection of previous snapshots.

Batch mode is normally driven by reports which can then run queries, rather than directly calling queries. It's hard to vary how reports are generated without rewriting the definition, although there are <param> elements in report definitions.
One idea is to modify ParseSnapshotApp.java to inject all (or some?) of the command line options into the Spec before the report is run. These would then appear as <param> key/values.
So for export_hprof.xml
<query name="exporthprof">
        <param key="output" value="" ></param>
        <command>export_hprof -output ${output}</command>
</query>

ParseHeapDump -output=new.hprof original.hprof export_hprof.xml

the -output=new.hprof would be accepted as an option, original.hprof dump would be parsed, then output=new.hprof added to the spec report overwriting any existing values, then the command query:
export_hprof -output new.hprof
would be run. If the report definition were added to the HPROF plugin then it would appear as a report in the 'Run Expert System Test' menu. If that was annoying then we would need other ways of categorising / hiding reports, as for queries.
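
The injection idea could look roughly like the sketch below. It does not use ParseSnapshotApp or the Spec class; ReportParamSketch, parseOptions and expand are hypothetical names, and the report parameters are modelled as a plain map so that the override and ${key} substitution steps can be seen in isolation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: command line options override report params, then ${key} is expanded. */
final class ReportParamSketch {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Collects -key=value arguments; other arguments (dump and report names) are skipped. */
    static Map<String, String> parseOptions(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-") && eq > 1) {
                options.put(arg.substring(1, eq), arg.substring(eq + 1));
            }
        }
        return options;
    }

    /** Expands ${key} in the command, with command line options overwriting the defaults. */
    static String expand(String command, Map<String, String> defaults, Map<String, String> options) {
        Map<String, String> params = new LinkedHashMap<>(defaults);
        params.putAll(options);
        Matcher m = PLACEHOLDER.matcher(command);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            // Unknown keys are left as they are rather than replaced by an empty string
            m.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] commandLine = { "-output=new.hprof", "original.hprof", "export_hprof.xml" };
        Map<String, String> defaults = new LinkedHashMap<>();
        defaults.put("output", ""); // from <param key="output" value="">
        System.out.println(expand("export_hprof -output ${output}", defaults, parseOptions(commandLine)));
    }
}
```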

If a new command (for example 'comparehistogram') had an option named for example snapshot2 then this could be used to run a report using a secondary snapshot.

<query name="compare">
        <param key="snapshot" value="default.hprof" ></param>
        <command>comparehistogram -snapshot2 ${snapshot}</command>
</query>

ParseHeapDump -snapshot=second.hprof original.hprof compare.xml

There's not then a way of varying the options provided for parsing the secondary snapshot. Is that really a problem?

Existing comparison queries operate on multiple tables and snapshots supplied via the compare basket. The compare basket is a UI feature. I don't see a straightforward way to express queries in a report returning tables which are then supplied to another query.

HPROF parser messages

Slightly improved error messages for corrupt dumps

HPROF parser missing classes

Dummy java.lang.Class and java.lang.ClassLoader classes are added if not found in the dump

HPROF unload class

The HPROF parser now forgets names of classes on seeing an unload record. It seemed tidy to add this, but shouldn't make any difference as dumps shouldn't refer to unloaded classes.

Index validation

Added a couple more tests to check the index for GC roots produced by a parser.

ArgumentDescriptor

Print out commands with a space before a default heap object argument not at the beginning of the argument list - so the export_hprof command works.

ArrayIntCompressed ArrayLongCompressed

These work, but prior to my change expected that set(index,value) was only called once per index value. The new value was or'ed with the old value.

While coding my changes for types of classes I wondered why the type wasn't being changed correctly, and discovered this behaviour. It wasn't documented, so I presumed it was a bug. I fixed it so that set could be called to change existing values too. This seemed more sensible than trying to document the existing behaviour. I then wrote some tests for this.
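
To show the difference, here is a simplified bit-packed array, not the actual ArrayIntCompressed code. PackedIntArraySketch and its method names are hypothetical: orOnlySet mimics the old behaviour of OR'ing the new value in, and set clears the old bits first so an existing value can be changed.

```java
/** Hypothetical fixed-width bit-packed int array; not MAT's ArrayIntCompressed. */
final class PackedIntArraySketch {
    private final long[] words;
    private final int bitsPerValue;
    private final long mask;

    PackedIntArraySketch(int size, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 32) {
            throw new IllegalArgumentException("bitsPerValue must be 1..32");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.words = new long[(int) (((long) size * bitsPerValue + 63) >>> 6)];
    }

    /** Old-style set: only correct if the slot is still zero. */
    void orOnlySet(int index, int value) {
        write(index, value, false);
    }

    /** Fixed set: clears the slot before storing, so it can be called repeatedly. */
    void set(int index, int value) {
        write(index, value, true);
    }

    private void write(int index, int value, boolean clearFirst) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long v = value & mask;
        if (clearFirst) {
            words[word] &= ~(mask << offset);
        }
        words[word] |= v << offset;
        int spill = offset + bitsPerValue - 64; // bits that fall into the next word
        if (spill > 0) {
            int shift = bitsPerValue - spill; // bits already stored in the first word
            if (clearFirst) {
                words[word + 1] &= ~(mask >>> shift);
            }
            words[word + 1] |= v >>> shift;
        }
    }

    int get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int word = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long v = words[word] >>> offset;
        if (offset + bitsPerValue > 64) {
            v |= words[word + 1] << (64 - offset);
        }
        return (int) (v & mask);
    }

    public static void main(String[] args) {
        PackedIntArraySketch a = new PackedIntArraySketch(10, 7);
        a.orOnlySet(3, 5);
        a.orOnlySet(3, 2);
        System.out.println(a.get(3)); // 5 | 2 = 7, the old value leaks through
        a.set(3, 2);
        System.out.println(a.get(3)); // 2
    }
}
```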

HeapDumpInfo

This query now has a SnapshotInfo argument, not ISnapshot. The value is supplied automatically by the context so makes no difference for the ordinary user. It was useful to supply info about the newly dumped HPROF file for the export hprof query.

FieldDescriptor

This now has a toString, like Field, with a note to explain not to rely on the format.

Calculate dominator tree

The snapshot is now closed if the dominator tree step fails during parsing. This allows the user to delete the indices if required.




Andrew Johnson





Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
