Created attachment 289069: Stack trace

Exception occurs after parsing ~25% of a 45 GB heap dump. Unfortunately I can't share the heap dump. Error happens on both 1.14 and 1.13. Hardware is Apple M2 Max. Heap dump doesn't seem corrupt as it's correctly parsed on an Intel Mac and Fedora Linux. Seems related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=579931.
Is that stack trace with the latest development code?

java.lang.ArrayIndexOutOfBoundsException: Index -3489970 out of bounds for length 3750002
    at org.eclipse.mat.collect.ArrayIntCompressed.set(ArrayIntCompressed.java:147)
    at org.eclipse.mat.parser.index.IndexWriter$IntIndexCollector.set(IndexWriter.java:712)
    at org.eclipse.mat.hprof.HprofParserHandlerImpl.beforePass2(HprofParserHandlerImpl.java:347)
    at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:91)

That would correspond to the line:

    object2classId.set(clazz.getObjectId(), clazz.getClazz().getObjectId());

It is probably not a timing issue as it occurred twice (with 1.14 and 1.13). Also pass 1 and beforePass2 are single threaded.

The error would suggest that the object ID for the class was not found in:

    // calculate instance size for all classes
    ClassImpl clazz = e.next();
    int index = identifiers0.reverse(clazz.getObjectAddress());
    clazz.setObjectId(index);

I would ask whether it occurs with MAT 1.11.0 as some of the indexing logic has changed since then, but I see that there is no Mac/Cocoa/AArch64 version 1.11 of MAT, and I note the problem does not occur with x86_64 Linux and Mac.
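To illustrate the suspected failure mode, here is a minimal sketch. It assumes the reverse lookup follows the usual binary search convention of returning a negative value when the address is absent; `ReverseLookupSketch` and its methods are hypothetical stand-ins, not MAT's actual classes. A failed lookup goes unnoticed until the negative value is later used as an array index.

```java
import java.util.Arrays;

// Hypothetical stand-in for the identifier index; not MAT's real implementation.
final class ReverseLookupSketch {

    /** Returns the position of the address in the sorted array, or a negative value if absent. */
    static int reverse(long[] sortedAddresses, long address) {
        return Arrays.binarySearch(sortedAddresses, address);
    }

    /** Mimics writing into an index array keyed by object ID. */
    static void set(int[] object2classId, int objectId, int classId) {
        // Throws ArrayIndexOutOfBoundsException when objectId is negative.
        object2classId[objectId] = classId;
    }

    public static void main(String[] args) {
        long[] identifiers = {0x1000L, 0x2000L, 0x3000L};
        int[] object2classId = new int[identifiers.length];

        int found = reverse(identifiers, 0x2000L);
        set(object2classId, found, 0); // fine: the address is in the index

        int missing = reverse(identifiers, 0x2800L); // negative: address not indexed
        try {
            set(object2classId, missing, 0);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Lookup had already failed; index was " + missing);
        }
    }
}
```

If this is what is happening, checking the sign of the lookup result right after the reverse call would report the offending class and address instead of failing later in `set`.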
This could take a while to debug. Do you still have the dump, and are you willing to try various test builds of MAT?

There's a possibility that it is a JVM/JIT error, so have you tried updating your JVM?

Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ ? That might show whether some of the index changes in bug 579931 or bug 573258 were responsible (though if it was a JIT bug, then small code changes can be enough to hide the bug).

I could add some debugging code which reports anything unusual earlier, but then I would need you to run a snapshot build and report the results, probably several times as I modify the code.

I have spotted a minor bug where there is an iterator over classesByAddress, and in the loop a value is changed via a put on an existing index. According to the usual definitions of collections that should be safe, but if the MAT collection is at the size limit before resizing, then it is resized even though no resize is needed, as the collection doesn't get any bigger. This could then mess up the iterator. That code should behave the same way on x86-64 though, so it might not be the problem.
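The resize-on-update behaviour can be sketched with a toy open-addressing map. This is a hypothetical illustration only, not MAT's collection code: the class name, the 3/4 load threshold and the use of 0 as the empty-slot marker are all assumptions. With `checkExistingFirst` set to false, a put on an existing key at the size limit triggers a resize even though the map does not grow; with it set to true, the existing slot is updated in place.

```java
// Toy open-addressing int->int map; hypothetical, not MAT's collection classes.
final class ResizeOnPutSketch {
    private int[] keys;   // 0 marks an empty slot, so keys must be non-zero
    private int[] values;
    private int size;
    private int resizeCount;
    private final boolean checkExistingFirst;

    ResizeOnPutSketch(int capacity, boolean checkExistingFirst) {
        this.keys = new int[capacity];
        this.values = new int[capacity];
        this.checkExistingFirst = checkExistingFirst;
    }

    void put(int key, int value) {
        if (checkExistingFirst) {
            int slot = find(key);
            if (keys[slot] == key) { // pure update: no growth, so no resize
                values[slot] = value;
                return;
            }
        }
        if (size >= keys.length * 3 / 4) { // assumed load threshold
            resize();
        }
        int slot = find(key);
        if (keys[slot] != key) {
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    int get(int key) {
        int slot = find(key);
        return keys[slot] == key ? values[slot] : -1;
    }

    int resizeCount() {
        return resizeCount;
    }

    // Linear probing; terminates because the table is never completely full.
    private int find(int key) {
        int slot = (key & 0x7fffffff) % keys.length;
        while (keys[slot] != 0 && keys[slot] != key) {
            slot = (slot + 1) % keys.length;
        }
        return slot;
    }

    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        keys = new int[oldKeys.length * 2];
        values = new int[keys.length];
        resizeCount++;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldKeys[i] != 0) {
                int slot = find(oldKeys[i]);
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
    }

    /** Fills the map to its threshold, updates an existing key, and counts resizes caused by the update. */
    static int resizesCausedByUpdate(boolean checkExistingFirst) {
        ResizeOnPutSketch map = new ResizeOnPutSketch(4, checkExistingFirst);
        map.put(1, 10);
        map.put(2, 20);
        map.put(3, 30);
        int before = map.resizeCount();
        map.put(1, 99); // key 1 already exists
        return map.resizeCount() - before;
    }

    public static void main(String[] args) {
        System.out.println("threshold checked first: " + resizesCausedByUpdate(false));
        System.out.println("existing key checked first: " + resizesCausedByUpdate(true));
    }
}
```

An iterator that walks the backing arrays by slot position would be invalidated by the unnecessary resize in the first variant, which is the hazard described above.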
Thank you for looking into this. I still have the heap dump and I'm happy to help. I've experimented with different JVMs and have gone through (at least) 17.0.6, 17.0.7 and 20.0.1.

> Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ That might show whether some of the index changes were responsible in bug 579931 or bug 573258. [though if it was a JIT bug then small code changes can be enough to hide the bug].

I just tried 1.11 through Eclipse and the error still occurs, with the exact same error message. I did try running x86-64 builds of MAT 1.11 and 1.14 on my Mac M2 through Rosetta 2, and in both cases MAT successfully parsed the heap dump.
Changes to fix the collections (might not fix this bug though) https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202110
Add extra error message - won't fix the problem but may give more information. https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202118 https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=54651070d185e9c369e1fab303fb5b2e6e3b298d
Test case fix https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202119 https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=e592bfde54695bfe6db6a91baf21fb1c9d429156
A snapshot build is now available with the collections fix and a bit more logging of errors. I don't think it will fix the problem, but please retest, and report the error log here. https://www.eclipse.org/mat/snapshotBuilds.php Thanks
I tried the snapshot build and as you suspected, the problem persists. The additional logging output 33 new error messages. Roughly a third of those looked like this:

```
!ENTRY org.eclipse.mat.ui 4 0 2023-05-26 10:11:13.830
!MESSAGE class jdk.internal.reflect.GeneratedMethodAccessor22341 @ 0x7f4c6749c450 not found in address index
```

The rest of the errors concerned a class of our own, which is generated and loaded at runtime containing generated bytecode.
I still can't see how the problem could happen, and would welcome someone else to inspect the code.

The reason for the problem is that some classes in classesByAddress have an address which is not found by a reverse lookup in identifiers0. However, when I look through the code I see that every time a class is added to classesByAddress, the address is also added to identifiers0.

Aarch64/Arm64 has some differences to x86_64, e.g. the writes are weakly ordered. That shouldn't make a difference for Java programs which correctly follow the Java memory model.

I think it would be worth trying a different JVM, in case there is a bug in the JVM / JIT. Are you able to try an IBM Semeru JDK, which is based on OpenJ9?
https://developer.ibm.com/languages/java/semeru-runtimes/downloads/

Also, the identifiers0 index is sorted before doing binary lookups. This uses the Arrays.parallelSort methods. There is this bug:
https://bugs.openjdk.org/browse/JDK-8076446
"(array) Arrays.parallelSort is not stable"

That doesn't directly apply to sorting int or long arrays, but it makes me a bit suspicious of the method, so I have tried sorting the array with an ordinary sort() after the parallelSort(). It should do nothing assuming the parallel sort works, and should not be too slow if the array is already sorted, but it could repair a faulty parallel sort if the only error was items in the wrong order, rather than items being omitted or duplicated.
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202438
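For anyone who wants to try the same idea outside MAT, here is a minimal sketch of the defensive sort plus a sanity check. It is not the code in the Gerrit change; `DefensiveSortSketch`, `sortDefensively` and `firstOutOfOrder` are hypothetical names, and the sample addresses are made up.

```java
import java.util.Arrays;

// Hypothetical illustration of sorting again after parallelSort; not the actual MAT change.
final class DefensiveSortSketch {

    /** Sorts a copy with parallelSort, then again with the ordinary sort as a safety net. */
    static long[] sortDefensively(long[] addresses) {
        long[] sorted = addresses.clone();
        Arrays.parallelSort(sorted);
        Arrays.sort(sorted); // expected to change nothing if parallelSort worked
        return sorted;
    }

    /** Returns the first position whose value is smaller than its predecessor, or -1 if sorted. */
    static int firstOutOfOrder(long[] values) {
        for (int i = 1; i < values.length; i++) {
            if (values[i - 1] > values[i]) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long[] addresses = {0x7f4c6749c450L, 0x1000L, 0x7f4c00000000L, 0x2000L};
        long[] sorted = sortDefensively(addresses);

        System.out.println("first out-of-order position: " + firstOutOfOrder(sorted));

        // Every original address should be found by a binary search of the sorted copy.
        for (long address : addresses) {
            if (Arrays.binarySearch(sorted, address) < 0) {
                System.out.println("not found: 0x" + Long.toHexString(address));
            }
        }
    }
}
```

Running `firstOutOfOrder` directly after the parallel sort, before the second sort, would also show whether the parallel sort alone ever leaves the array out of order on the affected machine.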
Gustav, have you had a chance to try the ideas in comment 9?
1. Try an IBM Semeru Runtime for macOS aarch64
2. Try the latest development build, which does an ordinary sort() after parallelSort() in case parallel sort is broken.
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/38.