Bug 581932 - ArrayIndexOutOfBoundsException in ArrayIntCompressed on beforePass2 parsing
Status: CLOSED MOVED
Alias: None
Product: MAT
Classification: Tools
Component: Core
Version: 1.14
Hardware: Macintosh Mac OS X
Importance: P3 normal
Target Milestone: ---
Assignee: Andrew Johnson CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-05-12 04:08 EDT by Gustav Hedengran CLA
Modified: 2024-05-08 16:57 EDT

See Also:


Attachments
Stack trace (2.94 KB, text/plain)
2023-05-12 04:08 EDT, Gustav Hedengran CLA

Description Gustav Hedengran CLA 2023-05-12 04:08:10 EDT
Created attachment 289069
Stack trace

Exception occurs after parsing ~25% of a 45 GB heap dump. Unfortunately I can't share the heap dump. 

Error happens on both 1.14 and 1.13. Hardware is Apple M2 Max.

Heap dump doesn't seem corrupt as it's correctly parsed on an Intel Mac and Fedora Linux.

Seems related to https://bugs.eclipse.org/bugs/show_bug.cgi?id=579931.
Comment 1 Andrew Johnson CLA 2023-05-12 08:50:30 EDT
Is that stack trace with the latest development code?

java.lang.ArrayIndexOutOfBoundsException: Index -3489970 out of bounds for length 3750002
	at org.eclipse.mat.collect.ArrayIntCompressed.set(ArrayIntCompressed.java:147)
	at org.eclipse.mat.parser.index.IndexWriter$IntIndexCollector.set(IndexWriter.java:712)
	at org.eclipse.mat.hprof.HprofParserHandlerImpl.beforePass2(HprofParserHandlerImpl.java:347)
	at org.eclipse.mat.hprof.HprofIndexBuilder.fill(HprofIndexBuilder.java:91)

That would correspond to the line:
            object2classId.set(clazz.getObjectId(), clazz.getClazz().getObjectId());

It is probably not a timing issue as it occurred twice (with 1.14 and 1.13). Also pass 1 and beforePass2 are single threaded.

The error suggests that the object ID for the class was not found by the reverse lookup in this code, leaving a negative index:
            // calculate instance size for all classes
            ClassImpl clazz = e.next();
            int index = identifiers0.reverse(clazz.getObjectAddress());
            clazz.setObjectId(index);

I would ask whether it occurs with MAT 1.11.0, as some of the indexing logic has changed since then, but I see that there is no Mac/Cocoa/AArch64 build of MAT 1.11, and I note the problem does not occur on x86_64 Linux and Mac.
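
A minimal sketch of how a failed reverse lookup could end up as this exception. It assumes the reverse lookup behaves like a binary search that returns a negative value for a missing key, as java.util.Arrays.binarySearch does. The class and method names are hypothetical and are not MAT's actual index classes.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical stand-in, not MAT's index implementation. */
final class ReverseLookupSketch {

    /** Returns the position of address in the sorted array, or a negative value if absent. */
    static int reverse(long[] sortedAddresses, long address) {
        return Arrays.binarySearch(sortedAddresses, address);
    }

    /** Stores value at index, rejecting a negative index with a descriptive message. */
    static void setChecked(int[] target, int index, int value, long address) {
        if (index < 0) {
            throw new IllegalStateException(
                "address 0x" + Long.toHexString(address) + " not found in address index");
        }
        target[index] = value;
    }

    public static void main(String[] args) {
        long[] identifiers = {0x1000L, 0x2000L, 0x3000L};
        int[] object2classId = new int[identifiers.length];

        // Present address: a valid, non-negative index.
        int found = reverse(identifiers, 0x2000L);
        setChecked(object2classId, found, 7, 0x2000L);

        // Absent address: binarySearch returns -(insertionPoint) - 1.
        int missing = reverse(identifiers, 0x2800L);
        System.out.println("found=" + found + " missing=" + missing);

        try {
            // Without the check, target[missing] would throw
            // ArrayIndexOutOfBoundsException with a negative index.
            setChecked(object2classId, missing, 7, 0x2800L);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Checking the index at the point of lookup reports the offending address, rather than failing later inside the array write.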
Comment 2 Andrew Johnson CLA 2023-05-23 15:36:09 EDT
This could take a while to debug. Do you still have the dump and are you willing to try various test builds of MAT?

There's a possibility that it is a JVM/JIT error, so have you tried updating your JVM?

Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ ? That might show whether some of the index changes in bug 579931 or bug 573258 were responsible (though if it is a JIT bug, small code changes can be enough to hide it).

I could add some debugging code which reported anything unusual earlier - but then I would need you to run a snapshot build and report the results, probably several times as I modified the code.

I have spotted a minor bug: there is an iterator over classesByAddress, and inside the loop a value is changed via a put on an existing entry. According to the usual definitions of collections that should be safe, but if the MAT collection is at its size limit, the put triggers a resize even though none is needed, because the collection doesn't get any bigger. The resize could then mess up the iterator. That code should behave the same way on x86-64 though, so it might not be the problem.
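
A rough sketch of the resize-ordering issue described above, using a hypothetical open-addressing map (TinyIntMap is invented for illustration and is not the MAT collection). With resizeBeforeLookup set to true, put() checks the size limit before looking for an existing key, so overwriting a value can replace the backing arrays; with false, an overwrite never resizes.

```java
/** Hypothetical open-addressing int-to-int map, used only to illustrate resize ordering. */
final class TinyIntMap {
    private int[] keys;
    private int[] values;
    private boolean[] used;
    private int size;
    private final boolean resizeBeforeLookup;
    int resizeCount; // number of times the backing arrays were replaced

    /** capacity must be at least 1. */
    TinyIntMap(int capacity, boolean resizeBeforeLookup) {
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
        this.resizeBeforeLookup = resizeBeforeLookup;
    }

    private int limit() {
        return keys.length * 3 / 4;
    }

    /** Linear probing: returns the slot holding key, or the first free slot. */
    private int findSlot(int key) {
        int i = Math.floorMod(key, keys.length);
        while (used[i] && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    private void resize() {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new int[oldKeys.length * 2];
        values = new int[keys.length];
        used = new boolean[keys.length];
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldUsed[i]) {
                int slot = findSlot(oldKeys[i]);
                used[slot] = true;
                keys[slot] = oldKeys[i];
                values[slot] = oldValues[i];
            }
        }
        resizeCount++;
    }

    void put(int key, int value) {
        if (resizeBeforeLookup && size == limit()) {
            resize(); // problematic order: resizes even if key already exists
        }
        int slot = findSlot(key);
        if (used[slot]) {
            values[slot] = value; // overwrite: the map does not grow
            return;
        }
        if (!resizeBeforeLookup && size == limit()) {
            resize(); // safer order: resize only when a new key is added
            slot = findSlot(key);
        }
        used[slot] = true;
        keys[slot] = key;
        values[slot] = value;
        size++;
    }

    /** Returns the stored value, or -1 if the key is absent. */
    int get(int key) {
        int slot = findSlot(key);
        return used[slot] ? values[slot] : -1;
    }

    public static void main(String[] args) {
        TinyIntMap eager = new TinyIntMap(4, true);
        TinyIntMap lazy = new TinyIntMap(4, false);
        for (int k = 1; k <= 3; k++) {
            eager.put(k, k);
            lazy.put(k, k);
        }
        eager.put(2, 20); // overwrite at the size limit
        lazy.put(2, 20);
        System.out.println("eager resizes=" + eager.resizeCount
            + ", lazy resizes=" + lazy.resizeCount);
    }
}
```

An iterator that holds a reference to the old arrays, or a position into them, would be invalidated by the unnecessary resize in the first variant.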
Comment 3 Gustav Hedengran CLA 2023-05-24 04:42:04 EDT
Thank you for looking into this.

I still have the heap dump and I'm happy to help. I've experimented with different JVMs and have gone through (at least) 17.0.6, 17.0.7 and 20.0.1.

> Does it occur with an Eclipse that runs on Mac M2, but with MAT 1.11 installed from https://download.eclipse.org/mat/1.11.0/update-site/ That might show whether some of the index changes were responsible in bug 579931 or bug 573258. [though if it was a JIT bug then small code changes can be enough to hide the bug].

I just tried 1.11 through Eclipse and the error still occurs, with the exact same error message.

I did try running x86-64 builds of MAT 1.11 and 1.14 on my Mac M2 through Rosetta 2 and in both cases MAT successfully parsed the heap dump.
Comment 4 Andrew Johnson CLA 2023-05-26 03:24:53 EDT
Changes to fix the collections (might not fix this bug though)
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202110
Comment 5 Andrew Johnson CLA 2023-05-26 03:45:13 EDT
Add extra error message - won't fix the problem but may give more information.
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202118
https://git.eclipse.org/c/mat/org.eclipse.mat.git/commit/?id=54651070d185e9c369e1fab303fb5b2e6e3b298d
Comment 7 Andrew Johnson CLA 2023-05-26 06:27:14 EDT
A snapshot build is now available with the collections fix and a bit more logging of errors. I don't think it will fix the problem, but please retest, and report the error log here.
https://www.eclipse.org/mat/snapshotBuilds.php
Thanks
Comment 8 Gustav Hedengran CLA 2023-06-05 05:12:04 EDT
I tried the snapshot build and as you suspected, the problem persists. The additional logging output 33 new error messages. Roughly a third of those looked like this:

```
!ENTRY org.eclipse.mat.ui 4 0 2023-05-26 10:11:13.830
!MESSAGE class jdk.internal.reflect.GeneratedMethodAccessor22341 @ 0x7f4c6749c450 not found in address index
```

The rest of the errors concerned a class of our own, which is generated and loaded at runtime containing generated bytecode.
Comment 9 Andrew Johnson CLA 2023-06-12 03:35:41 EDT
I still can't see how the problem could happen, and would welcome someone else to inspect the code.
The reason for the problem is that some classes in classesByAddress have an address which is not found by a reverse lookup in identifiers0.
However, when I look through the code I see that every time a class is added to classesByAddress the address is also added to identifiers0.

AArch64/Arm64 differs from x86_64 in some ways, e.g. writes are weakly ordered. That shouldn't make a difference for Java programs which correctly follow the Java memory model.

I think it would be worth trying a different JVM, in case there is a bug in the JVM / JIT. Are you able to try an IBM Semeru JDK, which is based on OpenJ9?

https://developer.ibm.com/languages/java/semeru-runtimes/downloads/ 

Also, the identifiers0 index is sorted before doing binary lookups. This uses the Arrays.parallelSort methods. There is this bug:
https://bugs.openjdk.org/browse/JDK-8076446 (array) Arrays.parallelSort is not stable
That doesn't directly apply to sorting int or long arrays, but it makes me a bit suspicious of the method, so I have added an ordinary sort() after the parallelSort(). If the parallel sort works, the extra sort should do nothing and should not be too slow on an already sorted array. If the parallel sort only left items in the wrong order, rather than omitting or duplicating them, the extra sort would fix that.

https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/202438
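
A small sketch of the idea, as a variant rather than the committed change: instead of always running sort() after parallelSort(), it first checks whether the array is in order and reports whether the sequential fallback was needed, which would also give a diagnostic. SortCheckSketch and its method names are hypothetical.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical helper, not the code in the Gerrit change. */
final class SortCheckSketch {

    /** Returns true if the array is in non-decreasing order. */
    static boolean isSorted(long[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[i - 1] > a[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * Sorts in parallel, then falls back to a sequential sort only if the
     * result is out of order. Returns true if the fallback was needed.
     */
    static boolean sortDefensively(long[] a) {
        Arrays.parallelSort(a);
        if (isSorted(a)) {
            return false;
        }
        Arrays.sort(a);
        return true;
    }

    public static void main(String[] args) {
        long[] addresses = {0x3000L, 0x1000L, 0x2000L};
        boolean fallbackUsed = sortDefensively(addresses);
        System.out.println("fallback used: " + fallbackUsed
            + ", sorted: " + isSorted(addresses));
    }
}
```

Like the extra sort(), this can only repair a wrong ordering; it cannot detect items that were omitted or duplicated.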
Comment 10 Andrew Johnson CLA 2023-06-29 05:04:50 EDT
Gustav, have you had a chance to try the ideas in comment 9?
1. Try an IBM Semeru Runtime for macOS aarch64
2. Try the latest development build - which does an ordinary sort() after parallelSort() in case parallel sort is broken.
Comment 11 Eclipse Webmaster CLA 2024-05-08 16:57:28 EDT
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/38.