Bug 571331 - Reduce memory footprint of pass 1 heapdump loading
Status: CLOSED MOVED
Alias: None
Product: MAT
Classification: Tools
Component: Core
Version: 1.11
Hardware: All
OS: All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2021-02-19 03:58 EST by Jason Koch
Modified: 2024-05-08 15:47 EDT
CC List: 1 user

See Also:


Attachments
Identifier Index patch (7.13 KB, patch)
2021-02-19 04:11 EST, Jason Koch

Description Jason Koch 2021-02-19 03:58:04 EST
The Identifier index created during Pass 1 parsing currently uses a single linear array, which grows as required by allocating a new array at 1.5x the size and copying all values across. A consequence is that, for a given index size (e.g. a 1GB array plus one more object), the required parsing space is (1 + 1.5 =) 2.5GB of heap, because the old and new arrays are both live during the copy.

Instead of growing by 1.5x, Pass 1 parsing would use less memory if storage were allocated and extended in chunks as needed, with no copying into a larger array.
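
To illustrate, a minimal sketch of the chunked strategy (a hypothetical ChunkedLongList, not MAT's actual index code): fixed-size chunks are allocated on demand, no element is ever copied, and the peak footprint stays near N plus one partial chunk, rather than the 2.5N peak of grow-and-copy.

import java.util.ArrayList;

final class ChunkedLongList {
    private static final int CHUNK = 1 << 20;      // 1M longs (8 MB) per chunk
    private final ArrayList<long[]> chunks = new ArrayList<>();
    private int usedInLast = CHUNK;                // forces the first allocation

    void add(long value) {
        if (usedInLast == CHUNK) {                 // current chunk is full
            chunks.add(new long[CHUNK]);
            usedInLast = 0;
        }
        chunks.get(chunks.size() - 1)[usedInLast++] = value;
    }

    long get(long index) {
        return chunks.get((int) (index / CHUNK))[(int) (index % CHUNK)];
    }

    long size() {
        return (long) (chunks.size() - 1) * CHUNK + usedInLast;
    }
}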
Comment 1 Jason Koch 2021-02-19 04:11:01 EST
Created attachment 285606
Identifier Index patch
Comment 2 Andrew Johnson 2021-02-19 11:20:40 EST
Thanks for the contribution.

1. Is this a real problem? That is, is pass 1 the limiting factor for being able to successfully parse a dump, or will a later stage always
fail first, e.g. the dominator tree, which allocates at least 6 int arrays, each the size of the number of identifiers?
2. The parallel sort is replaced by a special quicksort; is this significantly slower?
3. Could a pathological / malicious dump trigger n-squared sort time? (A standard mitigation is sketched below.)
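
On question 3, for illustration: a quicksort with a naive fixed pivot rule can be driven to O(n^2) by adversarial input. A generic introsort-style guard (a sketch only, not the sort in attachment 285606) bounds the worst case by limiting recursion depth and falling back to the library sort.

import java.util.Arrays;

final class GuardedQuickSort {
    static void sort(long[] a) {
        if (a.length < 2) return;
        int depthLimit = 2 * (64 - Long.numberOfLeadingZeros(a.length));
        sort(a, 0, a.length - 1, depthLimit);
    }

    private static void sort(long[] a, int lo, int hi, int depth) {
        if (hi - lo < 16 || depth == 0) {   // tiny range, or too deep: bail out
            Arrays.sort(a, lo, hi + 1);     // guaranteed O(n log n) fallback
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1, depth - 1);
        sort(a, p + 1, hi, depth - 1);
    }

    private static int partition(long[] a, int lo, int hi) {
        int mid = lo + (hi - lo) / 2;       // median-of-three pivot selection
        if (a[mid] < a[lo]) swap(a, lo, mid);
        if (a[hi] < a[lo]) swap(a, lo, hi);
        if (a[hi] < a[mid]) swap(a, mid, hi);
        swap(a, mid, hi);                   // stash the median pivot at the end
        long pivot = a[hi];
        int i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    private static void swap(long[] a, int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }
}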
Comment 3 Jason Koch 2021-03-25 11:02:27 EDT
Andrew, this is a lead-in to some follow-up changes; the goal is that we can likely eliminate the pass 1 static footprint requirement at initial parse time. The second piece is allowing pass 1 parse results to be shifted to disk before starting pass 2. The net outcome is a reduction of the fixed pass 1 memory requirements during parsing. The main challenge is doing it without breaking the API.

I'm happy to withdraw this; I think there's probably a better way to tackle it by deferring to some alternative IO model, such as writing directly through to disk for some of the Array* models, though that needs to be balanced against IO costs. I haven't yet thought about this in detail, but it seems it would allow significantly reduced parsing memory requirements.
Comment 4 Andrew Johnson 2022-05-13 08:54:04 EDT
I made a basic change to address some of this problem:
https://git.eclipse.org/c/mat/org.eclipse.mat.git/diff/plugins/org.eclipse.mat.parser/src/org/eclipse/mat/parser/index/IndexWriter.java?id=23f80cc96a622bdc1dc886fdab5cbc69c9f3c945
This uses an ArrayLongBig to accumulate the longs in a series of arrays.
Before sorting, the values are copied into a single long[], so it requires approximately 2N values rather than up to 2.5N before this change.
Attachment 285606 shows how it could be done in N values, but with a more complex sort of the subarrays.
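
A simplified sketch of that accumulate-then-copy pattern, assuming ArrayLongBig's add()/toArray() methods from org.eclipse.mat.collect (the real IndexWriter interleaves this with HPROF parsing):

import java.util.Arrays;
import org.eclipse.mat.collect.ArrayLongBig;

final class AccumulateThenSort {
    // Peak usage is roughly 2N longs: N held in the chunked accumulator
    // plus N in the flat array needed for sorting.
    static long[] sortedIdentifiers(long[] addressesFromPass1) {
        ArrayLongBig acc = new ArrayLongBig();
        for (long addr : addressesFromPass1)
            acc.add(addr);                 // grows chunk by chunk, never copies
        long[] flat = acc.toArray();       // the second copy of the N values
        Arrays.parallelSort(flat);         // may allocate again; see below
        return flat;
    }
}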

I notice that the Java 17 class library's Arrays.parallelSort() makes
a copy of the array, whereas Java 11's did not. This means that even if the
conversion of ArrayLongBig to long[] were done via a file, so that only N values
were kept in memory at once, the parallelSort would require another N.
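
One possible low-footprint fallback, sketched here as an assumption (whether the speed loss is acceptable is the open question from comment 2): the sequential primitive sort works in place.

import java.util.Arrays;

final class LowFootprintSort {
    // Arrays.sort on a primitive array is a dual-pivot quicksort that works
    // in place for typical inputs (recent JDKs may allocate a buffer when
    // they detect pre-sorted runs), so it avoids the extra N-value copy
    // that Java 17's Arrays.parallelSort makes.
    static void sortInPlace(long[] identifiers) {
        Arrays.sort(identifiers);
    }
}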

If the heap space for Identifier index is still a problem in pass 1 then we can look at this patch again.
Comment 5 Eclipse Webmaster 2024-05-08 15:47:18 EDT
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/32.