The Identifier index created during Pass 1 parsing currently uses a single linear array, which grows as required by allocating a new array at 1.5x the old size and copying all values across. A consequence is that, for a given footprint (e.g. a 1 GB array plus one more object), the peak parsing space during the copy is 1 GB (old array) + 1.5 GB (new array) = 2.5 GB of heap. Pass 1 parsing would use less memory if storage were instead allocated and extended in place as needed, with no copying into a larger array.
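The no-copy growth strategy described above can be sketched as a chunked array: values are stored in a list of fixed-size pages, and growth means appending a fresh page rather than reallocating and copying. This is a hypothetical illustration, not the attached patch; the class and its names are invented for the example.

```java
import java.util.ArrayList;

// Sketch of a growable long store built from fixed-size chunks.
// Appending never copies existing data, so peak memory stays close to
// N values instead of the ~2.5N needed by grow-by-1.5x-and-copy.
public class ChunkedLongArray {
    private static final int CHUNK_SIZE = 1 << 16; // 65536 longs per chunk
    private final ArrayList<long[]> chunks = new ArrayList<>();
    private int usedInLastChunk = CHUNK_SIZE; // forces allocation of the first chunk

    public void add(long value) {
        if (usedInLastChunk == CHUNK_SIZE) {
            chunks.add(new long[CHUNK_SIZE]); // grow: one new chunk, no copying
            usedInLastChunk = 0;
        }
        chunks.get(chunks.size() - 1)[usedInLastChunk++] = value;
    }

    public long get(long index) {
        return chunks.get((int) (index / CHUNK_SIZE))[(int) (index % CHUNK_SIZE)];
    }

    public long size() {
        return (long) (chunks.size() - 1) * CHUNK_SIZE + usedInLastChunk;
    }
}
```

The trade-off, as the later comments discuss, is that a single contiguous array is no longer available, so sorting either needs a more complex sort over the subarrays or a one-off copy into a flat array.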
Created attachment 285606: Identifier Index patch
Thanks for the contribution.

1. Is this a real problem, i.e. is pass 1 the limiting factor for being able to successfully parse a dump, or will a later stage always fail first? For example, the dominator tree allocates at least 6 int arrays, each of the size of the number of identifiers.
2. The parallel sort is replaced by a special Quicksort; is this significantly slower?
3. Could a pathological or malicious dump trigger O(n^2) sort time?
Andrew, this is a lead-in to some follow-up changes; the goal is that we can likely eliminate the pass 1 static footprint requirement at initial parse time. The second chunk allows pass 1 parsing output to be shifted to disk before starting pass 2. The net outcome is a reduction of the fixed pass 1 memory requirements during parsing. The main challenge is doing it without breaking the API.

I'm happy to withdraw this; I think there is probably a better way to tackle it by deferring to an alternative I/O model. This would be something like writing directly through to disk for some of the Array* models, though that needs to be balanced against I/O costs. I have yet to think about this in detail, but it seems it would allow for significantly reduced parsing memory requirements.
I made a basic change to address some of this problem: https://git.eclipse.org/c/mat/org.eclipse.mat.git/diff/plugins/org.eclipse.mat.parser/src/org/eclipse/mat/parser/index/IndexWriter.java?id=23f80cc96a622bdc1dc886fdab5cbc69c9f3c945

This uses an ArrayLongBig to accumulate the longs in a series of arrays. Before a sort, the values are copied into a single long[], so it requires approximately 2N values rather than up to 2.5N before this change. Attachment 285606 shows how it could be done in N values, but with a more complex sort of the subarrays.

I notice that the Java 17 class library's Arrays.parallelSort() makes a copy of the array, whereas the Java 11 version did not. This means that even if the conversion of ArrayLongBig to long[] were done via a file, so that only N values were kept in memory at once, the parallelSort would require another N.

If the heap space for the Identifier index is still a problem in pass 1, then we can look at this patch again.
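The accumulate-then-copy approach described in that change can be sketched as follows. This is an illustrative stand-in, not the actual ArrayLongBig or IndexWriter code; the class and method names here are invented for the example. Values are collected in fixed-size pages (no copying while growing), then flattened once into a long[] for sorting, giving a peak of roughly 2N values: N in the pages plus N in the flat copy.

```java
import java.util.ArrayList;
import java.util.Arrays;

// Sketch of accumulating longs in a series of arrays, then copying once
// into a single long[] for sorting. Peak memory is ~2N values (pages plus
// the flat copy), versus up to ~2.5N for a single 1.5x-grown array.
public class BigLongAccumulator {
    private static final int PAGE = 1 << 16; // 65536 longs per page
    private final ArrayList<long[]> pages = new ArrayList<>();
    private int usedInLastPage = PAGE; // forces allocation of the first page

    public void add(long v) {
        if (usedInLastPage == PAGE) {
            pages.add(new long[PAGE]);
            usedInLastPage = 0;
        }
        pages.get(pages.size() - 1)[usedInLastPage++] = v;
    }

    // Flattens all pages into one array and sorts it. Pages could be
    // released as they are consumed to trim the peak slightly further.
    public long[] toSortedArray() {
        int n = (pages.size() - 1) * PAGE + usedInLastPage;
        long[] all = new long[n];
        int pos = 0;
        for (int i = 0; i < pages.size(); i++) {
            int len = (i == pages.size() - 1) ? usedInLastPage : PAGE;
            System.arraycopy(pages.get(i), 0, all, pos, len);
            pos += len;
        }
        Arrays.sort(all);
        return all;
    }
}
```

Note that using Arrays.parallelSort() here instead of Arrays.sort() would, on Java 17, add another N of working space, as observed above.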
This issue has been migrated to https://github.com/eclipse-mat/org.eclipse.mat/issues/32.