Re: [mat-dev] Parallel Object Discovery and Outbound and Inbound Index Creation

"mat-dev" <mat-dev-bounces@xxxxxxxxxxx> wrote on 09/11/2021 22:16:35:
> From: "Nathan Reynolds"
> Please enlighten me as to why 
> this is not done.
Thank you for your interest in improving MAT. One main reason things have 
not been done is that it is no one's full time job to maintain MAT and so 
we depend on committers and contributors having some spare time to help 
and their employers accepting that.
In my mind, performance improvements, while nice, come some way down the 
priority list, especially multi-threaded code which is more difficult to 
verify for correct behaviour. Jason Koch has done, and is still doing, some 
great work on making MAT work in parallel and has some more ideas:
570670: Optimisations for GarbageCleaner
https://bugs.eclipse.org/bugs/show_bug.cgi?id=570670
571331: Reduce memory footprint of pass 1 heapdump loading
https://bugs.eclipse.org/bugs/show_bug.cgi?id=571331
572512: Memory mapped files for parsing storage (proposal for comment)
https://bugs.eclipse.org/bugs/show_bug.cgi?id=572512

> During the initial pass, I see that MAT will use up to 10 cores.  It
> seems that the disk read bandwidth is the limit.  I am guessing that
> MAT is reading each object and could build the outbound and inbound 
> index using the object's reference fields.  By creating these 
> indexes during the initial pass, the total time for parsing a heap 
> dump would be tremendously reduced.  
The HPROF and DTFJ parsers have two passes. The first pass 'Scanning ...' 
finds all the objects and classes and builds the address to index and 
index to address mappings and the class objects.
The second pass, which takes longer, looks at all the outbound address 
references, converts them to identifiers and builds the outbound and 
inbound indices. This pass is already multi-threaded for HPROF, but it 
needs the result of the first pass, so the two passes can't be combined.
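
To illustrate why the second pass depends on the first, here is a minimal 
sketch, not MAT's actual code. It assumes identifiers are simply positions 
in the sorted list of object addresses; the class and method names 
(TwoPassIndexSketch, Rec, idOf and so on) are made up for the example. An 
object can reference an address that only appears later in the dump, so 
that reference cannot be converted to an identifier until every object has 
been seen.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy two-pass index builder. Hypothetical names; not MAT's real classes. */
class TwoPassIndexSketch {
    /** A hypothetical parsed record: an object address plus the addresses it references. */
    static final class Rec {
        final long address;
        final long[] refs;
        Rec(long address, long... refs) { this.address = address; this.refs = refs; }
    }

    /** Pass 1: collect every address; the position in the sorted array is the identifier. */
    static long[] indexToAddress(List<Rec> records) {
        return records.stream().mapToLong(r -> r.address).sorted().toArray();
    }

    /** Address-to-identifier lookup by binary search; -1 if the address is unknown. */
    static int idOf(long[] sortedAddresses, long address) {
        int i = Arrays.binarySearch(sortedAddresses, address);
        return i >= 0 ? i : -1;
    }

    /** Pass 2: convert referenced addresses to identifiers (needs the complete pass 1 result). */
    static int[][] outbound(List<Rec> records, long[] sortedAddresses) {
        int[][] out = new int[sortedAddresses.length][];
        for (Rec r : records) {
            out[idOf(sortedAddresses, r.address)] = Arrays.stream(r.refs)
                    .mapToInt(a -> idOf(sortedAddresses, a))
                    .filter(id -> id >= 0) // drop references to unknown addresses
                    .toArray();
        }
        return out;
    }

    /** The inbound index is the outbound index inverted. */
    static int[][] inbound(int[][] out) {
        List<List<Integer>> in = new ArrayList<>();
        for (int i = 0; i < out.length; i++) in.add(new ArrayList<>());
        for (int from = 0; from < out.length; from++)
            for (int to : out[from]) in.get(to).add(from);
        int[][] result = new int[out.length][];
        for (int i = 0; i < out.length; i++)
            result[i] = in.get(i).stream().mapToInt(Integer::intValue).toArray();
        return result;
    }
}
```

In this sketch the record at 0x300 could come first in the file and still 
end up with the highest identifier, which is why the address list has to be 
complete before any reference is translated.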
There is then a phase in common code which removes garbage (unreachable 
objects), and then changes all the identifiers. This is multi-threaded.
Then the dominator tree is built.
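
As a rough picture of the garbage removal phase, the sketch below marks 
everything reachable from the GC roots through the outbound index, then 
gives the surviving objects new, dense identifiers and rewrites the 
references. It is single-threaded for clarity (the real code is 
multi-threaded), and all names here (GarbageCleanSketch, mark, renumber, 
compact) are assumptions for the example rather than MAT's API.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy garbage removal and identifier renumbering. Hypothetical names; not MAT's real classes. */
class GarbageCleanSketch {
    /** Marks every object reachable from the roots by following outbound references. */
    static boolean[] mark(int[][] outbound, int[] roots) {
        boolean[] reachable = new boolean[outbound.length];
        Deque<Integer> stack = new ArrayDeque<>();
        for (int r : roots) {
            if (!reachable[r]) { reachable[r] = true; stack.push(r); }
        }
        while (!stack.isEmpty()) {
            int id = stack.pop();
            for (int to : outbound[id]) {
                if (!reachable[to]) { reachable[to] = true; stack.push(to); }
            }
        }
        return reachable;
    }

    /** Maps each old identifier to its new one, or -1 if the object is garbage. */
    static int[] renumber(boolean[] reachable) {
        int[] map = new int[reachable.length];
        int next = 0;
        for (int i = 0; i < reachable.length; i++) map[i] = reachable[i] ? next++ : -1;
        return map;
    }

    /** Rebuilds the outbound index using only live objects and their new identifiers. */
    static int[][] compact(int[][] outbound, int[] map) {
        int live = (int) Arrays.stream(map).filter(id -> id >= 0).count();
        int[][] result = new int[live][];
        for (int old = 0; old < outbound.length; old++) {
            if (map[old] >= 0) {
                // A live object only references live objects, so map[to] is never -1 here.
                result[map[old]] = Arrays.stream(outbound[old]).map(to -> map[to]).toArray();
            }
        }
        return result;
    }
}
```

Because every identifier can change in this step, any index built earlier 
has to be rewritten in terms of the new identifiers.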

It would be possible to do the first pass in parallel as an HPROF file is 
split into heap dump segments of less than 4GB and so the chunks could be 
done in parallel. Inside each chunk is a sequence of sub-tags of variable 
length so that is harder to do in parallel.
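
A sketch of that idea, under simplifying assumptions: the segments are 
already in memory as byte arrays, and each record inside a segment is a 
one-byte length followed by that many payload bytes. This is a toy format, 
not the real HPROF sub-tag layout, and SegmentScanSketch is a made-up 
name. Segments are scanned in parallel, but within a segment the scan is 
sequential because the start of a record is only known once the previous 
record's length has been read.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Toy parallel scan over independent segments. Hypothetical format; not real HPROF parsing. */
class SegmentScanSketch {
    /** Sequential walk of one segment: read a length byte, skip the payload, repeat. */
    static int countRecords(byte[] segment) {
        int pos = 0;
        int count = 0;
        while (pos < segment.length) {
            int len = segment[pos] & 0xFF; // toy record header: payload length
            pos += 1 + len;                // next record starts after the payload
            count++;
        }
        return count;
    }

    /** Segments are independent, so each one can be scanned on a separate thread. */
    static List<Integer> scanAll(List<byte[]> segments) {
        return segments.parallelStream()
                .map(SegmentScanSketch::countRecords)
                .collect(Collectors.toList()); // results stay in segment order
    }
}
```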

Jason might have some recent figures for the memory and time requirements 
for each phase for huge dumps.

--
Andrew Johnson



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 
741598. 
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU


