Re: [mat-dev] Storage targets for Eclipse MAT parsing

> Hi,
>
> I have a question on hardware platform targeting for Eclipse MAT
> users. Specifically, do we think it is OK to target SSD/NVMe devices
> with very low seek times for the parsing stage, or should we also
> target spinning disks with a heavy preference for sequential access?

>
> I mentioned doing some more performance work to Andrew, and he
> suggested Pass 1 parsing is not yet concurrent, which is a good
> point. At a guess, I think that if we can parse different hprof segments
> concurrently, we can very efficiently parse large files across multiple
> cores. The tradeoff would be that this will be reading multiple different
> regions of the file separately and will likely require physical seeks on
> large volumes. On NVMe/SSD this might be acceptable, but I think for
> spinning disks this would be a large penalty.
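
A rough sketch of what concurrent pass 1 parsing could look like, assuming
the segment boundaries have already been located by an indexing step; the
class and method names below are illustrative only, not existing MAT code:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPass1Sketch {
    /** A contiguous region of the dump holding complete top-level records. */
    record Segment(long offset, long length) {}

    static void parse(Path hprof, List<Segment> segments, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(hprof, StandardOpenOption.READ)) {
            List<Future<?>> tasks = new ArrayList<>();
            for (Segment segment : segments) {
                tasks.add(pool.submit(() -> {
                    // Positional reads do not move a shared file pointer, so
                    // every worker can scan its own segment independently.
                    ByteBuffer buffer = ByteBuffer.allocate(8 * 1024 * 1024);
                    long position = segment.offset();
                    long end = segment.offset() + segment.length();
                    while (position < end) {
                        buffer.clear();
                        int read = channel.read(buffer, position);
                        if (read <= 0) break;
                        // ... feed the buffer into a per-segment record scanner ...
                        position += read;
                    }
                    return null;
                }));
            }
            for (Future<?> task : tasks) {
                task.get();            // wait before the channel is closed
            }
        } finally {
            pool.shutdown();
        }
    }
}

The positional reads against a single shared FileChannel are what let each
worker scan its segment without coordinating with the others; the cost is
exactly the scattered I/O pattern described above.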

>
> I suspect the answer will come down to a tunable option; however, the
> ideal to me requires only a single implementation rather than
> multiple implementations. For example, an ideal solution might be:
> if you have a spinning disk, parse in one thread using the same
> algorithm as the multi-threaded approach. The question then is how
> we detect, on the user's behalf, that we should use single-threaded
> parsing in the event the user has multiple cores available.
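
One Linux-specific heuristic for that detection would be to read the
kernel's rotational flag for the block device holding the dump and fall back
to a single parsing thread when it reports spinning media. A minimal sketch,
assuming the caller already knows the device name (the mapping from a dump
file to its block device is not shown and would need real work):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskProbeSketch {
    /**
     * Returns true if the given block device (e.g. "sda", "nvme0n1") reports
     * itself as rotational via /sys/block/<dev>/queue/rotational.
     */
    static boolean isRotational(String blockDevice) {
        Path flag = Path.of("/sys/block", blockDevice, "queue/rotational");
        try {
            return "1".equals(Files.readString(flag).trim());
        } catch (IOException e) {
            // Unknown (non-Linux, virtualised, device mapper): assume the
            // safer, sequential-friendly case.
            return true;
        }
    }

    // Hypothetical use at parser start-up:
    //   int parserThreads = isRotational("sda")
    //           ? 1
    //           : Runtime.getRuntime().availableProcessors();
}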

>
> Please let me know your thoughts and/or if you have any
> implementation suggestions.

>
> Thanks

> Jason
I don't have answers for this - I'm normally using my laptop.

There's also the question of whether people are running MAT on virtual
machines - and are the disks virtualized to some big disk array
somewhere?

There is also a contribution for chunked gzipped files:
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/173722
which might allow efficient multiple streams for pass 1 parsing even
for gzipped files. The CPU requirements for unzipping might be the
limiting factor, even with multiple reads on a single disk.
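
If each chunk is a complete gzip member starting at a known compressed
offset, a reader can seek straight to a chunk and decompress from there. A
minimal sketch of that idea; the chunk-offset index is assumed to come from
elsewhere and the class name is hypothetical:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

public class ChunkedGzipReaderSketch {
    /** Opens a decompressing stream positioned at the start of one chunk. */
    static InputStream openChunk(Path gzFile, long chunkOffset) throws IOException {
        FileChannel channel = FileChannel.open(gzFile, StandardOpenOption.READ);
        channel.position(chunkOffset);      // seek to the chunk's gzip header
        // The caller reads the chunk's known uncompressed length and stops;
        // closing the returned stream closes the underlying channel.
        return new GZIPInputStream(
                new BufferedInputStream(Channels.newInputStream(channel)));
    }
}

Several such streams could be open at once for parallel pass 1, which is
where the decompression CPU cost rather than the disk could become the
limit.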

There is also the minor problem of the HPROF format being limited to 4GB
heap segments - but huge arrays can exceed that. I think the current
VMs just omit or truncate those arrays. MAT attempts to allow
heap segments to overflow their stated size, but if a VM
generated an HPROF file in that way then it would break parallel
pass 1 parsing.
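
For reference, the 4GB limit comes from the u4 length field in each
top-level HPROF record header, and skipping records by that length is
exactly what a parallel pass 1 would rely on to find segment boundaries.
A sketch of the header layout (the class is illustrative, not MAT code):

import java.io.DataInputStream;
import java.io.IOException;

public class HprofRecordHeaderSketch {
    final int tag;            // u1 record type, e.g. 0x1C HEAP_DUMP_SEGMENT
    final long timeDelta;     // u4 microseconds since the file header timestamp
    final long length;        // u4 body length: at most 0xFFFFFFFF (~4GB)

    HprofRecordHeaderSketch(DataInputStream in) throws IOException {
        tag = in.readUnsignedByte();
        timeDelta = in.readInt() & 0xFFFFFFFFL;
        length = in.readInt() & 0xFFFFFFFFL;
        // If a VM wrote a segment whose real body is longer than 'length'
        // (e.g. a huge primitive array), skipping by 'length' lands
        // mid-record, and a concurrent reader starting there would misparse.
    }
}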

Andrew Johnson




