Re: [mat-dev] Storage targets for Eclipse MAT parsing

> Hi,
>
> I have a question on hardware platform targeting for Eclipse MAT
> users. Specifically, do we think it is OK to target SSD/NVMe devices
> with very low seek times for the parsing stage, or should we also
> target spinning disks with a heavy preference for sequential access?

>
> I mentioned doing some more performance work to Andrew, and he
> suggested Pass 1 parsing is not yet concurrent, which is a good
> point. At a guess, I think that if we can parse different hprof segments
> concurrently, we can very efficiently parse large files across multiple
> cores. The tradeoff would be that this will be reading multiple different
> regions of the file separately and will likely require physical seeks on
> large volumes. On NVMe/SSD this might be acceptable, but I think for
> spinning disks this would be a large penalty.
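
A rough sketch of what concurrent pass 1 parsing could look like, assuming
the segment boundaries have already been located by an indexing step; the
class and method names below are illustrative only, not existing MAT code:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelPass1Sketch {
    /** A contiguous region of the dump holding complete top-level records. */
    record Segment(long offset, long length) {}

    static void parse(Path hprof, List<Segment> segments, int threads)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try (FileChannel channel = FileChannel.open(hprof, StandardOpenOption.READ)) {
            List<Future<?>> tasks = new ArrayList<>();
            for (Segment segment : segments) {
                tasks.add(pool.submit(() -> {
                    // Positional reads do not move a shared file pointer, so
                    // every worker can scan its own segment independently.
                    ByteBuffer buffer = ByteBuffer.allocate(8 * 1024 * 1024);
                    long position = segment.offset();
                    long end = segment.offset() + segment.length();
                    while (position < end) {
                        buffer.clear();
                        int read = channel.read(buffer, position);
                        if (read <= 0) break;
                        // ... feed the buffer into a per-segment record scanner ...
                        position += read;
                    }
                    return null;
                }));
            }
            for (Future<?> task : tasks) {
                task.get();            // wait before the channel is closed
            }
        } finally {
            pool.shutdown();
        }
    }
}

The positional reads against a single shared FileChannel are what let each
worker scan its segment without coordinating with the others; the cost is
exactly the scattered I/O pattern described above.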

>
> I suspect the answer will come down to a tunable option; however, the
> ideal to me requires only a single implementation rather than
> multiple implementations. For example, an ideal solution might be:
> if you have a spinning disk, parse in one thread using the same
> algorithm as the multi-threaded approach. The question then is how
> we detect, on the user's behalf, that we should use single-threaded
> parsing in the event the user has multiple cores available.
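
One Linux-specific heuristic for that detection would be to read the
kernel's rotational flag for the block device holding the dump and fall back
to a single parsing thread when it reports spinning media. A minimal sketch,
assuming the caller already knows the device name (the mapping from a dump
file to its block device is not shown and would need real work):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DiskProbeSketch {
    /**
     * Returns true if the given block device (e.g. "sda", "nvme0n1") reports
     * itself as rotational via /sys/block/<dev>/queue/rotational.
     */
    static boolean isRotational(String blockDevice) {
        Path flag = Path.of("/sys/block", blockDevice, "queue/rotational");
        try {
            return "1".equals(Files.readString(flag).trim());
        } catch (IOException e) {
            // Unknown (non-Linux, virtualised, device mapper): assume the
            // safer, sequential-friendly case.
            return true;
        }
    }

    // Hypothetical use at parser start-up:
    //   int parserThreads = isRotational("sda")
    //           ? 1
    //           : Runtime.getRuntime().availableProcessors();
}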

>
> Please let me know your thoughts and/or if you have any
> implementation suggestions.

>
> Thanks

> Jason
I don't have answers for this - I'm normally using my laptop.

There's also the question of whether people are running MAT on virtual
machines - and are the disks virtualized to some big disk array
somewhere?

There is also a contribution for chunked gzipped files:
https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/173722
which might allow efficient multiple streams for pass 1 parsing even
for gzipped files. The CPU requirements for unzipping might be the
limiting factor, even with multiple reads on a single disk.
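
If each chunk is a complete gzip member starting at a known compressed
offset, a reader can seek straight to a chunk and decompress from there. A
minimal sketch of that idea; the chunk-offset index is assumed to come from
elsewhere and the class name is hypothetical:

import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

public class ChunkedGzipReaderSketch {
    /** Opens a decompressing stream positioned at the start of one chunk. */
    static InputStream openChunk(Path gzFile, long chunkOffset) throws IOException {
        FileChannel channel = FileChannel.open(gzFile, StandardOpenOption.READ);
        channel.position(chunkOffset);      // seek to the chunk's gzip header
        // The caller reads the chunk's known uncompressed length and stops;
        // closing the returned stream closes the underlying channel.
        return new GZIPInputStream(
                new BufferedInputStream(Channels.newInputStream(channel)));
    }
}

Several such streams could be open at once for parallel pass 1, which is
where the decompression CPU cost rather than the disk could become the
limit.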

There is also the minor problem of the HPROF format being limited to 4GB
heap segments - but huge arrays can exceed that. I think the current
VMs just omit or truncate those arrays. MAT attempts to allow
heap segments to overflow their stated size, but if a VM
generated an HPROF file in that way then it would break parallel
pass 1 parsing.
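
For reference, the 4GB limit comes from the u4 length field in each
top-level HPROF record header, and skipping records by that length is
exactly what a parallel pass 1 would rely on to find segment boundaries.
A sketch of the header layout (the class is illustrative, not MAT code):

import java.io.DataInputStream;
import java.io.IOException;

public class HprofRecordHeaderSketch {
    final int tag;            // u1 record type, e.g. 0x1C HEAP_DUMP_SEGMENT
    final long timeDelta;     // u4 microseconds since the file header timestamp
    final long length;        // u4 body length: at most 0xFFFFFFFF (~4GB)

    HprofRecordHeaderSketch(DataInputStream in) throws IOException {
        tag = in.readUnsignedByte();
        timeDelta = in.readInt() & 0xFFFFFFFFL;
        length = in.readInt() & 0xFFFFFFFFL;
        // If a VM wrote a segment whose real body is longer than 'length'
        // (e.g. a huge primitive array), skipping by 'length' lands
        // mid-record, and a concurrent reader starting there would misparse.
    }
}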

Andrew Johnson




