
Re: [mat-dev] Parallel parsing

Jason,

I've been away and then making minor fixes for a 1.9.1 release for Eclipse 2019-09,
https://wiki.eclipse.org/SimRel/2019-09/Simultaneous_Release_Plan#Schedule
Krum may be able to add a new version by tomorrow, 28 August 2019, the 2019-09 M3 +3 deadline.

To keep things simple we'll need to wait until 2019-09 is done before doing parallel parsing changes.

My thoughts on requirements:

Versioning
https://wiki.eclipse.org/Version_Numbering
I don't want breaking API changes for parallel parsing; 'significant performance changes' sounds like a minor version change, so version 1.10.

API compatibility
https://wiki.eclipse.org/Evolving_Java-based_APIs
With a minor version number change we can add to the API, but this needs to be done carefully as once added we can't easily change/remove. Initially avoid API changes, or mark them as non-API in the JavaDoc until we are happy they are useful.

Java version
I'm happy to move up to Java 8 - stand-alone MAT already requires this as it is based on Eclipse Photon. This would stop MAT features from running in an Eclipse with a Java 7 SDK, for example IBM Rational Team Concert 6.0.6 (based on Luna?) https://jazz.net/downloads/rational-team-concert/releases/6.0.6 However it might be possible to switch to a newer JVM, or install the RTC components into a newer supported Eclipse with a newer JVM. I don't think that is going to be a problem.

CPU/RAM trade-offs
I see three main use cases: interactive MAT on a developer's machine with 8-32GB heap handling local OutOfMemoryErrors; interactive MAT handling heap dumps from other systems; and batch mode processing on a server with perhaps 64GB+. For the first we should be able to handle most dumps created on the same machine. For the second and third we need to handle larger dumps if possible, and not reduce the size of dump we can handle by much - but what is "much"? Currently we think we need about 28.25 bytes per object to build the dominator tree, but there may be other limits based on large object arrays.

For parsing an object array we should check how large the local arrays we create are, as this work could be done on each processing thread. If there are duplicate references in an object array the actual outbound list will be small. If they are all different then we already need enough heap to process those objects later, but that heap might not be available in the earlier stages. E.g. 16 threads each processing a different large object array of length L, each holding a couple of copies of the array as object addresses and object IDs, could use 16*L*(4+8) bytes, which could be more than we need for dominator tree processing.
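As a back-of-the-envelope check, here is a small sketch of that arithmetic; the thread count, array length and object count are made-up numbers for illustration, not measurements:

public class ParallelParseMemoryEstimate
{
    public static void main(String[] args)
    {
        int threads = 16;                 // worker threads each parsing a large object array
        long arrayLength = 50_000_000L;   // hypothetical length L of each object array
        long totalObjects = 200_000_000L; // hypothetical total objects in the dump

        // Each thread temporarily holds the array's outbound references twice:
        // once as 8-byte object addresses and once as 4-byte object IDs.
        long parsingPeak = threads * arrayLength * (8 + 4);

        // Rough dominator tree budget quoted above: ~28.25 bytes per object.
        double dominatorTree = totalObjects * 28.25;

        System.out.printf("Peak during array parsing: %,d bytes%n", parsingPeak);
        System.out.printf("Dominator tree estimate:   %,.0f bytes%n", dominatorTree);
    }
}

With those made-up numbers the array-parsing peak already exceeds the dominator tree estimate, which is the concern.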

The performance regression test can now be used to find the smallest JVM -Xmx value that works for a heap dump. I'll start by using a large dump from running Eclipse and another with 100 large object arrays. The dumps are too big to store in Git. This could show typical memory usage, but perhaps not worst case problems.

CPU is less of a problem - people are prepared to wait for parsing if the results are useful.

Initially I think we should just experiment with the JVM flag to restrict the number of threads. If we need it often then it could be a preference.

Handling OOME might be possible, but as OOMEs can occur nearly anywhere, and not necessarily on the greedy thread, it might not work. ObjectMarker now detects worker thread failures (important) and attempts to move unfinished work back onto the global stack (nice to have). For large array allocations/work we could add some resource limits to stop all the threads requesting large object arrays at once. Another idea is to catch an OOME during parsing, then (optionally?) retry parsing with a single thread.
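A sketch of the catch-and-retry idea (the names here are illustrative, not the actual parser entry point; it only shows the shape of the fallback):

import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

public class ParseWithFallback
{
    // Placeholder for the real parse step.
    static void parse(int threads) throws InterruptedException, ExecutionException
    {
        ForkJoinPool pool = new ForkJoinPool(threads);
        try
        {
            pool.submit(() -> {
                // ... parse segments / large object arrays in parallel ...
            }).get();
        }
        finally
        {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception
    {
        try
        {
            parse(Runtime.getRuntime().availableProcessors());
        }
        catch (OutOfMemoryError e)
        {
            // OOME thrown directly on the calling thread.
            parse(1);
        }
        catch (ExecutionException e)
        {
            if (!(e.getCause() instanceof OutOfMemoryError))
                throw e;
            // OOME thrown on a worker thread: discard partial state and
            // retry the whole parse with a single thread.
            parse(1);
        }
    }
}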

Merging changes
The Eclipse legal process required approval via Contribution Questionnaire (CQ) 19463, which was approved. Therefore we should start with the approved version of the changes and apply updates on top of that. We could have a Git feature branch, but I don't know how to apply existing Gerrit changes to that. We could just work on master as there isn't much other work going on. Another idea is to apply the changes to master, make the resulting head a feature branch, point the master branch back at the original commit, then when the feature is complete apply the feature changes to master (and squash the commits?). Is this how Git can work?
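Something like the following might work, though I'm not sure how it interacts with Gerrit and the branch name is just an example:

     git checkout -b parallel-parsing master   # feature branch from current master
     # apply/cherry-pick the approved Gerrit changes onto parallel-parsing
     git checkout master                        # 1.9.x service fixes stay on master
     # ...later, when the feature is complete...
     git merge --squash parallel-parsing        # land it on master as a single commit
     git commit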

Andrew




From:        Jason Koch <jkoch@xxxxxxxxxxx>
To:        Memory Analyzer Dev list <mat-dev@xxxxxxxxxxx>
Date:        26/08/2019 18:57
Subject:        Re: [mat-dev] Parallel parsing
Sent by:        mat-dev-bounces@xxxxxxxxxxx




Hi Andrew

I propose the following:
- Decide on clear goals in terms of API compatibility, Java version compatibility, and CPU/RAM trade-offs.
- Break down each of the patches and bring forward those that are eligible one by one, roughly in change order. I did break the branch into logical groups of changes, one per commit, and pushed that. This should be able to form the basis of a smaller set of changes.

I'd be very happy to get started on this, assuming we can clear up the API/Java/RAM questions:

There are a few possible solutions which will have different tradeoffs. We could have a simple solution like we do with heap size usage - allow the user to control it with a JVM flag:
     -Djava.util.concurrent.ForkJoinPool.common.parallelism=X
I genuinely do not have a good handle on the MAT ecosystem and need to be guided by community feedback on the boundaries of sensible solutions, especially in the heap usage area. Would controlling parallelism with a JVM flag be sufficient?
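For example, assuming the parallel work runs on the common ForkJoinPool, the flag above would cap it without any MAT-specific preference; a trivial check:

import java.util.concurrent.ForkJoinPool;

public class ParallelismCheck
{
    public static void main(String[] args)
    {
        // Honours -Djava.util.concurrent.ForkJoinPool.common.parallelism=X,
        // so a parser built on the common pool is throttled by a JVM flag alone.
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
    }
}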

Our use case is typically the exact opposite: the more resources we can use, and the faster we finish, the better, since we release those resources back to the pool anyway.

For background on why I created a single PR: I wanted to make it easier to understand the overall impact and justify the value of the changes as a group, since many of the individual changes don't have a significant impact on their own. If this is no longer useful, we should just throw it away and start on a fresh branch/es.

Thanks
Jason

On Fri, Jul 19, 2019 at 9:59 AM Andrew Johnson <andrew_johnson@xxxxxxxxxx> wrote:
Now we have released Memory Analyzer 1.9 it seems a good time to look at parallel parsing again.

I've applied the patches locally and then tried fixing the API compatibility errors.

API baseline is set up using Preferences > Plug-in Development > API Baselines and I used a Memory Analyzer 1.9 installation as the baseline.

https://wiki.eclipse.org/MemoryAnalyzer/Contributor_Reference#Configure_API_Tooling_Baseline

We have to fix the breaking changes:

IntIndexCollector
LongIndexCollector

Move these back to org.eclipse.mat.parser.index.IndexWriter (or have a stub and change the new classes to package-private). Also add back 'extends IntIndex<ArrayIntCompressed>' and 'extends LongIndex', and possibly merge properly and use the fields in the superclasses.

The type org.eclipse.mat.parser.io.PositionInputStream has been changed from a class to an interface

Put back PositionInputStream and add a new interface.
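Roughly what I have in mind, as a minimal self-contained sketch; the interface and method names are placeholders, not the actual names in the Gerrit changes, and the other read/skip overloads are omitted for brevity:

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical new interface that the parallel parsing code can depend on.
interface IPositionInputStream
{
    long position();
}

// Keep the original public class so existing API clients still compile,
// and have it implement the new interface.
class PositionInputStream extends FilterInputStream implements IPositionInputStream
{
    private long position;

    PositionInputStream(InputStream in)
    {
        super(in);
    }

    @Override
    public int read() throws IOException
    {
        int b = super.read();
        if (b >= 0)
            position++;
        return b;
    }

    @Override
    public long position()
    {
        return position;
    }
}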


The other errors are:

Missing @since tag on org.eclipse.mat.parser.io.BufferingRafPositionInputStream
Missing @since tag on org.eclipse.mat.parser.io.ByteArrayPositionInputStream
Missing @since tag on removeInstanceBulk(int, long)
Missing @since tag on org.eclipse.mat.parser.io.DefaultPositionInputStream
The major version should be incremented in version 1.9.0, since API breakage occurred since version 1.9.0


We could fix these by increasing the version to 1.10. The possible problem with this is that we need to update the version early in the release cycle, and so head/master isn't suitable as a service stream. We could solve this using branches, but I would like to keep things simple as I don't have much time to spend on the project.


The other approach is to avoid all API changes. We move the PositionInputStream classes into o.e.mat.hprof, which is not an API project, so is easy to add to. We replace uses of removeInstanceBulk with a loop over removeInstance, and check it isn't a big performance problem. We then have the parallel parsing changes without any API changes. Later we can consider whether these classes are mature enough and useful enough to move to o.e.mat.parser and promote them to being APIs.
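For instance, with a toy stand-in rather than the real org.eclipse.mat.parser types (the field and method names are assumptions, just to show the shape of the replacement):

// Toy stand-in for the real per-class record.
class ClassRecord
{
    int numberOfObjects = 100;
    long usedHeapSize = 100 * 24L;

    void removeInstance(long heapSizePerInstance)
    {
        numberOfObjects--;
        usedHeapSize -= heapSizePerInstance;
    }
}

public class BulkRemoveExample
{
    public static void main(String[] args)
    {
        ClassRecord c = new ClassRecord();

        // Instead of adding removeInstanceBulk(count, totalSize) to the API,
        // loop over the existing per-instance method:
        int count = 10;
        long perInstance = 24L;
        for (int i = 0; i < count; i++)
            c.removeInstance(perInstance);

        System.out.println(c.numberOfObjects + " objects, " + c.usedHeapSize + " bytes");
    }
}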


I have not applied multiple Gerrit patches before. I am not even sure of the right way of testing; I just merged the branches in order into my master. What is the best way - to modify existing patches with changes, which is more logical, but complicated (and I would need help), or to apply the existing changes, then create some more patches on top?


Any comments?


Andrew Johnson



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
