Andrew - thanks for the clarifications.
Re: CPU/RAM
I do agree on the large array constraints. A semaphore of sorts to restrict the number of threads that can process large arrays at once might be useful. Alternatively I could look at streaming in batches. It's been a while since I've looked at the code; there may be other solutions too.
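A minimal sketch of that semaphore idea in Java - the class name, permit count and size threshold below are invented for illustration, not existing MAT code:

```java
import java.util.concurrent.Semaphore;

public class LargeArrayGate {
    static final int PERMIT_COUNT = 2;             // illustrative permit count
    static final int LARGE_THRESHOLD = 1_000_000;  // elements; illustrative cutoff
    static final Semaphore PERMITS = new Semaphore(PERMIT_COUNT);

    static void processArray(long[] objectAddresses) throws InterruptedException {
        boolean large = objectAddresses.length >= LARGE_THRESHOLD;
        // Block while too many large arrays are already in flight.
        if (large) PERMITS.acquire();
        try {
            // ... build the outbound reference list for this array ...
        } finally {
            if (large) PERMITS.release();
        }
    }
}
```

Gating only the large arrays leaves small-array throughput untouched, while the permit count bounds the peak temporary memory for the expensive cases.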
Re: patches/legal. I'm not familiar with the Eclipse process, but I think the following might work.
Given the contents of the branch have been approved for inclusion, we can start a loop:
- create a new branch off master
- cherry pick $next commit from the CQ approved branch
- apply any fixups or changes required over the top of the branch
- merge branch into master
- repeat
This would allow us to progress in the right direction with smaller changes, while still taking in the code that was approved. Any fixes over the top are applied incrementally.
Thoughts?
Jason
Jason,
I've been away and then making minor fixes for a 1.9.1 release for Eclipse 2019-09,
https://wiki.eclipse.org/SimRel/2019-09/Simultaneous_Release_Plan#Schedule
Krum may be able to add a new version by tomorrow, 28 August 2019, the 2019-09 M3 +3 deadline. To keep things simple we'll need to wait until 2019-09 is done before doing parallel parsing changes.
My thoughts on requirements:
Versioning
https://wiki.eclipse.org/Version_Numbering
I don't want breaking API changes for parallel parsing; 'significant performance changes' sounds like a minor version change, so version 1.10.
API compatibility
https://wiki.eclipse.org/Evolving_Java-based_APIs
With a minor version number change we can add to the API, but this needs to be done carefully as once something is added we can't easily change or remove it. Initially avoid API changes, or mark them as non-API in the JavaDoc until we are happy they are useful.
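If we do add methods, Eclipse API Tooling recognises restriction tags in the JavaDoc for exactly this 'not yet API' state; a sketch with a hypothetical helper method:

```java
public class ParserHelpers {  // illustrative class, not an existing MAT type
    /**
     * Experimental helper for the parallel parser; not yet API.
     *
     * @noreference This method is not intended to be referenced by clients.
     */
    public static int experimentalChunkSize() {
        return 64 * 1024;
    }
}
```

API Tooling then flags client references to the method, so it can still be changed or removed later without a breaking API change.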
Java version
I'm happy to move up to Java 8 - stand-alone MAT already requires this as it is based on Eclipse Photon. This would stop MAT features from running in an Eclipse with a Java 7 SDK, for example IBM Rational Team Concert 6.0.6 (based on Luna?)
https://jazz.net/downloads/rational-team-concert/releases/6.0.6
However it might be possible to switch to a newer JVM, or to install the RTC components into a newer supported Eclipse with a newer JVM. I don't think that is going to be a problem.
CPU/RAM trade-offs
I see three main use cases: interactive MAT on a developer's machine with 8-32GB, handling local OutOfMemoryErrors; interactive MAT handling heap dumps from other systems; and batch mode processing on a server with perhaps 64GB+. For the first we should be able to handle most dumps created on the same machine. For the second and third we need to handle larger dumps if possible, and not reduce the size of dump we can handle by much - but what is 'much'? Currently we think we need 28.25 bytes per object to build the dominator tree, but there may be other limits based on large object arrays. For parsing an object array we should check how many local arrays we create, as this could be done on each processing thread. If there are duplicate references in an object array the actual outbound list will be small. If they are all different then we already need enough heap to process those objects later, but that might not be enough for the earlier stages. E.g. 16 threads each processing a different large object array of length L, with a couple of copies of the array as object addresses and object IDs, could use 16*L*(4+8) bytes, which could be more than we need for dominator tree processing.
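A quick back-of-envelope check of those numbers (the method names are just for illustration; the 12 bytes per element is an 8-byte object address plus a 4-byte object ID, held per thread):

```java
public class ParseMemoryEstimate {
    // Worst case for the parsing stage: each of 'threads' workers holds a
    // copy of its object array as 8-byte addresses plus 4-byte object IDs.
    static long parallelArrayBytes(int threads, long arrayLength) {
        return threads * arrayLength * (8L + 4L);
    }

    // Dominator tree building is estimated at 28.25 bytes per object.
    static long dominatorTreeBytes(long totalObjects) {
        return (long) (totalObjects * 28.25);
    }
}
```

For example, 16 threads on arrays of 100 million elements would want about 19.2GB for parsing alone, well above the roughly 2.8GB that the 28.25 bytes/object estimate gives for dominator tree processing of the same 100 million objects.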
The performance regression test can now be used to find the smallest JVM -Xmx value that works for a heap dump. I'll start by using a large dump from running Eclipse and another with 100 large object arrays. The dumps are too big to store in Git. This could show typical memory usage, but perhaps not worst case problems.
CPU is less of a problem - people are prepared to wait for parsing if the results are useful.
Initially I think we should just experiment with the JVM flag to restrict the number of threads. If we need it often then it could be a preference.
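That flag sets the parallelism of the JDK's common ForkJoinPool, which parallel streams and forked tasks use by default, so restricting it restricts any code drawing on the common pool. A small check, run with and without -Djava.util.concurrent.ForkJoinPool.common.parallelism=2:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

public class ParallelismCheck {
    // The common pool's target parallelism; by default roughly one less than
    // the number of processors, overridden JVM-wide by the system property.
    static int commonParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    // Count the distinct threads a parallel stream actually runs on.
    static long observedWorkerThreads() {
        return IntStream.range(0, 1_000)
                .parallel()
                .mapToObj(i -> Thread.currentThread().getName())
                .distinct()
                .count();
    }

    public static void main(String[] args) {
        System.out.println("common pool parallelism: " + commonParallelism());
        System.out.println("threads observed: " + observedWorkerThreads());
    }
}
```

Because the property applies JVM-wide, it would also throttle any other common-pool users inside the same Eclipse instance, which is one reason a preference might eventually be nicer than the raw flag.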
Handling OOME might be possible, but as OutOfMemoryErrors can occur nearly anywhere, and not necessarily on the greedy thread, it might not work. ObjectMarker now detects worker thread failures (important) and attempts to move unfinished work back onto the global stack (nice to have). For large array allocations/work we could have some resource limits to stop all the threads requesting large object arrays at once. Another idea is to catch an OOME on parsing, then (optionally?) retry parsing with a single thread.
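A sketch of that catch-and-retry idea - the Parser interface and both parse implementations are hypothetical, and catching OutOfMemoryError is only best effort since the error can surface on a thread other than the greedy one:

```java
public class RetryingParser {
    interface Parser {
        void parse(String dumpFile) throws Exception;
    }

    static void parseWithFallback(Parser parallel, Parser singleThreaded, String dumpFile)
            throws Exception {
        try {
            parallel.parse(dumpFile);
        } catch (OutOfMemoryError oome) {
            // Drop any partial state, then retry with one thread so the peak
            // temporary memory (e.g. per-thread array copies) is much lower.
            System.gc();
            singleThreaded.parse(dumpFile);
        }
    }
}
```

This only helps when the OOME comes from temporary parsing state rather than from the heap simply being too small for the dump at all.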
Merging changes
The Eclipse legal process required approval via Contribution Questionnaire (CQ) 19463, which was approved. Therefore we should start with the approved version of the changes, and apply updates on top of that. We could have a Git feature branch, but I don't know how to apply existing Gerrit changes to that. We could just work on master as there isn't much other work going on. Another idea is to apply the changes to master, make the resulting head a feature branch, make the original point the master branch, then when the feature is complete apply the feature changes to master (and squash the commits?). Is this how Git can work?
Andrew
From: Jason Koch <jkoch@xxxxxxxxxxx>
To: Memory Analyzer Dev list <mat-dev@xxxxxxxxxxx>
Date: 26/08/2019 18:57
Subject: Re: [mat-dev] Parallel parsing
Sent by: mat-dev-bounces@xxxxxxxxxxx
Hi Andrew
I propose the following:
- Decide on clear goals in terms of API compatibility, Java version compatibility, and CPU/RAM trade-offs.
- Break down each of the patches and bring forward those that are eligible one by one, roughly in change order. I did break the branch into a logical group of changes per commit and pushed that. This should be able to form the basis of a smaller set of changes.
I'd be very happy to get started on this, assuming we can clear up the API/Java/RAM questions.
There are a few possible solutions which will have different trade-offs. We could have a simple solution like we do with heap size usage - allow the user to control it with a JVM flag:
-Djava.util.concurrent.ForkJoinPool.common.parallelism=X
I genuinely do not have a good handle on the MAT ecosystem and need to be guided by community feedback on the boundaries of sensible solutions, especially in the heap usage area. Would controlling parallelism with a JVM flag be sufficient?
Our use case is typically the exact opposite, in that the more resources we can use, and the faster we go, the better, since we can release those resources back to the pool anyway.
For background on why I created a single PR: I wanted to make it easier to understand the overall impact and justify the value of the changes as a group, since many of the individual changes don't have a significant impact on their own. If this is no longer useful, we should just throw it away and start on fresh branch(es).
Thanks
Jason
On Fri, Jul 19, 2019 at 9:59 AM Andrew Johnson <andrew_johnson@xxxxxxxxxx> wrote:
Now we have released Memory Analyzer 1.9 it seems a good time to look at parallel parsing again. I've applied the patches locally and then tried fixing the API compatibility errors.
The API baseline is set up using Preferences > Plug-in Development > API Baselines; I used a Memory Analyzer 1.9 installation as the baseline.
https://wiki.eclipse.org/MemoryAnalyzer/Contributor_Reference#Configure_API_Tooling_Baseline
We have to fix the breaking changes:
- IntIndexCollector and LongIndexCollector: move these back to org.eclipse.mat.parser.index.IndexWriter (or have a stub and change the new classes to package-private). Also add back 'extends IntIndex<ArrayIntCompressed>' and 'extends LongIndex', and possibly merge properly and use the fields in the superclasses.
- The type org.eclipse.mat.parser.io.PositionInputStream has been changed from a class to an interface: put back PositionInputStream and add a new interface.
The other errors are:
- Missing @since tag on org.eclipse.mat.parser.io.BufferingRafPositionInputStream
- Missing @since tag on org.eclipse.mat.parser.io.ByteArrayPositionInputStream
- Missing @since tag on removeInstanceBulk(int, long)
- Missing @since tag on org.eclipse.mat.parser.io.DefaultPositionInputStream
- The major version should be incremented in version 1.9.0, since API breakage occurred since version 1.9.0
We could fix these by increasing the version to 1.10. The possible problem with this is that we need to update the version early in the release cycle, and so head/master isn't suitable as a service stream. We could solve this using branches, but I would like to keep things simple as I don't have much time to spend on the project.
The other approach is to avoid all API changes. We move the PositionInputStream classes into o.e.mat.hprof, which is not an API project, so it is easy to add to. We replace uses of removeInstanceBulk with a loop over removeInstance, and check that it isn't a big performance problem. We then have the parallel parsing changes without any API changes. Later we can consider whether these classes are mature enough and useful enough to move to o.e.mat.parser and promote them to being APIs.
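The removeInstanceBulk replacement could look something like this - the InstanceIndex interface and its signatures are illustrative, not the exact MAT index classes:

```java
public class BulkRemoval {
    // Stand-in for the existing per-instance removal API.
    interface InstanceIndex {
        void removeInstance(int objectId, long size);
    }

    // Instead of adding a new removeInstanceBulk(int, long) API, call the
    // existing per-instance method in a loop; the only open question is
    // whether the loop is a measurable performance problem.
    static void removeInstances(InstanceIndex index, int firstObjectId, int count, long sizeEach) {
        for (int i = 0; i < count; i++) {
            index.removeInstance(firstObjectId + i, sizeEach);
        }
    }
}
```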
I have not applied multiple Gerrit patches before. I am not even sure of the right way of testing; I just merged the branches in order into my master. What is the best way - to modify the existing patches with changes, which is more logical but complicated (and I would need help), or to apply the existing changes, then create some more patches on top?
Any comments?
Andrew Johnson
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
_______________________________________________
mat-dev mailing list
mat-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/mat-dev