Re: [mat-dev] Parallel parsing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [mat-dev] Parallel parsing

From: Jason Koch <jkoch@xxxxxxxxxxx>
Date: Mon, 26 Aug 2019 10:57:01 -0700
Delivered-to: mat-dev@xxxxxxxxxxx
List-archive: <https://www.eclipse.org/mailman/private/mat-dev>
List-help: <mailto:mat-dev-request@eclipse.org?subject=help>
List-subscribe: <https://www.eclipse.org/mailman/listinfo/mat-dev>, <mailto:mat-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://www.eclipse.org/mailman/options/mat-dev>, <mailto:mat-dev-request@eclipse.org?subject=unsubscribe>

Hi Andrew

I propose the following:

- Decide on clear goals in terms of API compatibilty, Java version compatibility, and CPU/RAM tradeoff goals.

- Break down each of the patches and bring forward those that are eligible one by one, and ~in change order. I did put break the branch into each logical group of changes per commit and push that. This should be able to form the basis of a smaller set of changes.

I'd be very happy to get started on this, assuming we can clear up the API/Java/RAM questions:

There are a few possible solutions which will have different tradeoffs. We could have a simple solution like we do with heap size usage - allow the user to control it with a JVM flag:

     -Djava.util.concurrent.ForkJoinPool.common.parallelism=X

I genuinely do not have not a good handle on the MAT ecosystem and need to be guided by community feedback in terms of boundaries to sensible solutions, especially in the heap usage area. Would controlling parallelism with a JVM flag be sufficient?

Our use case is typically the exact opposite in that the more resources we can use, and faster, the better, since we can release those resources back to the pool anyway.

For background on why I created a single PR: I wanted to make it easier to understand the overall impact and justify the value of the changes as a group, since many of the individual changes don't have a significant impact on their own. If this is no longer useful, we should just throw it away and start on a fresh branch/es.

Thanks

Jason

On Fri, Jul 19, 2019 at 9:59 AM Andrew Johnson <andrew_johnson@xxxxxxxxxx> wrote:

Now we have released Memory Analyzer 1.9 it seems a good time to look at parallel parsing again.

I've applied the patches locally and then tried fixing the API compatibility errors.
API baseline is set up using Preferences > Plug-in Development > API Baselines and I used a Memory Analyzer 1.9 installation as the baseline.
https://wiki.eclipse.org/MemoryAnalyzer/Contributor_Reference#Configure_API_Tooling_Baseline

We have to fix the breaking changes.

IntIndexCollector
LongIndexCollector
move these back to org.eclipse.mat.parser.index.IndexWriter (or have a stub and change the new classes to package-private).
Also add back 'extends IntIndex<ArrayIntCompressed>' and 'extends LongIndex', and possibly merge properly and use the fields in the superclasses.

The type org.eclipse.mat.parser.io.PositionInputStream has been changed from a class to an interface

Put back PositionInputStream and add a new interface.

The other errors are:
Missing @since tag on org.eclipse.mat.parser.io.BufferingRafPositionInputStream
Missing @since tag on org.eclipse.mat.parser.io.ByteArrayPositionInputStream
Missing @since tag on removeInstanceBulk(int, long)
Missing @since tag on org.eclipse.mat.parser.io.DefaultPositionInputStream
The major version should be incremented in version 1.9.0, since API breakage occurred since version 1.9.0

We could fix these by increasing the version to 1.10.The possible problem with this is that we need to update the version early in the release cycle, and so head/master isn't suitable as a service stream. We could solve this using branches but I would like to keep things simple as I don't have much time to spend on the project.

The other approach is to avoid all API changes. We move the PositionInputStream classes into o.e.mat.hprof which not an API project, so is easy to add to. We replace uses of removeInstanceBulk with a loop over removeInstance, and check it isn't a big performance problem. We then have the parallel parsing changes without any API changes. Later we can consider whether these classes are mature enough and useful enough to move to o.e.mat.parser and promote them to being APIs.

I have not applied multiple Gerrit patches before. I am not even sure of the right way of testing; I just merged the branches in order into my master. What is the best way - to modify existing patches with changes, which is more logical, but complicated (and I would need help), or to apply the existing changes, then create some more patches on top?

Any comments?

Andrew Johnson

Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number 741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
_______________________________________________
mat-dev mailing list
mat-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://www.eclipse.org/mailman/listinfo/mat-dev

Follow-Ups:
- Re: [mat-dev] Parallel parsing
  - From: Andrew Johnson

References:
- [mat-dev] Parallel parsing
  - From: Andrew Johnson

Prev by Date: Re: [mat-dev] New YouTube video on MAT
Next by Date: Re: [mat-dev] means for decode java.lang.Throwable#backtrace value to stacktrace
Previous by thread: [mat-dev] Parallel parsing
Next by thread: Re: [mat-dev] Parallel parsing
Index(es):
- Date
- Thread

Breadcrumbs