Re: [mat-dev] Parallel parsing

I'm happy to field any questions.

Full disclosure -

My focus has been on pass1/pass2 early parsing stages, because that's what my test dumps were primarily filled with.

Retesting on some more varied heaps shows what is probably not a surprise: pass1/pass2 parsing is greatly improved, but heaps where other steps in the process dominate the duration will not get a 2x speedup. This does not invalidate the pass1/pass2 improvements, but it does mean that not all heaps will get 2x.
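
To make that concrete, here is a minimal sketch of Amdahl's Law in Java. The fractions and the 4x parsing speedup are made-up illustration values, not measurements from MAT, and the class and method names are only for this example.

```java
// Amdahl's Law sketch. All numbers used in main() are hypothetical.
final class AmdahlSketch {

    /**
     * Overall speedup when improvedFraction of the original run time
     * becomes partSpeedup times faster and the rest is unchanged:
     * 1 / ((1 - p) + p / s).
     */
    static double overallSpeedup(double improvedFraction, double partSpeedup) {
        if (improvedFraction < 0.0 || improvedFraction > 1.0 || partSpeedup <= 0.0) {
            throw new IllegalArgumentException("need 0 <= fraction <= 1 and speedup > 0");
        }
        return 1.0 / ((1.0 - improvedFraction) + improvedFraction / partSpeedup);
    }

    public static void main(String[] args) {
        double partSpeedup = 4.0; // assumed speedup of the parsing passes alone
        // Share of total run time spent in the parsing passes, per heap.
        for (double fraction : new double[] {0.2, 0.5, 0.8}) {
            System.out.printf("parsing = %.0f%% of run time -> overall %.2fx%n",
                    fraction * 100, overallSpeedup(fraction, partSpeedup));
        }
    }
}
```

The smaller the share of run time spent in pass1/pass2, the closer the overall figure stays to 1x, however fast those passes become.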

I will continue to work on some of these other areas in my free time; my hope is that I can get similar improvements throughout the entire process.

On Tue, Mar 5, 2019 at 3:10 AM Andrew Johnson <andrew_johnson@xxxxxxxxxx> wrote:
Jason Koch has kindly contributed a set of changes to allow parallel parsing of HPROF files.

His comments are:

>I've just pushed a round of changes that should deliver double the single-threaded performance, focused on the initial parsing stage. I believe the benefit/change is large enough for your consideration now. The Pass1 and Pass2Parser are much more than double the performance; however, the limits of Amdahl's Law apply here: I'll need to attack some of the contended areas and move on to other single-threaded areas to continue to get improvements past 2x (which should be very possible).
>
>https://bugs.eclipse.org/bugs/show_bug.cgi?id=277422
>
>I understand this is quite a large number-of-lines change, so I have tried to split this up into a smaller set of commits which _should_ be reviewable independently as long as they are progressed in order. The main themes are: 1) a new RandomAccessFile based I/O parsing layer, 2) refactor of parser-handler and parser logic so that tuning opportunities and concurrency can be opened up, 3) improvements to index collection and writing layers for batching and compression offload. There are also point areas such as GarbageCleaner which are mostly independent.
>
>Thanks
>Jason
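
As a rough illustration of the first theme only, the sketch below has several threads read disjoint regions of one file, each through its own RandomAccessFile. It is not Jason's code: the class and method names are hypothetical, and it just sums bytes. Real HPROF records are variable length, so an actual parser cannot split a file at arbitrary offsets the way this does.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from MAT.
final class ParallelReadSketch {

    /** Sums the unsigned bytes of a file, one region per task. */
    static long sumBytes(Path file, int workers) {
        if (workers < 1) {
            throw new IllegalArgumentException("workers must be >= 1");
        }
        long size;
        try {
            size = Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        long chunk = Math.max(1, (size + workers - 1) / workers);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (long start = 0; start < size; start += chunk) {
                final long from = start;
                final long to = Math.min(size, start + chunk);
                parts.add(pool.submit(() -> sumRegion(file, from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    /** Reads [from, to) through a private file handle, so seeks never collide. */
    static long sumRegion(Path file, long from, long to) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(from);
            byte[] buf = new byte[8192];
            long remaining = to - from;
            long sum = 0;
            while (remaining > 0) {
                int n = raf.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    break; // file shrank underneath us
                }
                for (int i = 0; i < n; i++) {
                    sum += buf[i] & 0xFF;
                }
                remaining -= n;
            }
            return sum;
        }
    }

    /** Test helper: writes data to a temporary file. */
    static Path writeTemp(byte[] data) {
        try {
            Path p = Files.createTempFile("sketch", ".bin");
            p.toFile().deleteOnExit();
            return Files.write(p, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = writeTemp(new byte[] {1, 2, 3, (byte) 0xFF});
        System.out.println(sumBytes(p, 3));
    }
}
```

Giving each task its own file handle avoids sharing one seek position between threads, which is the main reason a single shared stream is hard to parallelize.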

I estimate there are about 3000 lines of changes (though some are just lines moved to new files), so this probably needs to go through the large contribution process.
https://www.eclipse.org/projects/handbook/#ip-cq
My reading of the process is that a committer will need to attach the changes as patches to a contribution questionnaire (CQ), so we can't merge the changes just yet.

https://wiki.eclipse.org/Development_Resources/Handling_Git_Contributions#Gerrit
I think it makes sense to review the changes before CQ submission/approval so that we can then merge without further changes, so I would encourage people to take a look, and committers can vote on the changes. Jason has signed the ECA, but at this stage I think it's easier if only Jason makes any additional changes, as then there is just one author for the CQ process. Once the changes are merged, we can all make further contributions.

Some points:
  1. The changes require Java 8. This shouldn't be a problem as stand-alone MAT already requires Java 8 for the RCP components, and Java 7 is no longer supported by Oracle and will be supported by IBM for only a short while longer. https://developer.ibm.com/javasdk/support/lifecycle/
  2. The code style of added code follows the MAT style, so that's good.
  3. We need to consider API compatibility. Moving the index builders might be a problem: https://git.eclipse.org/r/#/c/138024/ Java 8 default interface methods might be helpful if methods are added to interfaces. An API baseline would help (https://help.eclipse.org/2018-12/index.jsp?topic=%2Forg.eclipse.pde.doc.user%2Ftasks%2Fapi_tooling_baseline.htm), which would require pulling down the changes for review: https://wiki.eclipse.org/Platform-releng/Git_Workflows#Pulling_a_change_down_from_Gerrit_for_review
  4. We should particularly review additional APIs added in the parser plugin, as APIs are hard to change later.
  5. We need to check that serialization is compatible for changed serializable classes.
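
To illustrate points 3 and 5, here is a small Java 8 sketch. Every name in it is hypothetical and none of it is MAT code. It shows an interface gaining a method with a default body, so that an implementer written earlier keeps working, and a serializable class with a pinned serialVersionUID, which is what lets later compatible changes (such as an added field) still read old streams. The round trip only exercises one version of the class; checking real compatibility means reading data written by the previous release.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical interface that already has implementers elsewhere. */
interface IndexSink {
    void add(int value);

    /** Added later; the default body keeps existing implementers working. */
    default void addAll(int[] values) {
        for (int v : values) {
            add(v);
        }
    }
}

/** An "old" implementer written before addAll existed. */
final class SummingSink implements IndexSink {
    long sum;

    @Override
    public void add(int value) {
        sum += value;
    }
}

/** Hypothetical serializable class with an explicit stream version. */
final class IndexInfo implements Serializable {
    // Without this, the JVM derives an id from the class shape, and
    // adding a field would make old streams fail to deserialize.
    private static final long serialVersionUID = 1L;

    final String name;
    final int entries;
    transient long cachedHash; // not written by default serialization

    IndexInfo(String name, int entries) {
        this.name = name;
        this.entries = entries;
        this.cachedHash = name.hashCode();
    }
}

final class CompatSketch {

    /** Serializes a value to bytes and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SummingSink sink = new SummingSink();
        sink.addAll(new int[] {1, 2, 3}); // uses the inherited default method
        System.out.println(sink.sum);

        IndexInfo copy = roundTrip(new IndexInfo("o2c", 42));
        System.out.println(copy.name + " " + copy.entries + " " + copy.cachedHash);
    }
}
```

Note that the transient field comes back as zero after the round trip, so any cached state has to be rebuilt by the reader.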

This is an exciting contribution, please take a look.

Andrew Johnson





