Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [mat-dev] ​Is it necessary to synchronize IntIndexCollector.get/set during Pass2Parser?

Hi Jason and Andrew,

> To confirm, are you referring to this line?
Yes, this is exactly what I refer to.

According to profiling data for wall time and CPU time, there is high contention in IntIndexCollection.set and get, which takes up a significant amount of time.

Experiment#1
We can break down the problem into two parts:
- `synchronized(){}` between IntIndexCollection.set and IntIndexCollection.get: there is no time overlap between set and get, so we don't need to synchronize get.
- synchronization within IntIndexCollection.set: set is called concurrently, and as aforementioned, even if two indices are different, the final byte[] range may overlap, making it difficult to remove the lock.
Given that IntIndexCollection.set is highly contended, it is without a doubt that the lock will significantly impact performance. We need to find another way to handle this issue.

Experiment#2
Then I replace ArrayIntCompressed with an uncompressed int[] version and remove all locks within IntIndexCollection.set and get methods. When testing the parsing of 40GB heap, I achieved a 32.5% performance improvement stably. But it seems that IntIndexCollectorUncompressed mentioned by Andrew is what I want

--original
Time elapsed: 1,898.853 s
Time elapsed: 1,868.729 s
Time elapsed: 1,845.501 s
--remove_locks_and_use_int[]_in_ArrayIntCompressed
Time elapsed: 1,280.707 s
Time elapsed: 1,282.467 s
--use IntIndexCollectorUncompressed
Time elapsed: 1,318.243 s

Thanks,
Yi Yang

------------------------------------------------------------------
From:Jason Koch <jkoch@xxxxxxxxxxx>
Send Time:2023 May 31 (Wed.) 03:16
To:"YANG, Yi" <qingfeng.yy@xxxxxxxxxxxxxxx>; Memory Analyzer Dev list <mat-dev@xxxxxxxxxxx>
Subject:Re: [mat-dev] ​Is it necessary to synchronize IntIndexCollector.get/set during Pass2Parser?

Hi Yi Yang,

To confirm, are you referring to this line?

If so - the reason is because the underlying array may in some circumstance be an encoded array where the values are not complete int[] arrays, ie: an element may tear across multiple fields, so a concurrent update even to two different indices might change the values. It should be possible to swap to a CAS implementation, or if you have alternative suggestions for a different approach.

FWIW there are some other PRs that could have a big impact too - such as this - https://bugs.eclipse.org/bugs/show_bug.cgi?id=570670, if you have some time to help review it would be great to have them merged.

Thanks
Jason

On Sun, May 28, 2023 at 7:24 PM Yi Yang via mat-dev <mat-dev@xxxxxxxxxxx> wrote:
Hi all, when building hprof index, MAT first creates Pass2Parser and concurrently parses objects and adds them to ArrayIntCollector. Then, in the subsequent fillIn routine, it reads objects from ArrayIntCollector in order and writes them to index file. The entire process is shown below:
HprofIndexBuilder
  - Pass2Parser.read
    - Pass2Parser.readSegment
      - addObject to IntIndexCollector in parallel by calling IntIndexCollector.set(key, value)
  - fillIn
  - write IntIndexCollector to file by calling IntIndexCollector.get(key)

It seems that there is no time overlap between IntIndexCollector.get and IntIndexCollector.set? Additionally, when calling ArrayIntCollector.set(key, value) in addObject, the key is the object's ID and is unique. So, is it really necessary to synchronize get/set under synchronized protection? I observed that addObject has considerable performance overhead, and removing this lock protection results in a 19.3% performance improvement.


Best regards
Yi Yang
_______________________________________________
mat-dev mailing list
mat-dev@xxxxxxxxxxx
To unsubscribe from this list, visit https://www.eclipse.org/mailman/listinfo/mat-dev

Back to the top