Re: [mat-dev] Is it necessary to synchronize IntIndexCollector.get/set during Pass2Parser?
  • From: Andrew Johnson <andrew_johnson@xxxxxxxxxx>
  • Date: Wed, 31 May 2023 20:47:14 +0000

Thank you for your interest in improving the performance of MAT.

 

The parsing code was enhanced to have some parallelism, and some of the index code was enhanced to be thread safe.

The index code in org.eclipse.mat.parser is a MAT API, so we can’t really make incompatible changes, as an adopter of MAT might already be using it and relying on some thread safety. We should improve the Javadoc to explain what is safe.

 

I think your example is for code in HprofParserHandlerImpl.java such as:

        // log address
        object2classId.set(index, classIndex);
        object2position.set(index, object.filePosition);

Yes, each thread works on separate index values, but IntIndexCollector then chooses a page based on the index, giving an ArrayIntCompressed object in which to set the value. Internally this has a byte array, but a value can be spread across multiple bytes: the class index is known to have leading zeroes, so only certain bits need to be stored. This means two threads could access the same byte in the byte array, so synchronization is needed. It might be possible to use AtomicIntegerArray instead of a byte[], but that could be slower.

Is there contention on the lock? The page size is 1000000, so if the number of objects in the heap is smaller than this then they will all end up on the same ArrayIntCompressed. If the number is much larger there might still be a lot of contention, as the objects in the HPROF are often in address order, so the threads might be processing objects with similar object indexes; even reducing the page size might not help.

Is using IntIndexCollectorUncompressed any faster (at the cost of more memory)? There may be similar problems with the object2position index.
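
To make the shared-byte point concrete, here is a minimal sketch of a bit-packed array. It is not MAT's ArrayIntCompressed: the class PackedIntDemo, its methods, and the 20-bit width are all assumptions chosen for illustration only.

```java
import java.util.Arrays;

/** Hypothetical bit-packed int array for illustration; NOT MAT's ArrayIntCompressed. */
final class PackedIntDemo {
    private final byte[] data;
    private final int bitsPerValue;

    PackedIntDemo(int size, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        // Round the total bit count up to a whole number of bytes.
        this.data = new byte[(int) (((long) size * bitsPerValue + 7) / 8)];
    }

    /** Stores the low bitsPerValue bits of value, one bit at a time (simple, not fast). */
    void set(int index, int value) {
        long firstBit = (long) index * bitsPerValue;
        for (int i = 0; i < bitsPerValue; i++) {
            long bit = firstBit + i;
            int byteIdx = (int) (bit >>> 3);
            int mask = 1 << (int) (bit & 7);
            // Read-modify-write of a whole byte: without a lock, a concurrent
            // writer to a neighbouring index could lose its update here.
            if (((value >>> i) & 1) != 0) {
                data[byteIdx] |= mask;
            } else {
                data[byteIdx] &= ~mask;
            }
        }
    }

    /** Reassembles the value stored at index from its individual bits. */
    int get(int index) {
        long firstBit = (long) index * bitsPerValue;
        int value = 0;
        for (int i = 0; i < bitsPerValue; i++) {
            long bit = firstBit + i;
            if ((data[(int) (bit >>> 3)] & (1 << (int) (bit & 7))) != 0) {
                value |= 1 << i;
            }
        }
        return value;
    }

    /** Returns the inclusive range {firstByte, lastByte} of byte offsets used by index. */
    static int[] bytesTouched(int index, int bitsPerValue) {
        long firstBit = (long) index * bitsPerValue;
        long lastBit = firstBit + bitsPerValue - 1;
        return new int[] { (int) (firstBit >>> 3), (int) (lastBit >>> 3) };
    }

    public static void main(String[] args) {
        int bits = 20; // assumed width, e.g. enough for about a million class indexes
        System.out.println(Arrays.toString(bytesTouched(0, bits))); // [0, 2]
        System.out.println(Arrays.toString(bytesTouched(1, bits))); // [2, 4]
    }
}
```

With 20 bits per value, index 0 occupies bytes 0 to 2 and index 1 occupies bytes 2 to 4, so byte 2 holds bits from both values. Two threads calling set(0, ...) and set(1, ...) without a lock would both read, modify, and write that byte, and one update could overwrite the other even though the indexes differ.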

 

For more performance work we would like some reproducible tests – there are some performance tests in org.eclipse.mat.tests but we don’t run them.

 

Regards,

 

Andrew Johnson

 

 

 

From: mat-dev <mat-dev-bounces@xxxxxxxxxxx> On Behalf Of Yi Yang via mat-dev
Sent: Monday, May 29, 2023 3:24 AM

Hi all, when building the HPROF index, MAT first creates a Pass2Parser, which parses objects concurrently and adds them to an IntIndexCollector. Then, in the subsequent fillIn routine, it reads the objects from the IntIndexCollector in order and writes them to the index file. The entire process is shown below:

HprofIndexBuilder
  - Pass2Parser.read
    - Pass2Parser.readSegment
      - addObject to IntIndexCollector in parallel by calling IntIndexCollector.set(key, value)
  - fillIn
    - write IntIndexCollector to file by calling IntIndexCollector.get(key)

 

It seems that there is no time overlap between IntIndexCollector.get and IntIndexCollector.set. Additionally, when addObject calls IntIndexCollector.set(key, value), the key is the object's ID and is unique. So, is it really necessary for get/set to be synchronized? I observed that addObject has considerable performance overhead, and removing this lock results in a 19.3% performance improvement.
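
For illustration, the two-phase pattern described here (parallel writes to unique keys, then a sequential read once the writers have finished) can be sketched with a plain int[]. This is only a sketch under assumptions: TwoPhaseDemo and fillInParallel are hypothetical names, not MAT code, and the argument relies on each key mapping to its own int element, which would not hold for storage that packs several values into shared bytes.

```java
/** Hypothetical two-phase fill/read sketch; not MAT code. */
final class TwoPhaseDemo {

    /** Phase 1: two threads write disjoint indexes of a plain int[] without locking. */
    static int[] fillInParallel(int n) {
        int[] values = new int[n];
        Thread even = new Thread(() -> {
            for (int i = 0; i < n; i += 2) {
                values[i] = i + 1; // only this thread writes even indexes
            }
        });
        Thread odd = new Thread(() -> {
            for (int i = 1; i < n; i += 2) {
                values[i] = i + 1; // only this thread writes odd indexes
            }
        });
        even.start();
        odd.start();
        try {
            // join() establishes happens-before: all writes made by the
            // worker threads are visible to this thread afterwards.
            even.join();
            odd.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Phase 2: sequential read, strictly after both writers have finished.
        int[] values = fillInParallel(1000);
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        System.out.println(sum); // 500500
    }
}
```

The sketch is safe because Java guarantees that writes to distinct int array elements do not interfere with each other, and because Thread.join orders the writes before the reads.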

 

 

Best regards

Yi Yang
