Bug 573503 - GZIP HPROF performance improvement
Summary: GZIP HPROF performance improvement
Status: RESOLVED FIXED
Alias: None
Product: MAT
Classification: Tools
Component: Core
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: 1.12.0
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2021-05-12 10:02 EDT by Andrew Johnson CLA
Modified: 2021-05-28 01:58 EDT

See Also:


Attachments

Description Andrew Johnson CLA 2021-05-12 10:02:42 EDT
GZIP compressed HPROF files are read with special streams to handle seeking in a non-seekable format.

The chunked reader ChunkedGZIPRandomAccessFile works quite well because its state is reset every 1MB, so not much reading needs to be done. Regular GZIP HPROF files created by users with gzip or by older versions of MAT use the SeekableStream, which keeps various cached readers: it finds the immediately preceding reader, clones it, then reads forward to the required position. The underlying GZIPInputStream2 is quite big as it has an InflaterInputStream with a 32K dictionary, 257 byte output buffer and 16KB input buffer.
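For illustration, the "find the preceding reader, clone it, read forward" lookup can be sketched roughly as below. This is a minimal sketch, not the MAT code: the class NearestReaderCache, the nested Reader, and the byte array standing in for decompressor state are all hypothetical names invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration only; not MAT's SeekableStream. */
class NearestReaderCache {
    /** Hypothetical stand-in for a cached reader: forward-only, cheap to copy here. */
    static final class Reader {
        private final byte[] data; // stands in for the decompressed stream
        private int pos;

        Reader(byte[] data, int pos) {
            this.data = data;
            this.pos = pos;
        }

        Reader copy() {
            return new Reader(data, pos);
        }

        int position() {
            return pos;
        }

        int read() {
            return pos < data.length ? (data[pos++] & 0xff) : -1;
        }
    }

    // Cached readers keyed by their position in the uncompressed data.
    private final TreeMap<Integer, Reader> cache = new TreeMap<>();

    /** Counts bytes consumed only to move forward, i.e. the cost of seeking. */
    long bytesReadForward;

    NearestReaderCache(byte[] data) {
        cache.put(0, new Reader(data, 0)); // a reader at the start always exists
    }

    /** Returns a private reader positioned at target (or at end of data). */
    Reader seek(int target) {
        // Closest cached reader at or before the target.
        Map.Entry<Integer, Reader> e = cache.floorEntry(target);
        Reader r = e.getValue().copy();
        // The format is not seekable, so the gap has to be read through.
        while (r.position() < target && r.read() >= 0) {
            bytesReadForward++;
        }
        cache.put(r.position(), r.copy()); // remember this position for later seeks
        return r;
    }

    int readAt(int target) {
        return seek(target).read();
    }
}
```

The cost of a seek is the distance from the nearest preceding cached reader, which is why a cheap way of stepping backwards a short distance is attractive.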

We could improve the performance a little by using the dictionary as a cache. It holds the last 32K of output, so it could also serve as a mark() / reset() cache. We can also merge the output buffer into the dictionary.
GZIPInputStream2 then needs some code to handle mark and reset so that the CRC calculation is still correct. The best approach is to leave out of the CRC any bytes re-read after a reset, from the mark point up to the farthest point already read; that way the CRC does not need to be cloned and a skip of this region doesn't need to recalculate the CRC.
SeekableStream then needs some small changes to handle mark / reset: marking regularly and just before the end of a long skip, and doing a reset before putting the PosStream back in the cache.
This is all conditional on markSupported in InflaterInputStream.
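A rough sketch of the mark / reset idea with the CRC kept correct, for illustration only. It assumes a separate 32K circular buffer wrapped around an arbitrary InputStream, whereas the proposal reuses the inflater's own dictionary; the class name WindowedCrcInputStream and its methods are hypothetical and not taken from the MAT source.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

/** Hypothetical sketch; not MAT's GZIPInputStream2. */
class WindowedCrcInputStream extends InputStream {
    private static final int WINDOW = 32 * 1024; // same size as a DEFLATE dictionary

    private final InputStream in;
    private final byte[] window = new byte[WINDOW]; // last 32K of output, circular
    private final CRC32 crc = new CRC32();
    private long produced;     // farthest position ever read from 'in'
    private long pos;          // current logical position, pos <= produced
    private long markPos = -1; // -1 means no mark set

    WindowedCrcInputStream(InputStream in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        if (pos < produced) {
            // Replaying after a reset: serve from the window, leave the CRC alone.
            int b = window[(int) (pos % WINDOW)] & 0xff;
            pos++;
            return b;
        }
        int b = in.read();
        if (b < 0) {
            return -1;
        }
        window[(int) (produced % WINDOW)] = (byte) b;
        crc.update(b); // only bytes seen for the first time reach the CRC
        produced++;
        pos++;
        return b;
    }

    @Override
    public boolean markSupported() {
        return true;
    }

    @Override
    public void mark(int readlimit) {
        markPos = pos; // readlimit is ignored; the window size is the real limit
    }

    @Override
    public void reset() throws IOException {
        if (markPos < 0 || produced - markPos > WINDOW) {
            throw new IOException("mark not set or no longer inside the 32K window");
        }
        pos = markPos;
    }

    long crcValue() {
        return crc.getValue();
    }
}
```

Because the CRC only advances with the farthest position read, a reset followed by a re-read or a skip of the same region costs no CRC work and needs no copy of the CRC state. A caller would check markSupported() first and fall back to the clone-and-read-forward path when it returns false.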

This can be tested by the Acquire Heap Dumps and Export HPROF tests which can export in chunked and unchunked compressed format.
Comment 1 Eclipse Genie CLA 2021-05-12 11:32:39 EDT
New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180542
Comment 3 Eclipse Genie CLA 2021-05-14 08:08:26 EDT
New Gerrit change created: https://git.eclipse.org/r/c/mat/org.eclipse.mat/+/180603
Comment 5 Andrew Johnson CLA 2021-05-28 01:58:44 EDT
This is now complete.