Re: [mat-dev] A suggestion: would you benefit from the 'jzran' library for random-access gzip archives?

Hello Eugene,

Thanks for the idea. I find it interesting; I think it was a good decision to send something to the list, even without a patch :) 

I can give some technical details about the HPROF parser. MAT also works with IBM system dumps (which are zipped); Andrew, who wrote the IBM dumps parser, could probably say whether such an approach could work there.

When parsing an HPROF file we go over the whole file sequentially twice, creating several index files in the process. Afterwards we use the original .hprof file only to read the complete information about single objects, and that access is random. However, as you noticed, we read bigger pieces of data, cache some of them and "hope" that some of the other objects the user requests are in the same blocks. Honestly, I don't have empirical data on how often this cache is hit.
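
To make the caching idea concrete, here is a minimal sketch of a block cache over a random-access file, with hit and miss counters that would allow measuring the hit rate. It is not MAT's actual code: the class name BlockCachedReader, the block size and the LRU eviction policy are all assumptions chosen for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical block cache; an illustration only, not MAT's implementation. */
final class BlockCachedReader {
    private final RandomAccessFile file;
    private final int blockSize;
    private final Map<Long, byte[]> cache;
    long hits;
    long misses;

    BlockCachedReader(RandomAccessFile file, int blockSize, final int maxBlocks) {
        this.file = file;
        this.blockSize = blockSize;
        // An access-ordered LinkedHashMap evicts the least recently used block.
        this.cache = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the unsigned byte at pos, or -1 if pos is past the end of the file. */
    int read(long pos) throws IOException {
        long blockIndex = pos / blockSize;
        byte[] block = cache.get(blockIndex);
        if (block == null) {
            misses++;
            block = loadBlock(blockIndex);
            cache.put(blockIndex, block);
        } else {
            hits++;
        }
        int offset = (int) (pos % blockSize);
        return offset < block.length ? (block[offset] & 0xFF) : -1;
    }

    /** Reads one whole block from disk; the last block may be shorter. */
    private byte[] loadBlock(long blockIndex) throws IOException {
        long start = blockIndex * blockSize;
        long remaining = Math.max(0, file.length() - start);
        byte[] block = new byte[(int) Math.min(blockSize, remaining)];
        file.seek(start);
        file.readFully(block);
        return block;
    }
}
```

Comparing hits and misses after a typical analysis session would give the empirical data mentioned above, under the assumption that a counter like this can be added to the real reader.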

What we read much more often are actually the index files. These are currently not zipped, although the information inside is compressed to some extent. These are all write-once files. Access afterwards is again random, but as above, some data is cached. Having them zipped would save some disk space, but I am not sure what the effect on performance would be. I guess it would be difficult to tell before actually measuring it with different-sized heap dumps and some of the commonly used features of the tool.

I think trying to use the library for the .hprof file only shouldn't be so difficult, whereas using it for the indices would be a more complex change. I'd like to give it a try, but I can't promise I'll find time for it very soon. A patch would be nice ;-)

About the license - I have to check what licenses are allowed by Eclipse. I just don't know this by heart. I also think BSD is ok, but I have to double-check it.

Regards,
Krum

-----Original Message-----
From: mat-dev-bounces@xxxxxxxxxxx [mailto:mat-dev-bounces@xxxxxxxxxxx] On Behalf Of Eugene Kirpichov
Sent: Tuesday, April 5, 2011 14:20
To: mat-dev@xxxxxxxxxxx
Subject: [mat-dev] A suggestion: would you benefit from the 'jzran' library for random-access gzip archives?

Hello,

A while ago I wrote a library for random access to gzip archives -
jzran http://code.google.com/p/jzran . I originally wrote it for the
logophagus project http://code.google.com/p/logophagus , but hoped to
find other uses for it.

The library is BSD-licensed, so basically free for any kind of usage.

I wonder if the Eclipse Memory Analyzer would benefit from it? I think
it could be cool to open/analyze gzipped .hprof files without
decompressing them (it's quite a frequent situation e.g. in Yandex
among my ex-colleagues - you gzip a profile on a remote server, copy
it to your machine, decompress and study it with yjp). Perhaps in some
cases it could even be faster than opening uncompressed ones. Or maybe
you could store some of your indices in compressed form and use jzran
to read them.

The answer of course depends very much on the access pattern - this
library is basically designed for relatively long reads from arbitrary
positions in the file; in a "random read" scenario it would be slower.
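
As an illustration of why the access pattern matters, the sketch below
reads from an arbitrary offset of a gzip stream using only
java.util.zip, with no index. It does not use or show jzran's API; the
class NaiveGzipSeek and its methods are hypothetical names. Without an
index, every read has to decompress from the start of the stream, which
is the cost a random-access library presumably avoids by keeping seek
points.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: index-free "seek" in gzip data, for illustration only. */
final class NaiveGzipSeek {

    /** Reads up to length uncompressed bytes starting at the given uncompressed offset. */
    static byte[] readAt(byte[] gzipped, long offset, int length) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            // There is no way to jump ahead: decompress and discard everything before offset.
            byte[] scratch = new byte[8192];
            long toSkip = offset;
            while (toSkip > 0) {
                int n = in.read(scratch, 0, (int) Math.min(scratch.length, toSkip));
                if (n < 0) {
                    throw new EOFException("offset is past the end of the data");
                }
                toSkip -= n;
            }
            // Once positioned, the read itself is cheap and sequential.
            byte[] result = new byte[length];
            int filled = 0;
            while (filled < length) {
                int n = in.read(result, filled, length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            return Arrays.copyOf(result, filled);
        }
    }

    /** Compresses raw bytes into gzip format, used here only to build sample input. */
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(raw);
        }
        return bytes.toByteArray();
    }
}
```

The cost of each readAt call grows with the offset, so many short reads
at scattered positions pay the decompression cost repeatedly, while one
long read amortizes it.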

I looked at HprofRandomAccessParser and
BufferedRandomAccessInputStream - looks like in combination with the
latter, jzran could do the trick - assuming that even many "seemingly
random" reads are actually sequential (e.g. many objects in an object
array actually allocated one after another). (this is the first time
I'm looking at the codebase, so I may be wrong)

Yeah, I know, I should just write a patch myself :) but while I feel
guilty about not doing so, I thought that writing to the mailing
list would still be better than doing nothing at all.
Please tell me what you think.

--
Eugene Kirpichov
Principal Engineer,
Mirantis Inc. http://www.mirantis.com/
_______________________________________________
mat-dev mailing list
mat-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/mat-dev

