Community
Participate
Working Groups
In Bug 423715, support for multiple checksum algorithms was added. This introduced a major performance impact due to inefficient handling of the InputStream: https://git.eclipse.org/r/#/c/69560/32/bundles/org.eclipse.equinox.p2.repository/src/org/eclipse/equinox/internal/p2/repository/helpers/ChecksumProducer.java The old code used while ((read = fis.read(buffer, 0, bufferSize)) != -1) { to read in 4k chunks, while the new code uses while (fis.read(buffer) != -1) { which reads byte by byte. This is extremely inefficient.
New Gerrit change created: https://git.eclipse.org/r/138216
Do you have any numbers showing how much your patch improves the situation?
Sorry, I actually didn't read the code correctly. I though it called read(), not not read(buffer). So all is good here. Looks like the reason is just that SHA-256 is way slower than MD5 :/
Can the reason be that both checksums are calculated? I would be fine with skipping md5 calculation if sha is there . WDYT? Ready to try it out?
Yes, creating both checksums causes the file to be read twice, so that's already a factor of two. We could introduce a custom MultiDigestInputStream that is able to update the digest for multiple algorithms in one go while only reading the file once. Is disabling of MD5 just like that reasonable?
(In reply to Sebastian Ratz from comment #5) > Yes, creating both checksums causes the file to be read twice, so that's > already a factor of two. > > We could introduce a custom MultiDigestInputStream that is able to update > the digest for multiple algorithms in one go while only reading the file > once. > > Is disabling of MD5 just like that reasonable? It's not disabling it like that it's when the stronger checksum (sha256) is checked, don't bother with md5. If there is no sha256 - md5 should still be checked.
(In reply to Sebastian Ratz from comment #5) > Yes, creating both checksums causes the file to be read twice, so that's > already a factor of two. If this logic is correct, p2 would read the file not twice but 4+ times. Check o.e.e.internal.provisional.p2.artifact.repository.processing.ProcessingStep - p2 reads artifact bytes only once and applyes a multiple ProcessingSteps by chaining them.
This is about the production of the checksum. See org.eclipse.equinox.internal.p2.artifact.processors.checksum.ChecksumUtilities.calculateChecksums(File, Map<String, String>, Collection<String>) It calls ChecksumProducer.produce(pathOnDisk, algorithm) for every algorithm which each time instantiates a new InputStream.
(In reply to Mykola Nikishov from comment #7) Please disregard, I constantly mixing checksum generation/verification ;-)
org.eclipse.equinox.internal.p2.repository.helpers.ChecksumProducer.produce(File, String) should use ProcessingSteps approach and not read file every single time for each MessageDigest supported.
I think we really need some A/B testing here to sort out 1. what's the actual performance drop factor (is it a constant factor, a N-power-factor...), and see whether it's an algorithmic issue or just some useless steps to remove 2. How many times we invoke the ChecksumProducer.produce(...) methods and whether they do match expectation 3. Whether it's specifically the SHA-256 that's taking long, and if so, why? In bug 545159, you mention 30+ minutes for 300 bundles IIRC. But here locally with `sha256sum` command, I can generate the sha-256 checksum of the 685 plugins of a local Eclipse instance in less than a second. I would expect similar order for SHA-256 computed with Java. If the SHA-256 computation itself is the bottleneck, then it narrows down where to improve things.
Dropping milestone until someone agrees to progress this issue for a release.
So I think this issue boils down to: The computation could be made (much?) faster if the file were read only once. That doesn't sound critical to me, so I'll reduce the severity.
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're closing this bug. If you have further information on the current state of the bug, please add it and reopen this bug. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. -- The automated Eclipse Genie.