Bug 492941 - Add error detection to new index
Summary: Add error detection to new index
Status: RESOLVED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 4.6   Edit
Hardware: PC Linux
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Stefan Xenos CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 481796
  Show dependency tree
 
Reported: 2016-05-03 23:11 EDT by Stefan Xenos CLA
Modified: 2016-08-27 03:15 EDT (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Stefan Xenos CLA 2016-05-03 23:11:17 EDT
Error detection was always a planned feature for the new index. The index should contain CRCs which are capable of detecting disk corruption and aborted writes.
Comment 1 Stefan Xenos CLA 2016-05-10 22:03:15 EDT
I implemented the CRC approach fully, but it didn't work out as expected. TL;DR my fastest implementation of CRCs was 4% slower than using a lock flag, which is still too slow IMO. On this basis, I plan to deviate from the original design and use lock flags instead. This will protect the index from aborted writes (where eclipse crashes in the middle of a write) but not from disk corruption.

If we're still getting reports of index corruption with this approach, we can re-investigate the CRC approach. I'll push my implementation to github so we can recover it if we ever need it.

I tried a number of hashing functions, block sizes, CRC bit sizes, and both hierarchal and flat CRC tables. I compared these approaches with the one used by CDT (a lock flag that is written at the start of writes and cleared at the end). In every case, the CRC function was a significant bottleneck.

Here's the raw data (measured the average indexing time in a workspace with 54,000 classes):

No error correction, 4k block size (4 runs):
1.02857ms +/- 0.08153ms

StreamHasher, 8k block size, 4 byte crc (1 test run):
3.28166ms

StreamHasher, 8k block size, 2 byte crc (1 test run):
3.083238ms

adler32, 8k block size, 2 byte crc (1 test run):
1.485546ms

simple checksum, 8k block size, 2 byte crc (1 test run):
1.205637ms

simple checksum, 4k block size, 2 byte crc (5 test runs):
1.06787 +/- 0.05364
Comment 2 Stefan Xenos CLA 2016-05-10 22:26:07 EDT
I've pushed the CRC implementations to the "error_correction" branch. The one that's there is the (4k/2byte/simple checksum) variant that performed best.

The main "newindex" branch is using the lock flag approach.