Error detection was always a planned feature for the new index. The index should contain CRCs capable of detecting disk corruption and aborted writes.
I implemented the CRC approach fully, but it didn't work out as expected. TL;DR: my fastest CRC implementation was 4% slower than using a lock flag, which is still too slow IMO. On this basis, I plan to deviate from the original design and use lock flags instead. This will protect the index from aborted writes (where Eclipse crashes in the middle of a write) but not from disk corruption. If we're still getting reports of index corruption with this approach, we can re-investigate the CRC approach. I'll push my implementation to GitHub so we can recover it if we ever need it.

I tried a number of hashing functions, block sizes, CRC bit sizes, and both hierarchical and flat CRC tables. I compared these approaches with the one used by CDT (a lock flag that is written at the start of writes and cleared at the end). In every case, the CRC function was a significant bottleneck. Here's the raw data (average indexing time in a workspace with 54,000 classes):

- No error correction, 4k block size (4 runs): 1.02857ms +/- 0.08153ms
- StreamHasher, 8k block size, 4-byte CRC (1 run): 3.28166ms
- StreamHasher, 8k block size, 2-byte CRC (1 run): 3.083238ms
- adler32, 8k block size, 2-byte CRC (1 run): 1.485546ms
- Simple checksum, 8k block size, 2-byte CRC (1 run): 1.205637ms
- Simple checksum, 4k block size, 2-byte CRC (5 runs): 1.06787ms +/- 0.05364ms
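To illustrate what the fastest variant above looks like, here is a minimal sketch of a "simple checksum" over a 4k block with a 2-byte check value, verified on read to detect corruption. The class and method names are illustrative assumptions, not the actual index code:

```java
// Sketch of the "simple checksum" variant: a 2-byte checksum per 4k block.
// A mismatch on read indicates disk corruption or an aborted write.
// Hypothetical names; not the real index implementation.
public class BlockChecksum {
    static final int BLOCK_SIZE = 4096;

    // Sum all bytes in the block, folded into 16 bits.
    static short compute(byte[] block) {
        int sum = 0;
        for (byte b : block) {
            sum = (sum + (b & 0xFF)) & 0xFFFF;
        }
        return (short) sum;
    }

    // Recompute on read and compare against the stored checksum.
    static boolean verify(byte[] block, short stored) {
        return compute(block) == stored;
    }

    public static void main(String[] args) {
        byte[] block = new byte[BLOCK_SIZE];
        block[0] = 42;
        short sum = compute(block);
        System.out.println(verify(block, sum)); // intact block passes
        block[100] ^= 0xFF;                     // simulate corruption
        System.out.println(verify(block, sum)); // corrupted block fails
    }
}
```

Even a loop this simple runs once per block on every write, which is consistent with the measurements above showing the checksum as a per-write cost that the lock-flag approach avoids entirely.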
I've pushed the CRC implementations to the "error_correction" branch. The variant there is the best-performing one (4k block size, 2-byte simple checksum). The main "newindex" branch is using the lock flag approach.
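For reference, the lock-flag approach amounts to setting a dirty byte before any write and clearing it after the write is safely on disk; a set flag at open time means the last write was aborted. This is a minimal sketch under assumed file layout (the flag offset and names are hypothetical, not the actual newindex code):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Sketch of a lock-flag write protocol: one byte in the file header is
// set before a write begins and cleared after it completes. If the flag
// is still set when the index is opened, a previous write was aborted
// (e.g. a crash mid-write) and the index must be rebuilt.
// The offset and class names here are illustrative assumptions.
public class WriteLockFlag {
    static final long FLAG_OFFSET = 0; // hypothetical header position

    static void beginWrite(RandomAccessFile f) throws IOException {
        f.seek(FLAG_OFFSET);
        f.writeByte(1);      // mark index as dirty
        f.getFD().sync();    // force the flag to disk before writing payload
    }

    static void endWrite(RandomAccessFile f) throws IOException {
        f.getFD().sync();    // force the payload to disk first
        f.seek(FLAG_OFFSET);
        f.writeByte(0);      // then mark the index clean again
    }

    static boolean wasWriteAborted(RandomAccessFile f) throws IOException {
        f.seek(FLAG_OFFSET);
        return f.readByte() != 0;
    }
}
```

Note the ordering: the flag must reach disk before the payload write starts, and the payload must reach disk before the flag is cleared, otherwise a crash in the wrong window could leave a corrupted index that looks clean.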