Bug 150516 - BufferedRandomInputStream doesn't buffer correctly
Summary: BufferedRandomInputStream doesn't buffer correctly
Status: RESOLVED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Compendium (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows 2000
: P3 normal (vote)
Target Milestone: 3.2.1   Edit
Assignee: Oleg Besedin CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2006-07-13 10:57 EDT by John Arthorne CLA
Modified: 2006-08-14 11:08 EDT (History)
3 users (show)

See Also:


Attachments
Suggested fix (2.05 KB, patch)
2006-07-13 11:12 EDT, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Arthorne CLA 2006-07-13 10:57:20 EDT
Run the following benchmark on a file with BufferedRandomInputStream:

BufferedRandomInputStream in = new BufferedRandomInputStream(file);
byte[] bytes = new byte[4];
while (in.read(bytes) >= 0) {}

Instrumenting file system access (either with FileMon or instrumenting the read calls in BuferredRandomInputStream), we see:

Read: 4
Read: 2048
Read: 4
Read: 2048
Read: 4
Read: 2048
Read: 4
... etc ...

In other words, it is only correctly reading a full buffer on every second read. This test case simulates the behaviour when BufferedRandomInputStream is wrapped in a DataInputStream, and readInt() is called, which is what happens when reading the extension registry. On startup of a large Eclipse application, this causes lots of reads while the "extradata" is loaded from the plugin registry. 

The problem is in the read(byte[], int, int) method.  If the buffer is exhausted, it just reads the requested bytes from the file, then fills the buffer and returns. The "read: 4" is from reading the requested bytes, and the "read: 2048" is from filling the buffer.
Comment 1 John Arthorne CLA 2006-07-13 11:12:04 EDT
Created attachment 46245 [details]
Suggested fix
Comment 2 John Arthorne CLA 2006-07-13 11:18:54 EDT
The above patch fixes the read method, and fixes two other compiler warnings in the file - an unused throws clause and an unnecessary else clause. These last two changes are cosmetic.
Comment 3 Oleg Besedin CLA 2006-07-14 15:07:50 EDT
Thanks! It looks good; patch released to the CVS Head.
Comment 4 Pascal Rapicault CLA 2006-07-14 15:38:46 EDT
Should we also make that available in 3.2.1?
John, Oleg, Jeff?
Comment 5 Oleg Besedin CLA 2006-07-14 16:25:28 EDT
On adding this patch into 3.2.1: 

I guess it might depend on specific system, but in practice this change makes no measurable difference in performance.

(I tried similar thing during the work on startup performance and just re-tested it again - however, mind it that it is on specific WinXP computer; it might be different on a different OS or hardware.)

From the memory, the pattern in which registry information is read makes this situation rather rare. Often than this situation is encountered, more than one actual read is required as the expected data size exceeded the 2K buffer size. 

(By the way, increasing the buffer size beyond 2K doesn't seem to provide any measurable benefit in tests.)

This is a good patch and it improves theoretical performance of the BufferedRandomInputStream, but, so far, there is no compelling practical reason to put it into 3.2.1.
Comment 6 John Arthorne CLA 2006-07-14 16:37:42 EDT
I didn't measure the performance effect, and I agree it is only worth putting in 3.2.1 if it provides a measurable benefit.  However, I did see this 2048/4/2048/4/... pattern in large numbers while profiling startup of a large eclipse-based application with filemon, which is what led me to filing the bug in the first place.  

It will likely have more impact when the number of plugins is much larger and there is more registry churn on startup (not a minimal startup, but a more typical startup with many views and editors already open).  It will also depend on the VM, since the extension data is held using soft references, and different VMs have various policies for flushing soft references.
Comment 7 Oleg Besedin CLA 2006-07-14 17:08:25 EDT
I think the reason this doesn't have much impact on the "real" picture is that registry information is spread out (hence, random access) and, is located either in (1) relatively small blocks (much smaller than 2K), or (2) large blocks (typically, 3-4K). (And only small portion of the total registry cache file(s) is ever read.)

In the first case there is only one physical real done (the whole block and some extra is read at the very beginning) and the situation you describe doesn't occur. 

In the second case (which is much more rare) "n+some" (up to 2*n) physical read is done by the "old" code and "n" buffer reads are done by the "new" code. Most often, it means 2 reads instead of 3 reads. I don't remember now the ratio or how often this occurs, but it is relatively infrequent scenario.

The "practical" effect would depend not so much on the size of the application, but on the size of the configuration elements. If we get an application with large number of unusually large config elements, the ratio of second scenario would increase.

However, the picture:
Read: 4
Read: 2048
Read: 4
Read: 2048
Read: 4
Read: 2048
Read: 4
would be rather non-typical for the actual registry reads.
Comment 8 John Arthorne CLA 2006-07-14 17:23:38 EDT
Prior to the patch, the very first call to read(byte[] b, int off, int len) would cause two disk reads - once to read "len" bytes and once to fill the buffer at the end. So, it could be that the 4/2048/4/2048 pattern I observed with filemon were actually pairs of separate registry accesses.
Comment 9 Oleg Besedin CLA 2006-07-17 14:41:04 EDT
Recoded bug 150852 for another potential improvement of the BufferedRandomInputStream.
Comment 10 Jeff McAffer CLA 2006-07-20 23:27:49 EDT
so is it faster?
Comment 11 Oleg Besedin CLA 2006-07-21 09:44:19 EDT
Performance: there is no measurable effect on either current or the "new and improved" JUnit startup tests.

However, as John pointed out, in those tests we measure "warm" startup on SDK. The "cold" startup of a large Eclipse-based application might show a different picture.

At present we are waiting for "cold" tests results with a larger application.
Comment 12 Oleg Besedin CLA 2006-07-25 13:46:45 EDT
The test results showed 8% improvement for the "cold" scenario using large application.

Re-opening the bug to consider releasing the patch into 3.2.1 stream.
Comment 13 Jeff McAffer CLA 2006-08-10 11:28:46 EDT
+1
Comment 14 DJ Houghton CLA 2006-08-14 11:08:15 EDT
Released to 3.2.x maintenance branch.