Bug 181488 - [index] Lots of unbuffered sequential reads in DiskIndex
Summary: [index] Lots of unbuffered sequential reads in DiskIndex
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.3   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: 3.4 M1   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-04-07 05:55 EDT by Brock Janiczak CLA
Modified: 2007-08-03 09:44 EDT (History)
1 user (show)

See Also:


Attachments
Proposed patch (3.30 KB, patch)
2007-06-14 09:28 EDT, Frederic Fusier CLA
no flags Details | Diff
New proposed patch (7.06 KB, patch)
2007-07-03 12:43 EDT, Frederic Fusier CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Brock Janiczak CLA 2007-04-07 05:55:17 EDT
Version: 3.3.0
Build id: I20070323-1616

I was monitoring the eclipse startup with the sysinternal's Process Monitor and noticed a bunch of 1 byte reads for jdt index files.  It turns out that they are coming from DiskIndex#readHeaderInfo.  Would it be possible to store the header's size in the index so it can be read into buffer and processed from there?  Even simple operations like readInt cause 4 reads to be executed (although it is likely that the block is cached by the OS).

Most headers seem to be about 150 bytes and it takes about 75 read operations to process it.
Comment 1 Frederic Fusier CLA 2007-04-08 07:04:57 EDT
Sounds a reasonable request. We can apply same optimization in this method than the one made for bug 168384.

However, please note that this method represents an really small part of the Java Tooling Initialization. Using yourkit to look at method counts and time, here are the numbers:

Stack of calls						Time	%	Count
BasicSearchEngine.searchAllTypeNames			4,402		1	
...
  + DiskIndex.initialize(boolean)			1,301	29.6%	105
     + RandomAccessFile.<init>(File, String)		949	21.6%	105
     + DiskIndex.readHeaderInfo(RandomAccessFile)	144	3.3%	100
       + RandomAccessFile.readInt()			54	1.2%	1,421
       + ClassLoader.loadClassInternal(String)		51	1.2%	2
       + RandomAccessFile.readUTF()			7	<0.01%	797
       + HashtableOfIntValues.put(char[], int)		3	<0.01%	1,594
       + SimpleSetOfCharArray.get(char[])		3	<0.01%	797
       + String.toCharArray()				1	<0.01%	797
     + RandomAccessFile.readUTF()			46	1.0%	105
     + RandomAccessFile.close()				2	0 %	105
     + File.exists()					1	0 %	105
Comment 2 Brock Janiczak CLA 2007-04-08 18:44:33 EDT
I wasn't expecting much of an improvement.  It may be more noticeable when you have more indexes.  I have a fairly small workspace and have about 160.  My workspace at work is about 3x the size (400 indexes).

Did you use sampling or tracing to get your profile?  The almost 1s for creating RandomAccessFiles looks a bit odd.  I wrote a simple program to create a new RandomAccessFile on each file in my index and then close it.  Total time was about 7ms (this was not under a profiler).  Although it may be hitting a FS cache or something.

public class Main {
	public static void main(String[] args) {
		File f = new File("D:\\data\\eclipse\\workspace\\.metadata\\.plugins\\org.eclipse.jdt.core");
		File[] children = f.listFiles();
		
		long start = System.nanoTime();
		for (int i = 0; i < children.length; i++) {
			try {
			RandomAccessFile a = new RandomAccessFile(children[i], "r");
			a.close();
			} catch (Exception e) {
				
			}
		}
	}
}
Comment 3 Frederic Fusier CLA 2007-05-11 05:28:11 EDT
Will continue to work on this improvement in next release...
Comment 4 Frederic Fusier CLA 2007-06-14 09:28:06 EDT
Created attachment 71307 [details]
Proposed patch 

This patch replace the RandomAccessFile usage by a FileInputStream and so can take advantage of the optimization done in DiskIndex while fixing bug 171653...
Comment 5 Frederic Fusier CLA 2007-06-15 09:16:52 EDT
I forgot to precise that posted patch passes all JDT/Core and JDT/UI tests + FullSourceWorkspaceCompleteSearchTests + several massive requests in a 3.3 and Europa full binaries/source workspaces...
Comment 6 Frederic Fusier CLA 2007-06-26 13:29:25 EDT
Released for 3.4M1 in HEAD stream.
Comment 7 Jerome Lanneluc CLA 2007-07-02 05:38:17 EDT
Reopening since this change caused bug 195091.
Patch as been reverted in HEAD.
Comment 8 Frederic Fusier CLA 2007-07-02 09:39:07 EDT
Problem came from the fact that DiskIndex internal chars buffer index was not correctly set at the beginning. Reading of index file signature was invalid and so always generate the excpetion of wrong format hence the indexing job at each startup... :-(
Comment 9 Frederic Fusier CLA 2007-07-03 12:43:15 EDT
Created attachment 72969 [details]
New proposed patch
Comment 10 Frederic Fusier CLA 2007-07-03 12:43:40 EDT
Released for 3.4M1 in HEAD stream.
Comment 11 Frederic Fusier CLA 2007-08-03 09:44:58 EDT
Verified (based on code) for 3.4M1 using build I20070802-0800.