181488 – [index] Lots of unbuffered sequential reads in DiskIndex

Bug 181488 - [index] Lots of unbuffered sequential reads in DiskIndex

Summary: [index] Lots of unbuffered sequential reads in DiskIndex

Status:	VERIFIED FIXED

Alias:	None

Product:	JDT
Classification:	Eclipse Project
Component:	Core (show other bugs)
Version:	3.3
Hardware:	All All

Importance:	P3 normal (vote)
Target Milestone:	3.4 M1
Assignee:	Frederic Fusier
QA Contact:

URL:
Whiteboard:
Keywords:	performance

Depends on:
Blocks:

Reported:	2007-04-07 05:55 EDT by Brock Janiczak
Modified:	2007-08-03 09:44 EDT (History)
CC List:	1 user (show)

See Also:

Attachments
Proposed patch (3.30 KB, patch) 2007-06-14 09:28 EDT, Frederic Fusier	no flags	Details \| Diff
New proposed patch (7.06 KB, patch) 2007-07-03 12:43 EDT, Frederic Fusier	no flags	Details \| Diff
Show Obsolete (1) View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Brock Janiczak

2007-04-07 05:55:17 EDT

Version: 3.3.0
Build id: I20070323-1616

I was monitoring the eclipse startup with the sysinternal's Process Monitor and noticed a bunch of 1 byte reads for jdt index files.  It turns out that they are coming from DiskIndex#readHeaderInfo.  Would it be possible to store the header's size in the index so it can be read into buffer and processed from there?  Even simple operations like readInt cause 4 reads to be executed (although it is likely that the block is cached by the OS).

Most headers seem to be about 150 bytes and it takes about 75 read operations to process it.

Comment 1 Frederic Fusier

2007-04-08 07:04:57 EDT

Sounds a reasonable request. We can apply same optimization in this method than the one made for bug 168384.

However, please note that this method represents an really small part of the Java Tooling Initialization. Using yourkit to look at method counts and time, here are the numbers:

Stack of calls						Time	%	Count
BasicSearchEngine.searchAllTypeNames			4,402		1	
...
  + DiskIndex.initialize(boolean)			1,301	29.6%	105
     + RandomAccessFile.<init>(File, String)		949	21.6%	105
     + DiskIndex.readHeaderInfo(RandomAccessFile)	144	3.3%	100
       + RandomAccessFile.readInt()			54	1.2%	1,421
       + ClassLoader.loadClassInternal(String)		51	1.2%	2
       + RandomAccessFile.readUTF()			7	<0.01%	797
       + HashtableOfIntValues.put(char[], int)		3	<0.01%	1,594
       + SimpleSetOfCharArray.get(char[])		3	<0.01%	797
       + String.toCharArray()				1	<0.01%	797
     + RandomAccessFile.readUTF()			46	1.0%	105
     + RandomAccessFile.close()				2	0 %	105
     + File.exists()					1	0 %	105

Comment 2 Brock Janiczak

2007-04-08 18:44:33 EDT

I wasn't expecting much of an improvement.  It may be more noticeable when you have more indexes.  I have a fairly small workspace and have about 160.  My workspace at work is about 3x the size (400 indexes).

Did you use sampling or tracing to get your profile?  The almost 1s for creating RandomAccessFiles looks a bit odd.  I wrote a simple program to create a new RandomAccessFile on each file in my index and then close it.  Total time was about 7ms (this was not under a profiler).  Although it may be hitting a FS cache or something.

public class Main {
	public static void main(String[] args) {
		File f = new File("D:\\data\\eclipse\\workspace\\.metadata\\.plugins\\org.eclipse.jdt.core");
		File[] children = f.listFiles();
		
		long start = System.nanoTime();
		for (int i = 0; i < children.length; i++) {
			try {
			RandomAccessFile a = new RandomAccessFile(children[i], "r");
			a.close();
			} catch (Exception e) {
				
			}
		}
	}
}

Comment 3 Frederic Fusier

2007-05-11 05:28:11 EDT

Will continue to work on this improvement in next release...

Comment 4 Frederic Fusier

2007-06-14 09:28:06 EDT

Created attachment 71307 [details]
Proposed patch 

This patch replace the RandomAccessFile usage by a FileInputStream and so can take advantage of the optimization done in DiskIndex while fixing bug 171653...

Comment 5 Frederic Fusier

2007-06-15 09:16:52 EDT

I forgot to precise that posted patch passes all JDT/Core and JDT/UI tests + FullSourceWorkspaceCompleteSearchTests + several massive requests in a 3.3 and Europa full binaries/source workspaces...

Comment 6 Frederic Fusier

2007-06-26 13:29:25 EDT

Released for 3.4M1 in HEAD stream.

Comment 7 Jerome Lanneluc

2007-07-02 05:38:17 EDT

Reopening since this change caused bug 195091.
Patch as been reverted in HEAD.

Comment 8 Frederic Fusier

2007-07-02 09:39:07 EDT

Problem came from the fact that DiskIndex internal chars buffer index was not correctly set at the beginning. Reading of index file signature was invalid and so always generate the excpetion of wrong format hence the indexing job at each startup... :-(

Comment 9 Frederic Fusier

2007-07-03 12:43:15 EDT

Created attachment 72969 [details]
New proposed patch

Comment 10 Frederic Fusier

2007-07-03 12:43:40 EDT

Released for 3.4M1 in HEAD stream.

Comment 11 Frederic Fusier

2007-08-03 09:44:58 EDT

Verified (based on code) for 3.4M1 using build I20070802-0800.