Bug 138309 - [index] Optimize index files path storage in DiskIndex and IndexManager
Summary: [index] Optimize index files path storage in DiskIndex and IndexManager
Status: VERIFIED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core (show other bugs)
Version: 3.2   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: 3.3 M5   Edit
Assignee: Frederic Fusier CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 168354
  Show dependency tree
 
Reported: 2006-04-25 05:30 EDT by Martin Aeschlimann CLA
Modified: 2007-03-28 10:34 EDT (History)
3 users (show)

See Also:


Attachments
illustrates point a.) (164.17 KB, text/html)
2006-04-25 05:32 EDT, Martin Aeschlimann CLA
no flags Details
Illistartes point b.) (same string twice in memory) (7.45 KB, text/html)
2006-04-25 05:32 EDT, Martin Aeschlimann CLA
no flags Details
Illustrates point c.) (116.12 KB, text/html)
2006-04-25 05:35 EDT, Martin Aeschlimann CLA
no flags Details
Interns category names in DiskIndex (3.65 KB, patch)
2007-01-18 14:13 EST, Kent Johnson CLA
no flags Details | Diff
Proposed patch (31.11 KB, patch)
2007-01-22 07:16 EST, Frederic Fusier CLA
no flags Details | Diff
New proposed patch (48.53 KB, patch)
2007-01-23 15:43 EST, Kent Johnson CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Aeschlimann CLA 2006-04-25 05:30:53 EDT
20060425-0010

While profiling the meomory heap I found that the IndexManager is in the top-ten of elements with the biggest retained size (8 % of total memory in my scenario: JDT.UI development workspace, after build, all editors closed)
I didn't spend too much time on that but here's what I see as possible improvements:

a.)
IndexManager.indexLocations.valueTable contains 357 entries. all strings, all in the form
C:\workspaces\eclipse-sh3\.metadata\.plugins\org.eclipse.jdt.core\3123605752.index
C:\workspaces\eclipse-sh3\.metadata\.plugins\org.eclipse.jdt.core\3495272437.index
...
A relative path might be better here. -> '3123605752.index'

b.)
IndexManager.indexStates.keyTable has the same entries again, but different string instances. I guess they should be shared (internalized) with the locations from a.)

c.)
IndexManager.indexes.table.value.diskIndex refer to DiskIndex's (216 instances in my scenario)
DiskIndex.categoryOffsets.keyTable seema ll to contain the same 8 char[]: s, u, p, e, r, R, e, f,   /  f, i, e, l, d, D, e, c, l / m, e, t, h, o, d, R, e, f ....
These char[] should be internalized.

Maybe the same with 
DiskIndex.categoryTables.keyTable
Comment 1 Martin Aeschlimann CLA 2006-04-25 05:32:00 EDT
Created attachment 39391 [details]
illustrates point a.)
Comment 2 Martin Aeschlimann CLA 2006-04-25 05:32:32 EDT
Created attachment 39392 [details]
Illistartes point b.) (same string twice in memory)
Comment 3 Martin Aeschlimann CLA 2006-04-25 05:35:10 EDT
Created attachment 39393 [details]
Illustrates point c.)
Comment 4 Frederic Fusier CLA 2006-04-26 10:06:37 EDT
Note that indexes may be also written in a specific location by other SearchParticipant than JDT one. Current implementation might be improved but for point a) not as simply as using relative path...

However any change in this area is highly risky at this late dvpt stage
=> defer to 3.3
Comment 5 Frederic Fusier CLA 2007-01-12 06:08:35 EST
Reopen for 3.3 as we need index tables optimization to accept bug 168354 fix which surely will increase memory consumption...
Comment 6 Kent Johnson CLA 2007-01-18 14:13:27 EST
Created attachment 57098 [details]
Interns category names in DiskIndex
Comment 7 Frederic Fusier CLA 2007-01-22 07:16:02 EST
Created attachment 57246 [details]
Proposed patch

Additional changes to take into account also point a) and b).
Kent, may you review the changes?
Thanks
Comment 8 Frederic Fusier CLA 2007-01-22 07:50:25 EST
Using yourkit (version 6.0.5) to compare memory consumption before and after patch applied does not show any difference!? Seems to be a bug in Yourkit as computing manually memory size of indexes, indexLocations and indexStates tables show a gain of around 200 bytes per index files and per table...

So, we should save with JKD 1.4.2 around 6,000 bytes (200 x 10 jar files x 3 tables). Better than nothing but definitely not comparable with the 1,604,400 bytes of JDK 1.5 rt.jar index file or the 2,026,329 of JDK 1.6 rt.jar index file (measured with Yourkit) than fix for bug 168354 would add to memory footprint!
Comment 9 Kent Johnson CLA 2007-01-23 15:43:03 EST
Created attachment 57377 [details]
New proposed patch

This patch replaces the OS String used for the indexLocation with an IPath that shares segments with the saved indexes directory, so only the simple fileName is a new segment string.

The indexLocation paths are also shared between the indexStates table and the indexLocations table.

We also save time/space by no longer converting Paths to OSStrings in places like PatternSearchJob.getIndexes(), or creating Paths from indexLocation Strings in places like IndexSelector.getIndexes().

Any remaining conversions from IPath to OSStrings or vice versa are no longer in common search operations, but are in initialization/cleanup operations.
Comment 10 Frederic Fusier CLA 2007-01-24 13:53:39 EST
I've looked at your patch and it looks really nice, thanks :-)

I've also done some measure using Yourkit and here are the retained sizes observed on an Eclipse Full Source workspace :

1) HEAD contents
----------------
IndexManager      -> 5,152,976
 + indexes        -> 4,988,200
 + indexLocations -> 2,904
 + indexStates    -> 41,224

2) Patch
--------
IndexManager      -> 5,140,752
 + indexes        -> 5,027,056
 + indexLocations -> 4,016
 + indexStates    -> 2,904

This corroborate the expected result your were describing :-)

All JDT/Core tests are green, so I'll release the patch tomorrow after having talked about these results with Philippe and also verified performance...
Comment 11 Frederic Fusier CLA 2007-01-25 11:08:31 EST
Released for 3.3 M5 in HEAD stream.
Comment 12 Maxime Daniel CLA 2007-02-06 01:16:17 EST
Verified using build I20070205-1824.
Comment 13 Maxime Daniel CLA 2007-02-06 01:49:01 EST
Verified for 3.3 M5 using build I20070205-1824.
Comment 14 Frederic Fusier CLA 2007-03-26 06:46:56 EDT
Is it OK to backport this fix on R3_2_maintenance stream?
Comment 15 Jerome Lanneluc CLA 2007-03-26 07:56:57 EDT
For now, please just prepare the patch against R3_2_maintenance. We will decide later if this needs to be released to R3_2_maintenance
Comment 16 Frederic Fusier CLA 2007-03-28 10:34:11 EDT
(In reply to comment #15)
> For now, please just prepare the patch against R3_2_maintenance. We will decide
> later if this needs to be released to R3_2_maintenance
> 
See patch attached to bug 171653 which also includes fix for this one: https://bugs.eclipse.org/bugs/attachment.cgi?id=62230