Bug 103839 - Format of variablesAndContainers.dat doesn't scale well
Summary: Format of variablesAndContainers.dat doesn't scale well
Status: CLOSED FIXED
Alias: None
Product: JDT
Classification: Eclipse Project
Component: Core
Version: 3.1
Hardware: All All
Importance: P3 major
Target Milestone: 3.2 M5
Assignee: Jerome Lanneluc CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2005-07-14 11:52 EDT by Keith W. Campbell CLA
Modified: 2007-02-22 09:23 EST

See Also:


Attachments
proposed patch (16.33 KB, patch)
2005-11-14 13:40 EST, Keith W. Campbell CLA

Description Keith W. Campbell CLA 2005-07-14 11:52:04 EDT
There is lots of repetition in variablesAndContainers.dat which contributes to
slow startup and lengthy workspace shutdown times.

I have workspaces where this file approaches 30 MB. Even on a 2.66 GHz machine with
2GB of physical memory it takes about 45 seconds to start the
org.eclipse.jdt.core plugin.

In a workspace with N projects JRE_CONTAINER is stored N+1 times.

In a workspace with P plugins which depend on Q other plugins (on average) there
are P*Q classpathentry elements. Each copy includes accessrule elements. Surely
one copy (instead of Q on average) should suffice.
Comment 1 Keith W. Campbell CLA 2005-07-14 12:05:04 EDT
Sorry, I was mistaken: there are (only) N copies of JRE_CONTAINER, not N+1,
so we only have to remove N-1 copies.  :-)
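
To make the counts above concrete, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the average dependency count used in main is an invented sample value, not a measurement from any workspace mentioned in this bug.

```java
/** Hypothetical helper illustrating the redundancy counts discussed above. */
final class RedundancyEstimate {

    /** N projects store N copies of JRE_CONTAINER; only one is needed. */
    static long redundantContainerCopies(long projects) {
        return Math.max(0, projects - 1);
    }

    /** P plugins that each depend on Q others (on average) store about P*Q entries. */
    static long storedClasspathEntries(long plugins, double averageDependencies) {
        return Math.round(plugins * averageDependencies);
    }

    public static void main(String[] args) {
        long plugins = 148;             // project count of "Workspace A" in comment 2
        double averageDependencies = 10; // assumed sample value, not from the report
        System.out.println("Removable JRE_CONTAINER copies: "
                + redundantContainerCopies(plugins));
        System.out.println("classpathentry elements stored: "
                + storedClasspathEntries(plugins, averageDependencies));
    }
}
```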
Comment 2 Keith W. Campbell CLA 2005-11-14 13:40:26 EST
Created attachment 29895
proposed patch

Here is a patch that improves the format of variablesAndContainers.dat.
Rather than storing as (much repeated) XML, the data are prefixed by
keys that identify repeating values. The result is a much more compact
file that can be read and written much more quickly. Data gathered
from medium-sized sample workspaces follows.

Workspace A with 148 plugin projects:
  Load:     5,089 ms    ->     114 ms    (44.5 times faster)
  Save:     1,219 ms    ->      67 ms    (18.1 times faster)
  File: 4,927,707 bytes -> 112,746 bytes (43.7 times smaller)

Workspace B with 323 plugin projects:
  Load:     5,891 ms    ->      78 ms    (75.5 times faster)
  Save:     1,308 ms    ->      52 ms    (25.0 times faster)
  File: 5,115,547 bytes -> 155,509 bytes (32.9 times smaller)
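
For readers who want a feel for the general idea of replacing repeated values with keys, the following is a rough sketch only. It is not the format implemented by the attached patch: the class BackReferenceCodec, the NEW_VALUE marker, and the byte layout are all assumptions made for illustration. Each distinct string is written once, and every later occurrence is written as a 4-byte index into the table of strings seen so far.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; NOT the layout used by the patch in this bug. */
final class BackReferenceCodec {
    /** Marker meaning "a literal string follows" (assumed convention). */
    private static final int NEW_VALUE = -1;

    /** Writes each distinct string once; repeats become an index. */
    static byte[] write(List<String> values) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            Map<String, Integer> seen = new HashMap<>();
            out.writeInt(values.size());
            for (String value : values) {
                Integer index = seen.get(value);
                if (index == null) {
                    seen.put(value, seen.size()); // next free table slot
                    out.writeInt(NEW_VALUE);
                    out.writeUTF(value);
                } else {
                    out.writeInt(index);
                }
            }
        }
        return bytes.toByteArray();
    }

    /** Rebuilds the original list by replaying the table in write order. */
    static List<String> read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            List<String> table = new ArrayList<>();
            List<String> values = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                int key = in.readInt();
                if (key == NEW_VALUE) {
                    String value = in.readUTF();
                    table.add(value);
                    values.add(value);
                } else {
                    values.add(table.get(key));
                }
            }
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample input: the same container path repeated once per project.
        List<String> paths =
                Collections.nCopies(148, "org.eclipse.jdt.launching.JRE_CONTAINER");
        byte[] encoded = write(paths);
        System.out.println("Encoded size in bytes: " + encoded.length);
        System.out.println("Round trip intact: " + read(encoded).equals(paths));
    }
}
```

In this sketch the saving comes purely from not rewriting repeated strings; the real patch may differ in what it treats as a repeatable unit and in how keys are encoded.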
Comment 3 Philipe Mulet CLA 2005-12-08 04:39:18 EST
Let's consider inclusion for M5.

Going even further: sharing individual data within entries might be even more valuable. Thinking for instance of access rules.
Comment 4 Keith W. Campbell CLA 2005-12-09 10:05:31 EST
Access rules for a given project should all be distinct and projects
normally don't have overlapping package names so there should be no
duplicate AccessRule objects to be worried about.

I did have a mechanism at one point to handle things like that, but
it affected performance and if my assumptions above are valid there
would be no benefit.
Comment 5 Jerome Lanneluc CLA 2006-01-09 09:11:54 EST
Thanks very much for the patch. Released it to HEAD with minor edits (in particular, I added back the check that avoids leaking containers for no longer existing projects).
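
As an illustration of the kind of check mentioned above, here is a minimal sketch that drops saved containers whose project no longer exists. It is an assumption about the general shape of such a filter, not the code released to HEAD; the class name, the use of project names as map keys, and the generic value type are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of filtering out containers for deleted projects. */
final class StaleContainerFilter {

    /**
     * Returns a copy of the saved containers restricted to projects that
     * still exist, so entries for deleted projects are not written back.
     */
    static <V> Map<String, V> retainExisting(
            Map<String, V> containersByProject, Set<String> existingProjects) {
        Map<String, V> kept = new LinkedHashMap<>(containersByProject);
        kept.keySet().retainAll(existingProjects); // removes stale keys from the copy
        return kept;
    }
}
```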
Comment 6 Frederic Fusier CLA 2006-02-14 06:39:27 EST
Code verified for 3.2 M5 using build I20060214-0010.
Comment 7 Keith W. Campbell CLA 2007-02-22 09:23:59 EST
Closing.