Bug 24414 - [runtime] Plug-in registry performance
Status: RESOLVED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 2.0
Hardware: PC All
Importance: P3 normal
Target Milestone: 2.1 M5
Assignee: Rafael Chaves CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2002-10-04 15:31 EDT by DJ Houghton CLA
Modified: 2003-04-08 08:39 EDT

See Also:


Attachments
Excel sheet containing detailed data (26.00 KB, application/vnd.ms-excel)
2002-11-19 10:54 EST, Pascal Rapicault CLA
patch for core.runtime (36.17 KB, patch)
2003-01-02 12:24 EST, Rafael Chaves CLA
patch for core.tests.runtime (5.63 KB, patch)
2003-01-02 12:24 EST, Rafael Chaves CLA
loading times table for Eclipse M4 and WSAD (664 bytes, text/plain)
2003-01-02 13:07 EST, Rafael Chaves CLA
patches for core.runtime and core.tests.runtime (9.84 KB, patch)
2003-01-08 16:44 EST, Rafael Chaves CLA

Description DJ Houghton CLA 2002-10-04 15:31:35 EDT
Investigate whether it is feasible to store the plug-in registry as a b-tree. 
The registry takes up a ton of space in memory and is (essentially) used only 
sporadically after startup.

Also ensure that PDE can take advantage of any performance enhancements that we 
make. They use a registry mechanism of their own when running a runtime 
workbench.
Comment 1 Pascal Rapicault CLA 2002-11-19 10:54:59 EST
Created attachment 2462 [details]
Excel sheet containing detailed data
Comment 2 Pascal Rapicault CLA 2002-11-19 10:55:44 EST
The tests carried out did NOT save the whole registry to disk.
At first glance, saving all of it appears complicated
for the gain that would result. Indeed, most of the space
taken by the registry is in the configuration elements
and configuration properties, which are mainly holders of strings
from the plugin.xml files.

Moreover, this data is rarely used. A common usage pattern 
is for a plugin to read these objects during its 
initialization phase.

Because these objects are full of strings and are sometimes never needed,
the tests performed consisted of not creating them when the plugin 
registry is read from the cache file, and creating them lazily instead.

In order to evaluate the gain in memory and time, several versions 
of this solution have been implemented and tested over different JVMs.
Detailed numbers are available in the attached xls sheet.


1 - The lazy approach
This solution consists of representing in memory only the interesting 
part of the cache file. Although the bytes for configuration elements and 
properties are read from the file, the objects are not created in memory. Note 
that skipping the bytes in the file turns out to be less efficient 
than reading and ignoring them.

When configuration elements and properties are required, the information 
is read from the cache file and the objects are created. Once created, the new 
objects are kept in memory.

This approach is as fast as the reference implementation and saves 
memory on startup. However, after a long Eclipse session all 
the plugins might have been activated, so the memory gain decreases 
and can drop to zero compared to the reference implementation.
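
A minimal sketch of the lazy idea described above, not the code used in the
tests. The class name, the byte array standing in for the cache file, and the
"int count followed by UTF strings" layout are all assumptions made for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch; none of these names come from the Eclipse runtime.
class LazyExtension {
    private final String id;      // small, always kept in memory
    private final byte[] cache;   // stands in for the registry cache file
    private final int offset;     // where this extension's elements start
    private String[] elements;    // stays null until first requested

    LazyExtension(String id, byte[] cache, int offset) {
        this.id = id;
        this.cache = cache;
        this.offset = offset;
    }

    String getId() { return id; }

    boolean isLoaded() { return elements != null; }

    // Reads the elements on first access and keeps them afterwards.
    synchronized String[] getElements() {
        if (elements == null) {
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(cache, offset, cache.length - offset))) {
                String[] loaded = new String[in.readInt()];
                for (int i = 0; i < loaded.length; i++) {
                    loaded[i] = in.readUTF();
                }
                elements = loaded;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return elements;
    }

    // Assumed cache layout for this sketch: an int count followed by UTF strings.
    static byte[] encode(String... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(values.length);
            for (String value : values) {
                out.writeUTF(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

Because loaded elements are never released, this sketch shows the same
behaviour noted above: the saving shrinks as more extensions get touched.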



2 - The lazy approach and string saver
Because some strings in the registry are duplicated (for example 
plugin ids and extension point ids), we decided to share them.
To do so we put in place our own mechanism relying on a hashtable, 
so that all the duplicated strings of the registry are shared.

This approach saves more memory than the lazy approach alone, since 
duplicate strings are stored only once. Moreover, this solution guarantees 
a minimum gain equal to the size of the strings saved.
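
A rough sketch of what such a string saver could look like. The class and
method names are hypothetical, and a HashMap is used here in place of
whatever hashtable the tested version relied on.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: returns one shared instance per distinct string value.
class StringSaver {
    private final Map<String, String> pool = new HashMap<>();

    // Returns the first instance seen for this value, registering it if new.
    String share(String value) {
        if (value == null) {
            return null;
        }
        String existing = pool.putIfAbsent(value, value);
        return existing != null ? existing : value;
    }

    // Number of distinct strings currently shared.
    int size() {
        return pool.size();
    }
}
```

Every string read from the cache would be passed through share() before being
stored in a model object, so equal values end up pointing at one instance.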



3 - The lazy approach and string interning
The String API provides a built-in mechanism to share strings. Although 
this mechanism can be used explicitly by a programmer, it is also used 
internally to share the string constants of classes.
So the goal of using string interning is twofold. First, it avoids having to 
handle string sharing ourselves (as in the previous version), and second, it 
shares the strings that are duplicated between the plugin registry and the 
classes (for example plugin id, view id, etc.).



Comments on String.intern.
Because String.intern is well known to have scalability problems, we ran 
a test with different JDKs to see how it behaves (results are in 
the attached xls sheet).

The result is that most of the JDKs behave "correctly" up to 40,000 to 50,000 
strings, which seems to be good enough in our case (the registry contains a bit 
more than 50000 strings but only 1600 distinct ones). The main 
performance issue with interning is the growth of the table in which the strings
are kept, and unfortunately the Java API does not support setting
its initial size.
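
For illustration only, a small sketch of the sharing that String.intern gives
between a class constant and a string built at run time (as one read from the
cache file would be). The class name and the plug-in id are made up.

```java
// Hypothetical demo class; the id below is an invented example value.
class InternDemo {
    // String constants of classes are shared through the intern pool.
    static final String PLUGIN_ID = "org.example.plugin";

    // Builds an equal string at run time, so it is a separate object,
    // much like a value decoded from the registry cache file.
    static String readFromCache() {
        return new StringBuilder("org.example.").append("plugin").toString();
    }

    // True when interning the run-time string yields the class constant.
    static boolean sharesWithConstant() {
        String fromCache = readFromCache();
        return fromCache != PLUGIN_ID && fromCache.intern() == PLUGIN_ID;
    }
}
```

After interning, the registry copy and the constant used in code refer to the
same object, which is the second benefit mentioned for this approach.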



Although string interning has some defects, the most promising approach
for saving registry memory is to combine the lazy approach with
string interning. 
This solution has the advantage of spreading the time taken by String.intern
over the various accesses to the plugin registry, and so avoids slowing down
registry reading on startup.

If we want to avoid using string interning, then the only solution left 
is the basic lazy approach, since the lazy approach with string saving
is too slow on startup.
Comment 3 Jeff McAffer CLA 2002-11-24 23:37:17 EST
PDE has a similar problem.  Ideally a solution in coreland would be useful to 
them as well.  See bug 26951.
Comment 4 Rafael Chaves CLA 2002-12-09 12:41:45 EST
It seems to me the lazy approach requires changing the API for model classes 
defined in org.eclipse.core.runtime.model as well as the registry cache file 
format.

In order to know where to start reading when we need some piece of info that 
was not loaded during the initial read, we need to store the corresponding 
offset in memory. So we need an API for setting/getting offsets in 
PluginModelObject (but that would expose our internal registry cache needs to 
the world), or for setting/getting dynamic properties.

Besides that, the current file format is based on the fact that we read it 
sequentially, so any possible redundancies are avoided by using references to 
previously declared elements. This won't work if we want to directly retrieve 
a specific piece of information for a specific plug-in, because the info may 
not be there - and the references are indexes to a table containing all 
previously declared elements in the order they appear. We would need a file 
offset pointing to where the element is actually declared, instead.

Was this your approach in your tests, Pascal? Or am I missing something?

Thanks.
Comment 5 Pascal Rapicault CLA 2002-12-09 13:32:54 EST
You are right about needing a method to access the position info.
If we want to avoid exposing this accessor, reflection offers
a way to access private methods, but I doubt that this would be accepted
in the implementation...

There is no problem in reading the file sequentially because the 
objects that we want to read lazily are not put into the table. Only objects
that other plugins can reference are put into this table.
Comment 6 Rafael Chaves CLA 2003-01-02 12:24:05 EST
Created attachment 2878 [details]
patch for core.runtime
Comment 7 Rafael Chaves CLA 2003-01-02 12:24:37 EST
Created attachment 2879 [details]
patch for core.tests.runtime
Comment 8 Rafael Chaves CLA 2003-01-02 12:56:27 EST
The patch proposed implements the lazy approach *without* string interning yet. 
Jeff, which approach will we use? Intern everything? Or just element and 
property values?

About the changes:

1) API interface created: org.eclipse.core.runtime.model.ILazyLoadable - to be 
implemented by plug-in model objects that can be lazily loaded - currently, 
only by org.eclipse.core.runtime.model.ExtensionModel.

2) API interface created: org.eclipse.core.runtime.model.IPluginModelLoader - 
to be implemented in a lazy-loading mechanism.

3) API class changed: org.eclipse.core.runtime.model.ExtensionModel - now 
implements ILazyLoadable, and calls an associated plugin model loader (4) 
before reading any lazily loaded state.

4) Implementation class added: 
org.eclipse.core.internal.plugins.ExtensionModelLoader - ensures any lazily 
loadable state is actually loaded in a given extension model.

5) Implementation class added: 
org.eclipse.core.internal.plugins.RegistryCacheLazyReader - subclasses 
RegistryCacheReader in order to skip lazily loaded state (extensions' sub-
elements)

6) Implementation class changed:
org.eclipse.core.internal.plugins.RegistryCacheReader - refactored to make all 
these changes possible.

7) Implementation class changed:
org.eclipse.core.internal.plugins.RegistryCacheWriter - now writes the current 
offset before writing sub-elements data.

8) Implementation class changed:
org.eclipse.core.internal.runtime.InternalPlatform - instantiates a 
RegistryCacheLazyReader (5) instead of a RegistryCacheReader.

9) Added a new test class to test registry lazy-loading

I believe the results Pascal posted are still valid for this implementation (I 
used most of his code). I will add a text file with plug-in registry loading 
and overall startup times for Eclipse M4 and WSAD with and without this patch.

Please review (but do not release nor close this PR before we add string 
interning - or decide not to intern at all).
Comment 9 Rafael Chaves CLA 2003-01-02 13:07:30 EST
Created attachment 2880 [details]
loading times table for Eclipse M4 and WSAD

The proposed implementation opens a DataInputStream chained to a
FileInputStream on the cache registry file each time an extension is fully
loaded. As the numbers show, using an always open RandomAccessFile is a little
more efficient, but I don't know if it is a good idea to keep a file open
during a complete Eclipse session. Any thoughts?
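
To make the two options concrete, here is a hedged sketch, not the patch
itself. It records an offset before each entry is written, then reads one
entry back either through a freshly opened DataInputStream or through an
already open RandomAccessFile. The class name, method names, and the
one-UTF-string-per-entry layout are assumptions.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;

// Hypothetical sketch of offset-based access to a cache file.
class CacheAccess {
    // Writes each value and returns the offset at which each one starts.
    static long[] write(File file, String... values) {
        long[] offsets = new long[values.length];
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            for (int i = 0; i < values.length; i++) {
                offsets[i] = out.size(); // bytes written so far
                out.writeUTF(values[i]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offsets;
    }

    // Option A: open a new stream for every request and skip to the offset.
    static String readWithStream(File file, long offset) {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    throw new IOException("could not skip to offset " + offset);
                }
                remaining -= skipped;
            }
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Option B: reuse a RandomAccessFile that the caller keeps open.
    static String readWithRandomAccess(RandomAccessFile raf, long offset) {
        try {
            raf.seek(offset);
            return raf.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Round trip on a temporary file: returns the entry read both ways.
    static String[] readBothWays(int index, String... values) {
        try {
            File file = File.createTempFile("registry", ".cache");
            try {
                long[] offsets = write(file, values);
                try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
                    return new String[] {
                        readWithStream(file, offsets[index]),
                        readWithRandomAccess(raf, offsets[index])
                    };
                }
            } finally {
                file.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Option B saves the open and skip on each request but holds a file handle for
as long as the reader lives, which is the trade-off raised in the question.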
Comment 10 Rafael Chaves CLA 2003-01-08 16:44:22 EST
Created attachment 2927 [details]
patches for core.runtime and core.tests.runtime

Please ignore my comments made on 2003-01-02 and attachments #2878-#2880. 

This patch's approach is more on the conservative side. The only change made to
an API class (PluginModelObject) was needed to allow lazy loading of extensions
marked as read only (a final method was made non-final). However, this
implementation supports lazy loading only for extension objects created by
org.eclipse.core.internal.plugins.InternalFactory.

To maintain startup times close to the current implementation, several public
methods in RegistryCacheReader were made private, or final if they were
accessed externally, to eliminate dynamic binding. This was needed because
the registry cache reader is accessed every time the elements of a lazily loaded
extension must be fetched, and dynamic binding was needlessly imposing big
performance costs.
Comment 11 Jeff McAffer CLA 2003-01-08 17:07:31 EST
Should consider interning for more parts of the plugin.xml (e.g., plugin ids 
are bound to be repeated in code and cross referenced in <requires>).  Look at 
Pascal's original test cases.
Comment 12 Rafael Chaves CLA 2003-01-23 10:34:43 EST
Reviewed and released to HEAD.