Investigate whether it is feasible to store the plug-in registry as a b-tree. The registry takes up a ton of space in memory and is (essentially) used only sporadically after startup. Also ensure that PDE can take advantage of any performance enhancements that we make. They use a registry mechanism of their own when running a runtime workbench.
Created attachment 2462 [details] Excel sheet containing detailed data
The tests carried out did NOT save the whole registry on disk. At first glance, saving all of it appears too complicated for the gain that would result. Indeed, most of the space taken by the registry is in the configuration elements and configuration properties, which are mainly holders of strings from the plugin.xml files. Moreover, this data is rarely used. A common usage pattern for these objects is to be read by a plugin during its initialization phase.

Because these objects are full of strings and are sometimes never used, the tests consisted of creating them lazily when the plugin registry is read from the cache file. To evaluate the gain in memory and time, several versions of this solution were implemented and tested on different JVMs. Detailed numbers are available in the attached xls sheet.

1 - The lazy approach
This solution represents in memory only the interesting part of the cache file. Although the bytes of configuration elements and properties are read from the file, the objects are not created in memory. Note that skipping the bytes in the file turns out to be less efficient than reading and ignoring them. When configuration elements and properties are required, the information is read from the cache file and the objects are created. Once created, the new objects are kept in memory.
This approach is as fast as the reference implementation and saves memory on startup. However, after a long Eclipse session all the plugins might have been activated, so the memory gain decreases and can drop to zero compared to the reference implementation.

2 - The lazy approach and string saver
Because some strings of the registry are duplicated (for example plugin id, extension point id), we decided to share them. To do so we put in place our own mechanism relying on a hashtable. Thanks to it, all the duplicated strings of the registry are shared.
This approach saves more memory than the lazy approach alone, since strings are saved. Moreover, this solution guarantees a minimum gain equal to the size of the strings saved.

3 - The lazy approach and string interning
The String API provides a built-in mechanism to share strings. This mechanism can be used explicitly by a programmer, and it is also used internally to share the string constants of classes. So the goal of using string interning is twofold. First, it avoids having to handle string sharing ourselves (as in the previous version). Second, it shares the strings that are duplicated between the plugin registry and the classes (for example plugin id, view id, etc.).

Comments on String.intern
Because String.intern is well known to have scalability problems, we ran a test with different JDKs to see how it behaves (results are in the attached xls sheet). The result is that most JDKs behave "correctly" up to 40,000 to 50,000 strings, which seems to be good enough in our case (the registry contains a bit more than 50000 strings, but only 1600 of them are distinct). The main performance issue with interning is the growth of the table in which the strings are kept, and unfortunately the Java API does not support setting its initial size.

Although String.intern has some defects, the most promising approach to saving memory in the registry is to combine the lazy approach with string interning. This solution has the advantage of spreading the time taken by String.intern over the various accesses to the plugin registry, and so avoids slowing down registry reading on startup.
If we want to avoid using String.intern, the only solution left is the basic lazy approach, since the lazy approach with string saving is too slow on startup.
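To make the difference between approaches 2 and 3 concrete, here is a minimal sketch in present-day Java. It is not the code used in the tests: the class name StringSharingSketch, the StringSaver helper and the sample id are assumptions made for illustration only. It shows a hash-table based saver that returns one canonical instance per distinct string, next to String.intern, which also shares with string constants loaded from classes.

```java
import java.util.HashMap;
import java.util.Map;

public class StringSharingSketch {

    /** Hypothetical "string saver": maps each distinct string to one canonical instance. */
    static final class StringSaver {
        private final Map<String, String> pool;

        StringSaver(int expectedDistinct) {
            // Unlike String.intern(), the initial table size can be chosen here.
            pool = new HashMap<>(expectedDistinct * 2);
        }

        /** Returns the canonical instance equal to s, registering s if it is new. */
        String share(String s) {
            if (s == null) {
                return null;
            }
            String existing = pool.putIfAbsent(s, s);
            return existing != null ? existing : s;
        }

        int distinctCount() {
            return pool.size();
        }
    }

    public static void main(String[] args) {
        StringSaver saver = new StringSaver(1600);

        // new String(...) forces distinct instances, as reading from a cache file would.
        String a = saver.share(new String("org.example.views"));
        String b = saver.share(new String("org.example.views"));
        System.out.println("saver shares instances: " + (a == b));

        // intern() additionally shares with literals that appear in loaded classes.
        String fromCache = new String("org.example.views").intern();
        String literal = "org.example.views";
        System.out.println("intern shares with literal: " + (fromCache == literal));
    }
}
```

The saver only removes duplicates inside the registry, whereas intern also removes duplicates between the registry and class constants, which matches the second goal described under approach 3.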
PDE has a similar problem. Ideally a solution in coreland would be useful to them as well. See bug 26951.
It seems to me the lazy approach requires changing the API for model classes defined in org.eclipse.core.runtime.model as well as the registry cache file format. In order to know where to start reading when we need some piece of info that was not loaded during the initial read, we need to store the corresponding offset in memory. So we need an API for setting/getting offsets in PluginModelObject (but that would expose our internal registry cache needs to the world), or for setting/getting dynamic properties. Besides that, the current file format is based on the fact that we read it sequentially, so any possible redundancies are avoided by using references to previously declared elements. This won't work if we want to directly retrieve a specific piece of information for a specific plug-in, because the info may not be there: the references are indexes into a table containing all previously declared elements in the order they appear. We would need a file offset pointing to where the element is actually declared instead. Was this your approach in your tests, Pascal? Or am I missing something? Thanks.
You are right about the method to access the position info. If we want to avoid exposing this access, reflection offers a way to call private methods, but I doubt that this will be accepted in the implementation... Reading the file sequentially is not a problem, because the objects that we want to read lazily are not put into the table. Only objects that other plugins can reference are put into this table.
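The offset mechanism discussed in the last two comments can be sketched roughly as follows. This is an illustration only: the file layout, the LazyExtensionSketch class and its LazyExtension type are invented for the example and are not the real registry cache format or the real ExtensionModel. The writer stores, ahead of each extension's sub-elements, the offset at which they start; the startup reader keeps the offset and reads and discards the sub-element bytes; the first request for the elements reopens the file and jumps to the stored offset.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyExtensionSketch {

    /** Hypothetical stand-in for an extension model; not the real ExtensionModel. */
    static final class LazyExtension {
        final String id;
        final long offset;          // where this extension's sub-elements start in the cache
        private final Path cache;
        private String[] elements;  // null until first requested, then kept in memory

        LazyExtension(String id, long offset, Path cache) {
            this.id = id;
            this.offset = offset;
            this.cache = cache;
        }

        boolean isLoaded() {
            return elements != null;
        }

        String[] getElements() throws IOException {
            if (elements == null) {
                try (DataInputStream in = open(cache)) {
                    long remaining = offset;
                    while (remaining > 0) {
                        long skipped = in.skip(remaining);
                        if (skipped <= 0) throw new EOFException("offset past end of cache");
                        remaining -= skipped;
                    }
                    String[] loaded = new String[in.readInt()];
                    for (int i = 0; i < loaded.length; i++) loaded[i] = in.readUTF();
                    elements = loaded;
                }
            }
            return elements;
        }
    }

    static DataInputStream open(Path file) throws IOException {
        return new DataInputStream(new BufferedInputStream(Files.newInputStream(file)));
    }

    /** Writes: count, then per extension: id, offset of its sub-elements, the sub-elements. */
    static void writeCache(Path file, Map<String, String[]> extensions) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(Files.newOutputStream(file)))) {
            out.writeInt(extensions.size());
            for (Map.Entry<String, String[]> e : extensions.entrySet()) {
                out.writeUTF(e.getKey());
                // size() is the byte count so far; the sub-elements start after this 8-byte long.
                out.writeLong(out.size() + 8L);
                out.writeInt(e.getValue().length);
                for (String s : e.getValue()) out.writeUTF(s);
            }
        }
    }

    /** Startup read: keeps ids and offsets, reads and discards sub-element bytes. */
    static List<LazyExtension> readLazily(Path file) throws IOException {
        List<LazyExtension> result = new ArrayList<>();
        byte[] scratch = new byte[65535]; // largest payload writeUTF can produce
        try (DataInputStream in = open(file)) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                String id = in.readUTF();
                long offset = in.readLong();
                int n = in.readInt();
                for (int j = 0; j < n; j++) {
                    int len = in.readUnsignedShort(); // writeUTF prefixes the encoded byte length
                    in.readFully(scratch, 0, len);    // read and ignore, no String is created
                }
                result.add(new LazyExtension(id, offset, file));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("registry", ".cache");
        Map<String, String[]> data = new LinkedHashMap<>();
        data.put("example.views", new String[] {"view: Outline", "view: Tasks"});
        data.put("example.editors", new String[] {"editor: Text"});
        writeCache(file, data);

        List<LazyExtension> extensions = readLazily(file);
        LazyExtension second = extensions.get(1);
        System.out.println(second.id + " loaded before access: " + second.isLoaded());
        System.out.println(second.id + " has " + second.getElements().length + " element(s)");
        Files.delete(file);
    }
}
```

Because the lazily loaded data is self-contained in this sketch, it never needs the table of previously declared elements, which is the property the reply above relies on.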
Created attachment 2878 [details] patch for core.runtime
Created attachment 2879 [details] patch for core.tests.runtime
The patch proposed implements the lazy approach *without* string interning yet. Jeff, which approach will we use? Intern everything? Or just element and property values?

About the changes:
1) API interface created: org.eclipse.core.runtime.model.ILazyLoadable - to be implemented by plug-in model objects that can be lazily loaded - currently, only by org.eclipse.core.runtime.model.ExtensionModel.
2) API interface created: org.eclipse.core.runtime.model.IPluginModelLoader - to be implemented in a lazy-loading mechanism.
3) API class changed: org.eclipse.core.runtime.model.ExtensionModel - now implements ILazyLoadable, and calls an associated plugin model loader (4) before reading any lazily loaded state.
4) Implementation class added: org.eclipse.core.internal.plugins.ExtensionModelLoader - ensures any lazily loadable state is actually loaded in a given extension model.
5) Implementation class added: org.eclipse.core.internal.plugins.RegistryCacheLazyReader - subclasses RegistryCacheReader in order to skip lazily loaded state (extensions' sub-elements).
6) Implementation class changed: org.eclipse.core.internal.plugins.RegistryCacheReader - refactored to make all these changes possible.
7) Implementation class changed: org.eclipse.core.internal.plugins.RegistryCacheWriter - now writes the current offset before writing sub-elements data.
8) Implementation class changed: org.eclipse.core.internal.runtime.InternalPlatform - instantiates a RegistryCacheLazyReader (5) instead of a RegistryCacheReader.
9) Added a new test class to test registry lazy-loading.

I believe the results Pascal posted are still valid for this implementation (I used most of his code). I will add a text file with plug-in registry loading and overall startup times for Eclipse M4 and WSAD with and without this patch. Please review (but do not release nor close this PR before we add string interning - or decide not to intern at all).
Created attachment 2880 [details] loading times table for Eclipse M4 and WSAD
The proposed implementation opens a DataInputStream chained to a FileInputStream on the cache registry file each time an extension is fully loaded. As the numbers show, using an always open RandomAccessFile is a little more efficient, but I don't know if it is a good idea to keep a file open during a complete Eclipse session. Any thoughts?
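For readers weighing the two options, the sketch below contrasts them using only standard java.io classes. The class and method names (CacheAccessSketch, readWithStream, OpenCache) and the two-string file are made up for the example, and it makes no performance claim of its own. Strategy A reopens a stream and skips to the offset on every request, so no file handle outlives the call. Strategy B keeps one RandomAccessFile open and seeks, which avoids the reopen and skip but holds the handle until close() is called.

```java
import java.io.Closeable;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class CacheAccessSketch {

    /** Strategy A: open a fresh stream per request and skip forward to the offset. */
    static String readWithStream(Path file, long offset) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(file.toFile()))) {
            long remaining = offset;
            while (remaining > 0) {
                int skipped = in.skipBytes((int) Math.min(remaining, Integer.MAX_VALUE));
                if (skipped <= 0) throw new EOFException("offset past end of file");
                remaining -= skipped;
            }
            return in.readUTF();
        }
    }

    /** Strategy B: one RandomAccessFile stays open for the lifetime of this object. */
    static final class OpenCache implements Closeable {
        private final RandomAccessFile raf;

        OpenCache(Path file) throws IOException {
            raf = new RandomAccessFile(file.toFile(), "r");
        }

        String read(long offset) throws IOException {
            raf.seek(offset);
            return raf.readUTF();
        }

        @Override
        public void close() throws IOException {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("registry", ".cache");
        long secondOffset;
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file.toFile()))) {
            out.writeUTF("first extension data");
            secondOffset = out.size(); // bytes written so far = start of the second record
            out.writeUTF("second extension data");
        }

        System.out.println(readWithStream(file, secondOffset));
        try (OpenCache cache = new OpenCache(file)) {
            System.out.println(cache.read(secondOffset));
            System.out.println(cache.read(0));
        }
        Files.delete(file);
    }
}
```

Wrapping the open file in a Closeable at least makes the lifetime question explicit: whoever owns the registry decides when the handle is released.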
Created attachment 2927 [details] patches for core.runtime and core.tests.runtime
Please ignore my comments made on 2003-01-02 and attachments #2878-#2880. This patch's approach is more on the conservative side. The only change made to an API class (PluginModelObject) was needed to allow lazy loading of extensions marked as read only (a final method was made non-final). However, this implementation supports lazy loading only for extension objects created by org.eclipse.core.internal.plugins.InternalFactory. To keep startup times close to the current implementation, several public methods in RegistryCacheReader were made private, or final if they were accessed externally, to eliminate dynamic binding. This was needed because the registry cache reader is accessed every time the elements of a lazily loaded extension must be fetched, and dynamic binding was needlessly imposing big performance costs.
Should consider interning for more parts of the plugin.xml (e.g., plugin ids are bound to be repeated in code and cross-referenced in <requires>). Look at Pascal's original test cases.
Reviewed and released to HEAD.