Community
Participate
Working Groups
On 3.1M6, I ran filemon.exe while loading M6 into an empty workspace. I captured filemon's read log, and my eye directly fell on <eclipse>/configuration/org.eclipse.core.runtime/.mainData.9 On disk, on my M6 installation (which has one extra tiny plugin), this file measures 702,669. When I exported filemon's log, and processed the "read" numbers, I saw that in total, for startup, TableReader reads 5,162,288 bytes from .mainData.9. And this is just for going into an empty workspace into Resource perspective. After studying the code in TableReader, I noticed a new TableReader is created for each registry access, and it open the mainData file, and creates a new InputStream on that (as a result, hundreds of InputStreams are created and destructed just during startup). To get the actual data, a given number of bytes are skipped on the InputStream to jump to an offset where the extension point is actually located in the registry cache. A better solution is to use a RandomAccessFile (see http://java.sun.com/docs/books/tutorial/essential/io/rafs.html for argumentation). I propose TableReader.<clinit> to open mainData as a random access file and keep the file open for the duration of the platform's execution. Then, each registry access would synchronize on the raf and get its entry at the proper offset. Result is less I/O, and fewer JNI calls. I do understand that when using the currently applied "lazy registry loading" technique, the registry cache will be in the OS disk cache, so no expensive disk accesses are really made for subsequent registry accesses. However, opening/reading/closing an InputStream is always slower than using a static RandomAccessFile, I think. The API is very similar, in fact almost identical in its use, so a quick verification using RandomAccessFile is not hard. Even better is for the registry to save the registry in another way by generating java bytecodes on the fly. Instead of serializing the registry into a couple of binary files, save it into a set of Java classes that implement the registry directly, as Java class (in the form of binary Java bytecodes). No file I/O is done at all. The platform classloader will do the lazy loading for us in this case, by loading the required classes on demand. The platform is littered with various registries, and each has its own serialization model. A common model should be designed (org.eclipse.registry?) that provides the option for coordinated efforts to optimize registry loading. If you like, I could prototype the binary class solution for the extension registy, and prove/disprove the benefits.
There is probably some room for optimization here. Perhaps recognizing the special character of startup. Some additional thoughts. - Pascal benched RandomAccessFile and found it to be quite slow. Perhaps there is a better way of using them? - The registry is not just lazily loaded but also flushed. How would that work with the class approach? - How does the class approach work for representing trees? I.e., how many classes are there? What are the methods? Who instantiates them (if anyone), ... (I'm curious, tell me more!) - another approach that we thought of was to keep a pool of say 10 regular input streams open. We would always toss the one that was furthest along in favour of one positioned earlier. Then when you need to read something, you look in the pool for one that is nearest (but before) the position you need to read. Take it out of the pool, do your reading and then put it back (or toss it). We'd have to try it to see if the cache management overhead is worth it. - A related note, we need to investigate how to handle these caches when running from a shared drive. We may want to cache the cache locally or turn it off.
I figured Pascal tried RandomAccessFile already, as he has done, what is it, some 15 iterations now, with each one a bit faster than the previous. I am surprised RandomAccessFile should be slower. Is this on all VMs? I thought of the pool of inputstreams also, I just thought of it as having one element in it, not 10. The class approach is best modeled in your mind by doing this thought experiment: write the registry cache by hand in Java source. What would it look like then? Once you have this 'prototype', when you serialize the registry, generate similar code (preferably in bytecodes directly). If you use weak references in the connections between the tree elements, the instances will eventually be collected. In theory, even the classes should be unloaded, at some point, but this depends on the VM being used. Of course, the problem is those looooooong names everyone uses for their plugins and extension points. When you use classes with static String fields, those will be interned by the VM, rather than duplicated on the heap. That keeps the cost minimal. Some VMs have trouble with performance of user-defined interned Strings, but those in class files are well dealt with.
This has been fixed in the 3.2 - we now keep the streams open and re-use-them. Also BufferedRandomInputStream is added - a combination of the random access and buffering. It is not perfect yet in terms of performance, but works reasonably well. (As Pascal specified above, RandomAccessFile is unbuffered and in our case doesn't provide much of a benefit.) Just tried with M6 and we read about 137K out of the 805K main table on startup. On a side note, I would be careful about using classes to store data. In the recent startup performance tests, it seemed that class loading is a rather expensive operation and represents a major portion of the headless startup time. Loading an "average" class takes about 0.3ms on my computer; reading and processing 30K table takes 1.5ms. Seems good so far, but loading a little more complicated classes often takes 10 - 20ms (e.g., 10ms for the RegistryObjectManager which is not all that complex).
fix was dropped in M5 from comment 3 in bug 121766. *** This bug has been marked as a duplicate of 121766 ***