Bug 210994 - [prov] Space efficiency of OrderedProperties
Summary: [prov] Space efficiency of OrderedProperties
Status: RESOLVED WONTFIX
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Incubator (show other bugs)
Version: 3.4   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: equinox.incubator-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2007-11-26 17:26 EST by John Arthorne CLA
Modified: 2017-07-03 03:16 EDT (History)
2 users (show)

See Also:


Attachments
Simple implementation of OrderedMap backed by array (14.98 KB, patch)
2007-11-26 17:27 EST, John Arthorne CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description John Arthorne CLA 2007-11-26 17:26:13 EST
When doing analysis with YourKit, I'm seeing a large number of LinkedHashMap, LinkedHashMap$Entry, etc.  Most of these trace back to the OrderedProperties class. We should investigate a more space-efficient implementation of OrderedProperties. Nearly all instances of OrderedProperties have in the range of 0-4 elements, so a simple string array is probably sufficient.
Comment 1 John Arthorne CLA 2007-11-26 17:27:05 EST
Created attachment 83818 [details]
Simple implementation of OrderedMap backed by array

Work in progress....
Comment 2 Jeff McAffer CLA 2007-11-26 20:38:02 EST
Can we reduce the cases where ordering is important.  I don't actually recall the reasoning for maintaining the ordering.  I'm assuming it has to do with reading and writing with as little perturbation as possible?  Does tihs really matter?  when?
Comment 3 Dave Stevenson CLA 2007-11-27 16:44:17 EST
Experience has provided a couple of motivations for maintaining order of the properties in metadata:

1). Platform dependencies in hashes. Using a collection that is not explicitly order preserving leads to changes when moving/reading/writing metadata on different platforms (and possibly on different jre's, although I haven't seen that). This is not a huge issue, but does complicate writing junits that test reading an input xml file (generated on some platform), writing it out, reading back in an expecting the 'same' result. Obviously one can write a order independent check, but it's a bit of a nuisance.

2). Authored metadata. When people use tools (either simple text editors or, hopefully, authoring tools for creating metadata (which we don't expect to happen much immediately, but will become important) they often craft the metadata based on the way their perceptions of the importance (e.g. significant properties first, others later; or groups of related properties; etc). Having these perceptions violated by saving/rereading or switching platforms can result in major angst. Obviously custom tools can use auxiliary data to present a consistent order to workaround the problem.

However, based on the correct assumption that the size of properties collections in the metadata is typically small, a trade-off can be made between lookup time and space as John is suggesting (and prototyping). This could be easily done with and adaptive ordered map class whose implementation is a resizeable array (with linear search for lookup) for size<=n and a linked hash map for size > n.

Since we decide to change a class of values that were being represented with capabilities to be properties, the properties collections may grow a little bit and search efficiency may become a little more important.
Comment 4 John Arthorne CLA 2007-11-27 17:40:38 EST
Yep, agree we need to maintain entry order.  Space-efficiency of Maps has come up a lot in Eclipse. Bascially the Map implementations in Java have excellent performance lookup speed, but terrible space performance. LinkedHashMap in particular has an overhead of an extra 40 bytes per entry object. With our use of lots of little maps in p2 this adds up... about 400KB of extra overhead when running the admin UI with a single metadata repository and two profiles containing the SDK. In many areas of Eclipse we've move to array-based maps, which have much better space efficiency at the cost of slightly less performant lookup. On a map with ~10 entries the speed cost rarely matters. For example IResource session properties and markers, equinox preferences, job properties, and many other areas use array-based maps.  My patch is just a copy of our ObjectMap from elsewhere in core with a few modifications to conform to the previous OrderedProperties implementation.
Comment 5 John Arthorne CLA 2008-01-04 13:07:49 EST
I'm throwing away this change, because the size of OrderedProperties isn't as much of an issue when we flush the entire metadata when not in use. We can reopen in the future if this turns up in profiling as a major concern.