Skip to main content

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]
Re: [rdf4j-dev] New Memory Store - Planning

Hi Jerven,

I want to make lookups for when S and P are known to be as fast as when only S is known. And lookups when S, P and O are known should be just as fast too. B+ trees handle this really well, since lookups are O(log n) regardless. Which indexes (eg. SPOC, POSC, OPSC ....) the user wants will be configurable. 

B+ trees also return the data sorted, which is very useful when doing joins. I don't think the rdf4j join algorithm can take advantage of this, but extending them to do merge-join will make joins much faster.

I have been considering your approach, by extending it with multiple layers of hashmaps so that you can do the following when S and P are known:

store.getSubject(S).getPredicate(P).iterator()

However hashmaps in java use arrays to store their data, and these are usually initialised to some size which will mean that it takes up a lot more memory than needed. Hashmaps also aren't sorted.

I think Jena uses this approach for their Model (which is functionally closer to MemoryStore than to RDF4J Model). Jena has an optimisation though which is that the nested hashmaps are arraylists as long as they are small.

Håvard

Back to the top