Re: [rdf4j-dev] New Memory Store

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [rdf4j-dev] New Memory Store - Planning

From: Håvard Ottestad <hmottestad@xxxxxxxxx>
Date: Mon, 17 Jul 2017 11:42:34 +0200
Delivered-to: rdf4j-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/rdf4j-dev>
List-help: <mailto:rdf4j-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/options/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=unsubscribe>

Hi Jerven,

I want to make lookups for when S and P are known to be as fast as when only S is known. And lookups when S, P and O are known should be just as fast too. B+ trees handle this really well, since lookups are O(log n) regardless. Which indexes (eg. SPOC, POSC, OPSC ....) the user wants will be configurable.

B+ trees also return the data sorted, which is very useful when doing joins. I don't think the rdf4j join algorithm can take advantage of this, but extending them to do merge-join will make joins much faster.

I have been considering your approach, by extending it with multiple layers of hashmaps so that you can do the following when S and P are known:

store.getSubject(S).getPredicate(P).iterator()

However hashmaps in java use arrays to store their data, and these are usually initialised to some size which will mean that it takes up a lot more memory than needed. Hashmaps also aren't sorted.

I think Jena uses this approach for their Model (which is functionally closer to MemoryStore than to RDF4J Model). Jena has an optimisation though which is that the nested hashmaps are arraylists as long as they are small.

Håvard

Follow-Ups:
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: Jerven Tjalling Bolleman

References:
- [rdf4j-dev] New Memory Store - Planning
  - From: Håvard M. Ottestad
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: Jerven Tjalling Bolleman

Prev by Date: Re: [rdf4j-dev] New Memory Store - Planning
Next by Date: Re: [rdf4j-dev] New Memory Store - Planning
Previous by thread: Re: [rdf4j-dev] New Memory Store - Planning
Next by thread: Re: [rdf4j-dev] New Memory Store - Planning
Index(es):
- Date
- Thread

Breadcrumbs