Re: [rdf4j-dev] New Memory Store

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] [List Home]

Re: [rdf4j-dev] New Memory Store - Planning

From: Håvard Ottestad <hmottestad@xxxxxxxxx>
Date: Mon, 17 Jul 2017 10:13:45 +0200
Delivered-to: rdf4j-dev@xxxxxxxxxxx
List-archive: <https://dev.eclipse.org/mailman/private/rdf4j-dev>
List-help: <mailto:rdf4j-dev-request@eclipse.org?subject=help>
List-subscribe: <https://dev.eclipse.org/mailman/listinfo/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=subscribe>
List-unsubscribe: <https://dev.eclipse.org/mailman/options/rdf4j-dev>, <mailto:rdf4j-dev-request@eclipse.org?subject=unsubscribe>

Hi Jeen,

Thanks for the feedback.

I would like to improve the current memory store instead, but the way it's built with half the logic in the sail source and the other half in the value factory makes it really hard to pry apart.

My main motivation is to get O(log N) or better lookup. The current memory store is closer to O(N), which I think also impacts the insert performance since it looks like all inserts check with hasStatement() first.

How about this plan as a middle ground:

- Develop new memory store

- Release it as experimental

- Add transaction support

- Beta release with new memory store as default (eg. rename the old one)

- 3-6 months later full release

Adding transaction support would bring it in line with the current memory store, so feature wise users wouldn't loose anything. Do you know of anyone using the transaction support in the current memory store today? Especially the read-committed and snapshot isolation? And for what purpose?

As for a memory store that spills to disk, I think that would be better to design as a memory mapped disk store where you can disable fsync for performance. Does the current disk based store use memory mapped files?

Håvard

On Mon, Jul 17, 2017 at 12:59 AM, Jeen Broekstra <jeen.broekstra@xxxxxxxxx> wrote:

Sounds intriguing, but between the Model implementations and the existing in-memory store, where does your approach fit in? From a user perspective I’m thinking that another in-memory option might be overly confusing.

Perhaps you could approach your ideas as improvements/configuration options in the existing store and/or Model classes instead?

Another thought: I’ve long thought that it would be useful to have an in-memory store that can seamlessly switch over to disk-based persistence when the size goes up - another way to think of it is as a fast-access caching layer for the native store. Could your ideas fit in with that kind of project?

Cheers,

Jeen

> On 15 Jul 2017, at 19:04, Håvard M. Ottestad <hmottestad@xxxxxxxxx> wrote:
>
> Hi,
>
> I’m look at creating a new Memory Store that is configurable so that it can be optimised for any scenario.
>
> Here is what I’m thinking of doing:
>
> - B+ tree indexes (because they support range queries)
> - Thread based indexing (async)
> - Deletes in diff index (merged when index.size > some max)
> - Hashmap index for exists(triple1) queries
>
> I don’t want to have transactional support. I use transactions with disk based databases, but the memory store I usually only use for embedded purposes that are single threaded.
>
> Scenarios I want to support:
> - read heavy
> - write heavy
> - transforms (deserialise, query, update, serialise)
>
> In general, are there any recommendations or requirements that others might have before I get too committed?
>
> I also know that a lot of triple stores convert IRIs and literals to a hash/integer so that the hash/integer is stored in the indexes and the IRIs/literals are stored in a lookup table. I’m considering doing this too, but it might not be of much benefit unless I migrate to manual memory management using the unsafe library. Any thoughts on using sun.misc.Unsafe?
>
> Håvard

> _______________________________________________
> rdf4j-dev mailing list
> rdf4j-dev@xxxxxxxxxxx
> To change your delivery options, retrieve your password, or unsubscribe from this list, visit
> https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

Follow-Ups:
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: James Leigh

References:
- [rdf4j-dev] New Memory Store - Planning
  - From: Håvard M. Ottestad
- Re: [rdf4j-dev] New Memory Store - Planning
  - From: Jeen Broekstra

Prev by Date: Re: [rdf4j-dev] New Memory Store - Planning
Next by Date: Re: [rdf4j-dev] New Memory Store - Planning
Previous by thread: Re: [rdf4j-dev] New Memory Store - Planning
Next by thread: Re: [rdf4j-dev] New Memory Store - Planning
Index(es):
- Date
- Thread

Breadcrumbs