Re: [rdf4j-dev] New Memory Store - Planning

Hi Håvard,

This would be great! However, transaction support needs to be included
early in the design if you want a faster implementation than the
existing MemoryStore. Your indexes should support a multi-root B+
tree to allow versioning and concurrent reads/writes against different
states of the store. I'd recommend you take a close look at the
SailSource interface and understand its lifecycle before getting too
far into the design of a new store.
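
To sketch what I mean by a multi-root index (just an illustration with
made-up names, not actual RDF4J code): every committed state is an
immutable root, readers pin whichever root was current when they
started, and writers publish a new root atomically. A real B+ tree
would share unchanged nodes rather than copying, but the shape is the
same.

    import java.util.Collections;
    import java.util.NavigableSet;
    import java.util.TreeSet;
    import java.util.concurrent.atomic.AtomicReference;

    // Toy versioned index: one immutable root per committed state.
    class VersionedIndex {

        private final AtomicReference<NavigableSet<Long>> currentRoot =
                new AtomicReference<>(Collections.emptyNavigableSet());

        // Readers keep using the root they started with; concurrent
        // commits never modify it, so snapshot reads need no locking.
        NavigableSet<Long> openSnapshot() {
            return currentRoot.get();
        }

        // Writers derive a new root and swap it in atomically. Here the
        // whole set is copied; a B+ tree would only copy the path from
        // the changed leaf up to the root.
        void commit(long encodedStatement) {
            currentRoot.updateAndGet(root -> {
                TreeSet<Long> next = new TreeSet<>(root);
                next.add(encodedStatement);
                return Collections.unmodifiableNavigableSet(next);
            });
        }
    }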

There is no need for an explicit lookup table in an in-memory store.
The purpose of an id-lookup table in disk-based indexes is a simple
form of compression, to avoid repeating the string value each time.
However, since you will be using Java objects/pointers anyway, you
essentially already have a lookup table (the memory position maps to a
char[] ;).
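
To make that concrete, here is a toy sketch (again, made-up names, not
the MemoryStore code): if you canonicalise the value objects and let
the indexes hold references to them, you get the same de-duplication an
id table gives you on disk, because the reference itself plays the role
of the id.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Toy canonicalising map: every index entry for the same lexical
    // value ends up pointing at the same object.
    class ValueInterner {

        private final Map<String, String> canonical = new ConcurrentHashMap<>();

        // Returns the already-interned instance if one exists, otherwise
        // stores and returns the argument itself.
        String intern(String lexicalValue) {
            return canonical.computeIfAbsent(lexicalValue, v -> v);
        }
    }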

My goal this year is to start a similar process for the nativerdf
store, as it has poor performance primarily due to the lack of support
for versioning in its indexes. So I'm very interested in how this works
out.

When you're ready to share some code, it should be put into a feature
branch so that others can open PRs and make other suggestions.

Regards,
James

On Mon, 2017-07-17 at 10:13 +0200, Håvard Ottestad wrote:
> Hi Jeen,
> 
> Thanks for the feedback.
> 
> I would like to improve the current memory store instead, but the way
> it's built, with half the logic in the sail source and the other half
> in the value factory, makes it really hard to pry apart.
> 
> My main motivation is to get O(log N) or better lookup. The current
> memory store is closer to O(N), which I think also impacts the insert
> performance since it looks like all inserts check with hasStatement()
> first. 
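> 
> To make that concrete, here is a toy example of the kind of duplicate
> check I have in mind (a rough sketch with made-up names, not the
> current MemoryStore code): keep the statements in a hash set keyed on
> (s, p, o, c), so the duplicate check on insert is one hash lookup
> instead of a scan.
> 
>     import java.util.HashSet;
>     import java.util.Objects;
>     import java.util.Set;
> 
>     // Toy exists-index: add() reports in O(1) expected time whether
>     // the statement was already present.
>     class ExistsIndex {
> 
>         private static final class Key {
>             final Object s, p, o, c;
> 
>             Key(Object s, Object p, Object o, Object c) {
>                 this.s = s; this.p = p; this.o = o; this.c = c;
>             }
> 
>             @Override
>             public boolean equals(Object other) {
>                 if (!(other instanceof Key)) return false;
>                 Key k = (Key) other;
>                 return Objects.equals(s, k.s) && Objects.equals(p, k.p)
>                         && Objects.equals(o, k.o) && Objects.equals(c, k.c);
>             }
> 
>             @Override
>             public int hashCode() {
>                 return Objects.hash(s, p, o, c);
>             }
>         }
> 
>         private final Set<Key> statements = new HashSet<>();
> 
>         // false means the statement was already there (a duplicate).
>         boolean add(Object subj, Object pred, Object obj, Object ctx) {
>             return statements.add(new Key(subj, pred, obj, ctx));
>         }
>     }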
> 
> How about this plan as a middle ground:
> 
>  - Develop new memory store
>  - Release it as experimental
>  - Add transaction support
>  - Beta release with new memory store as default (e.g. rename the old
> one)
>  - 3-6 months later full release
> 
> Adding transaction support would bring it in line with the current
> memory store, so feature-wise users wouldn't lose anything. Do you
> know of anyone using the transaction support in the current memory
> store today? Especially the read-committed and snapshot isolation?
> And for what purpose?
> 
> As for a memory store that spills to disk, I think that would be
> better to design as a memory-mapped disk store where you can disable
> fsync for performance. Does the current disk-based store use
> memory-mapped files?
> 
> Håvard
> 
> On Mon, Jul 17, 2017 at 12:59 AM, Jeen Broekstra <jeen.broekstra@gmail.com> wrote:
> > Sounds intriguing, but between the Model implementations and the
> > existing in-memory store, where does your approach fit in? From a
> > user perspective I’m thinking that another in-memory option might
> > be overly confusing.
> > 
> > Perhaps you could approach your ideas as improvements/configuration
> > options in the existing store and/or Model classes instead?
> > 
> > Another thought: I’ve long thought that it would be useful to have
> > an in-memory store that can seamlessly switch over to disk-based
> > persistence when the size goes up - another way to think of it is
> > as a fast-access caching layer for the native store. Could your
> > ideas fit in with that kind of project?
> > 
> > Cheers,
> > 
> > Jeen
> > 
> > > On 15 Jul 2017, at 19:04, Håvard M. Ottestad <hmottestad@xxxxxxxxm> wrote:
> > >
> > > Hi,
> > >
> > > I’m looking at creating a new Memory Store that is configurable so
> > that it can be optimised for any scenario.
> > >
> > > Here is what I’m thinking of doing:
> > >
> > > - B+ tree indexes (because they support range queries)
> > > - Thread-based indexing (async)
> > > - Deletes in diff index (merged when index.size > some max; see the
> > > sketch after this list)
> > > - Hashmap index for exists(triple1) queries
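> > >
> > > A rough sketch of the "deletes in diff index" idea above (made-up
> > > names, just to show the shape): reads subtract a tombstone set from
> > > the base index, and the tombstones are merged down once they grow
> > > past a threshold.
> > >
> > >     import java.util.HashSet;
> > >     import java.util.NavigableSet;
> > >     import java.util.Set;
> > >     import java.util.TreeSet;
> > >
> > >     // Toy delete overlay over a sorted base index.
> > >     class DiffIndex {
> > >
> > >         private static final int MERGE_THRESHOLD = 10_000;
> > >
> > >         private final NavigableSet<Long> base = new TreeSet<>();
> > >         private final Set<Long> deleted = new HashSet<>();
> > >
> > >         void add(long statement) {
> > >             deleted.remove(statement);
> > >             base.add(statement);
> > >         }
> > >
> > >         void delete(long statement) {
> > >             deleted.add(statement);
> > >             if (deleted.size() > MERGE_THRESHOLD) {
> > >                 base.removeAll(deleted); // merge the diff down
> > >                 deleted.clear();
> > >             }
> > >         }
> > >
> > >         boolean contains(long statement) {
> > >             return base.contains(statement) && !deleted.contains(statement);
> > >         }
> > >     }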
> > >
> > > I don’t want to have transactional support. I use transactions
> > with disk-based databases, but I usually only use the memory store
> > for embedded, single-threaded purposes.
> > >
> > > Scenarios I want to support:
> > > - read heavy
> > > - write heavy
> > > - transforms (deserialise, query, update, serialise)
> > >
> > > In general, are there any recommendations or requirements that
> > others might have before I get too committed?
> > >
> > > I also know that a lot of triple stores convert IRIs and literals
> > to a hash/integer so that the hash/integer is stored in the indexes
> > and the IRIs/literals are stored in a lookup table. I’m considering
> > doing this too, but it might not be of much benefit unless I
> > migrate to manual memory management using the unsafe library. Any
> > thoughts on using sun.misc.Unsafe?
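> > >
> > > To illustrate the dictionary-encoding idea (a toy sketch with
> > > made-up names, nothing decided): indexes would store a small int id
> > > and the dictionary would map it back to the lexical value.
> > >
> > >     import java.util.ArrayList;
> > >     import java.util.HashMap;
> > >     import java.util.List;
> > >     import java.util.Map;
> > >
> > >     // Toy value dictionary: ids go into the indexes, the dictionary
> > >     // maps them back to the lexical form.
> > >     class ValueDictionary {
> > >
> > >         private final Map<String, Integer> idByValue = new HashMap<>();
> > >         private final List<String> valueById = new ArrayList<>();
> > >
> > >         // Returns the existing id, or assigns the next free one.
> > >         int encode(String lexicalValue) {
> > >             return idByValue.computeIfAbsent(lexicalValue, v -> {
> > >                 valueById.add(v);
> > >                 return valueById.size() - 1;
> > >             });
> > >         }
> > >
> > >         String decode(int id) {
> > >             return valueById.get(id);
> > >         }
> > >     }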
> > >
> > > Håvard

