Re: [rdf4j-dev] SHACL sail refactor

Hi Jeen,

Thanks for helping out :) The code isn’t very clean, no :(

Some basic concepts that I’ve worked on:

- A need to convert the SHACL rules from RDF into a Java model with methods that represent the de-normalized rules, i.e. the AST.

- Based on what has changed in a transaction, we need to generate a plan for how to get the data to validate. These are the plan nodes. The tracking is done with the two memory stores in the connection (at first it was done with two hash sets).

- A need to transport data and track where it came from, so we can tell the user that they broke a SHACL rule because of some triples that were added or removed. This is done by the Tuple class.
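
To make the last two points a bit more concrete, here is a minimal Java sketch of the idea. It is not the sail's actual code: `Triple`, `Origin`, `TrackedTuple` and `ChangeTracker` are hypothetical names made up for illustration, plain strings stand in for RDF values, and two in-memory sets stand in for whatever the connection really uses to track changes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical stand-in for an RDF statement (not the RDF4J Statement type). */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Triple)) {
            return false;
        }
        Triple t = (Triple) other;
        return subject.equals(t.subject) && predicate.equals(t.predicate) && object.equals(t.object);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subject, predicate, object);
    }
}

/** Whether a triple was added or removed in the current transaction. */
enum Origin { ADDED, REMOVED }

/** Carries a piece of data together with where it came from (assumed shape). */
final class TrackedTuple {
    final Triple triple;
    final Origin origin;

    TrackedTuple(Triple triple, Origin origin) {
        this.triple = triple;
        this.origin = origin;
    }
}

/** Hypothetical per-transaction tracker: one set for additions, one for removals. */
final class ChangeTracker {
    private final Set<Triple> added = new LinkedHashSet<>();
    private final Set<Triple> removed = new LinkedHashSet<>();

    void add(Triple t) {
        added.add(t);
    }

    void remove(Triple t) {
        removed.add(t);
    }

    /**
     * Plays the role of a very simple plan node: it fetches only the changed
     * triples that use the given predicate, tagging each one with its origin
     * so that a violation can later be traced back to the change behind it.
     */
    List<TrackedTuple> tuplesToValidate(String predicate) {
        List<TrackedTuple> result = new ArrayList<>();
        for (Triple t : added) {
            if (t.predicate.equals(predicate)) {
                result.add(new TrackedTuple(t, Origin.ADDED));
            }
        }
        for (Triple t : removed) {
            if (t.predicate.equals(predicate)) {
                result.add(new TrackedTuple(t, Origin.REMOVED));
            }
        }
        return result;
    }
}
```

The point of the sketch is only that validation looks at what the transaction touched rather than at the whole store, and that each value keeps a reference back to the added or removed triple it came from.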

I don’t want to mix data with rules. Regular SQL databases don’t store their schemas inside the user’s database. The SHACL sail should support multiple options for loading rules and for updating them. But updating rules should be done explicitly, with an explicit command (also a good reason for not keeping them inside the user data). Updating rules is not currently supported.

Also, by the way, I have a branch where I’m working on supporting sh:datatype, and someone else was working on some cleanup and support for string-based restrictions. Could I merge my branch into yours? It’s mostly done, and it also contains cleanup of some of the tests.

Håvard

> On 5 Apr 2018, at 11:32, Jeen Broekstra <jeen.broekstra@xxxxxxxxx> wrote:
> 
> Havard, others,
> 
> I'm currently taking a closer look at the SHACL Sail design and I think there are a lot of things that we can refactor/streamline:
> 
> 1. The initialization interface is overly complex: we shouldn't need to provide a separate Repository object with the shapes data to the sail - it should just use a dedicated named graph for this purpose. Issue logged as https://github.com/eclipse/rdf4j-storage/issues/67 .
> 
> 2. A lot of functionality (factories for shape objects, generating plans for each shape, etc.) is currently embedded inside the AST objects. I can live with this (though I would have preferred a better separation of concerns) but it explicitly relies on supplying SailRepository objects, SailRepositoryConnections, etc. where there's no need. This needs to be rigorously simplified. As far as I can tell we effectively keep all the shape data in memory during validation, so why stick it in an (in-memory) repository whenever we pass it to a Shape AST object? We should just use the Model API here IMHO.
> 
> I've started work on the first issue but I've quickly found that the second is closely related. I know you're currently also working on the SHACL Sail, but I think it's urgent that we get the basic design clean before adding more functionality. Are you ok with me doing some rigorous refactoring and putting up a PR (hopefully mid next week)? I'm working from master under the assumption that we get to break an API in a patch release if that API was explicitly marked as EXPERIMENTAL :)
> 
> Jeen

