[rdf4j-dev] SHACL, statistics on classes etc.
  • From: "Bart Hanssens (BOSA)" <bart.hanssens@xxxxxxxxxxxx>
  • Date: Thu, 27 Apr 2023 12:03:51 +0000

Hi,

 

Just a quick note and some thoughts.

 

I’m developing a stand-alone SHACL validator, nothing fancy, which is to be integrated into my data.gov.be toolchain:

https://github.com/Fedict/shaclvalidator

 

Of course the SHACL part works like a charm, thanks Håvard 😉

There are only a few minor issues, which will either be solved in 4.3 (severity level),

or are arguably issues with the SHACL files on semic.eu (name on node shape, and node shapes with an empty shacl:property).

 

I was wondering if it would be hard (or interesting for other people) to collect statistics on:

  1. number of times a shape did _not_ have validation issues, or how many times a shape matched in total
  2. number of different classes/properties/object values in a dataset

 

The use case for (1) is mainly a metric for data quality (shape violations divided by total matches),

while (2) is useful for harmonizing data (e.g. reducing differences) and is probably useful for optimizing queries / data storage as well.
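
To make the two statistics concrete, here is a minimal sketch in plain Python. It is not RDF4J API: it assumes the dataset is already available as (subject, predicate, object) string tuples, and the names `dataset_statistics` and `violation_ratio` are hypothetical, chosen only for illustration.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def dataset_statistics(triples):
    """Count class instances, property usages and object values.

    `triples` is assumed to be an iterable of (subject, predicate, object)
    string tuples; a real store would hand out statements instead.
    """
    class_counts = Counter()     # class IRI -> number of rdf:type statements
    property_counts = Counter()  # property IRI -> number of statements using it
    object_values = set()        # distinct values seen in object position
    for _subject, predicate, obj in triples:
        property_counts[predicate] += 1
        object_values.add(obj)
        if predicate == RDF_TYPE:
            class_counts[obj] += 1
    return class_counts, property_counts, object_values


def violation_ratio(matched, violated):
    """Shape violations divided by total matches; None if nothing matched."""
    if matched == 0:
        return None
    return violated / matched
```

The number of different classes, properties and object values is then the length of each returned collection, and the per-class and per-property counts come for free from the same pass.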

 

For the time being I’m (ab)using data cubes for publishing the statistics in TTL, but perhaps there is a better vocabulary.

And I’m guessing some data stores already collect some of this data.

 

Happy to look into it myself, though hints on how to get started would be appreciated 😊

 

 

Best regards,

 

Bart

 

 

