[rdf4j-dev] IRI Validation

Hi all,

I want to add (optional) IRI validation to all the parsers. However,
I've run into trouble and hope some of you can help.

My validation fails in the Turtle test suite on
localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries[1]. You can
see the IRI in encoded form in the .nt file and inline in the .ttl file.

The last character of the IRI is U+E01EF, which, as far as I can tell,
is not permitted in a valid IRI.

RFC3987[2] (IRIs) says the following UCS characters are permitted:
   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

   iprivate       = %xE000-F8FF / %xF0000-FFFFD / %x100000-10FFFD
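
To make the range check concrete, here is a minimal sketch that tests a
code point against the ucschar and iprivate ranges quoted above. The
class and method names (IriCharCheck, isUcschar, isIprivate) are made up
for illustration and are not part of RDF4J; the ranges are copied from
the ABNF above.

```java
final class IriCharCheck {
    // Inclusive code point ranges for ucschar, copied from RFC 3987 section 2.2.
    private static final int[][] UCSCHAR = {
        {0xA0, 0xD7FF}, {0xF900, 0xFDCF}, {0xFDF0, 0xFFEF},
        {0x10000, 0x1FFFD}, {0x20000, 0x2FFFD}, {0x30000, 0x3FFFD},
        {0x40000, 0x4FFFD}, {0x50000, 0x5FFFD}, {0x60000, 0x6FFFD},
        {0x70000, 0x7FFFD}, {0x80000, 0x8FFFD}, {0x90000, 0x9FFFD},
        {0xA0000, 0xAFFFD}, {0xB0000, 0xBFFFD}, {0xC0000, 0xCFFFD},
        {0xD0000, 0xDFFFD}, {0xE1000, 0xEFFFD}
    };

    // Inclusive code point ranges for iprivate, from the same section.
    private static final int[][] IPRIVATE = {
        {0xE000, 0xF8FF}, {0xF0000, 0xFFFFD}, {0x100000, 0x10FFFD}
    };

    // Returns true if cp lies inside any of the inclusive ranges.
    private static boolean inRanges(int cp, int[][] ranges) {
        for (int[] r : ranges) {
            if (cp >= r[0] && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    // Hypothetical helper: is cp matched by the ucschar production?
    static boolean isUcschar(int cp) {
        return inRanges(cp, UCSCHAR);
    }

    // Hypothetical helper: is cp matched by the iprivate production?
    static boolean isIprivate(int cp) {
        return inRanges(cp, IPRIVATE);
    }

    public static void main(String[] args) {
        int cp = 0xE01EF; // last character of the failing test IRI
        System.out.printf("U+%X ucschar=%b iprivate=%b%n",
                cp, isUcschar(cp), isIprivate(cp));
    }
}
```

With these ranges, U+E01EF falls in the gap between %xDFFFD and %xE1000,
so it matches neither production.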
 

Also of note is this URL[1], which is itself not a valid IRI because an
IRI can contain at most one "#".
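
As a rough illustration of the "#" point, the sketch below only counts
"#" characters in a string. It is not a full RFC 3987 parser, and the
names (HashCount, countHash) are hypothetical.

```java
final class HashCount {
    // Counts occurrences of '#' in the given string.
    // An IRI reference may contain at most one, introducing the fragment.
    static long countHash(String iri) {
        return iri.chars().filter(c -> c == '#').count();
    }

    public static void main(String[] args) {
        String url = "https://w3c.github.io/rdf-tests/turtle/"
                + "##localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries";
        long n = countHash(url);
        System.out.println("'#' count: " + n + (n > 1 ? " (more than one)" : ""));
    }
}
```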

Can I get some help interpreting all this, and advice on how we should
resolve this issue?

Thanks,
James

[1] https://w3c.github.io/rdf-tests/turtle/##localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries
[2] https://tools.ietf.org/html/rfc3987#section-2.2



