Re: [rdf4j-dev] IRI Validation

On Thu, 2017-06-01 at 09:15 +1000, Jeen Broekstra wrote:
> > On 1 Jun 2017, at 04:14, James Leigh <james.leigh@xxxxxxxxxxxx> wrote:
> > 
> > My validation fails in the Turtle test suite on
> > localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries[1].
> > You can see the IRI in an encoded form in the nt file and inline in
> > the ttl file.
> > 
> > The last character of the IRI is U+E01EF, which, as far as I can
> > tell, is not part of a valid IRI.
> When I look at it, it says the last character is U+2FA1D, which is
> allowed. Could be that my editor is messing things up though.

I guess I just needed a second pair of eyes! I have been looking at
code points for too long! I'll have to update our test suite for this
change:

https://github.com/w3c/rdf-tests/issues/8
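
For anyone following along, here is a minimal sketch of the RFC 3987
"ucschar" ranges that decide this case. This is a standalone
illustration, not the actual rdf4j validation code, and the class and
method names are made up. It shows why U+2FA1D is allowed while
U+E01EF falls into the gap between the %xD0000-DFFFD and
%xE1000-EFFFD ranges:

// Sketch of the RFC 3987 "ucschar" production. Not rdf4j code.
public final class UcsCharSketch {

    // True if the code point matches RFC 3987 ucschar.
    static boolean isUcsChar(int cp) {
        return (cp >= 0xA0    && cp <= 0xD7FF)
            || (cp >= 0xF900  && cp <= 0xFDCF)
            || (cp >= 0xFDF0  && cp <= 0xFFEF)
            // Supplementary planes, each ending at xFFFD. Note the
            // gap after 0xDFFFD: plane 14 only starts at 0xE1000,
            // which is what excludes U+E01EF.
            || (cp >= 0x10000 && cp <= 0x1FFFD)
            || (cp >= 0x20000 && cp <= 0x2FFFD)
            || (cp >= 0x30000 && cp <= 0x3FFFD)
            || (cp >= 0x40000 && cp <= 0x4FFFD)
            || (cp >= 0x50000 && cp <= 0x5FFFD)
            || (cp >= 0x60000 && cp <= 0x6FFFD)
            || (cp >= 0x70000 && cp <= 0x7FFFD)
            || (cp >= 0x80000 && cp <= 0x8FFFD)
            || (cp >= 0x90000 && cp <= 0x9FFFD)
            || (cp >= 0xA0000 && cp <= 0xAFFFD)
            || (cp >= 0xB0000 && cp <= 0xBFFFD)
            || (cp >= 0xC0000 && cp <= 0xCFFFD)
            || (cp >= 0xD0000 && cp <= 0xDFFFD)
            || (cp >= 0xE1000 && cp <= 0xEFFFD);
    }

    public static void main(String[] args) {
        System.out.println(isUcsChar(0x2FA1D)); // true  (allowed)
        System.out.println(isUcsChar(0xE01EF)); // false (not a ucschar)
    }
}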



> > Also of note is this URL[1], which is also not a valid IRI because
> > an IRI can only have at most one "#".
> It’s also not a legal URI, because RFC3986 also does not allow more
> than one #. However, in the obsolete RFC2396, it _is_ allowed,
> basically because it enforces no validation on the fragment (which
> is, strictly speaking, not actually part of the URI), and just says
> “any character goes”.  
> 
> So strictly speaking it’s malformed, but my gut feeling is that the
> most graceful way to handle this is to allow it, and simply consider
> the second # part of the fragment id. Perhaps a case for allowing
> different levels of severity in validation? 
> 

I'm including a routine that will auto-encode invalid characters, in
which case the second hash will be converted into %23, although by
default it will still treat a double hash as a fatal error.
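
Roughly along these lines (a hedged sketch with made-up names, not the
actual routine): strict mode rejects the second hash outright, while
lenient mode percent-encodes anything after the first "#" so the whole
tail stays inside the fragment:

// Sketch only; names are hypothetical, not the real rdf4j routine.
public final class FragmentHashSketch {

    static String checkOrEncode(String iri, boolean lenient) {
        int first = iri.indexOf('#');
        if (first < 0 || iri.indexOf('#', first + 1) < 0) {
            return iri; // at most one "#": nothing to do
        }
        if (!lenient) {
            throw new IllegalArgumentException(
                    "more than one '#' in IRI: " + iri);
        }
        // Keep the first "#" as the fragment delimiter and
        // percent-encode any further hashes into the fragment.
        return iri.substring(0, first + 1)
                + iri.substring(first + 1).replace("#", "%23");
    }

    public static void main(String[] args) {
        System.out.println(checkOrEncode("http://example.org/a#b#c", true));
        // prints: http://example.org/a#b%23c
    }
}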

Thanks,
James
