Re: [rdf4j-dev] IRI Validation

On Thu, 2017-06-01 at 09:15 +1000, Jeen Broekstra wrote:
> > On 1 Jun 2017, at 04:14, James Leigh <james.leigh@xxxxxxxxxxxx> wrote:
> > 
> > My validation fails in the Turtle test suite on
> > localName_with_assigned_nfc_PN_CHARS_BASE_character_boundaries[1].
> > You can see the IRI in an encoded form in the nt file and inline in
> > the ttl file.
> > 
> > The last character of the IRI is U+E01EF, which, as far as I can
> > tell, is not part of a valid IRI.
> When I look at it, it says the last character is U+2FA1D, which is
> allowed. Could be that my editor is messing things up though.

I guess I just needed a second pair of eyes! I have been looking at
code points for too long! I'll have to update our test suite for this
change:

https://github.com/w3c/rdf-tests/issues/8
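
For anyone following along, here is a minimal sketch of the RFC 3987
"ucschar" ranges that decide this case. This is a standalone
illustration, not the actual rdf4j validation code, and the class and
method names are made up. It shows why U+2FA1D is allowed while
U+E01EF falls into the gap between the %xD0000-DFFFD and
%xE1000-EFFFD ranges:

// Sketch of the RFC 3987 "ucschar" production. Not rdf4j code.
public final class UcsCharSketch {

    // True if the code point matches RFC 3987 ucschar.
    static boolean isUcsChar(int cp) {
        return (cp >= 0xA0    && cp <= 0xD7FF)
            || (cp >= 0xF900  && cp <= 0xFDCF)
            || (cp >= 0xFDF0  && cp <= 0xFFEF)
            // Supplementary planes, each ending at xFFFD. Note the
            // gap after 0xDFFFD: plane 14 only starts at 0xE1000,
            // which is what excludes U+E01EF.
            || (cp >= 0x10000 && cp <= 0x1FFFD)
            || (cp >= 0x20000 && cp <= 0x2FFFD)
            || (cp >= 0x30000 && cp <= 0x3FFFD)
            || (cp >= 0x40000 && cp <= 0x4FFFD)
            || (cp >= 0x50000 && cp <= 0x5FFFD)
            || (cp >= 0x60000 && cp <= 0x6FFFD)
            || (cp >= 0x70000 && cp <= 0x7FFFD)
            || (cp >= 0x80000 && cp <= 0x8FFFD)
            || (cp >= 0x90000 && cp <= 0x9FFFD)
            || (cp >= 0xA0000 && cp <= 0xAFFFD)
            || (cp >= 0xB0000 && cp <= 0xBFFFD)
            || (cp >= 0xC0000 && cp <= 0xCFFFD)
            || (cp >= 0xD0000 && cp <= 0xDFFFD)
            || (cp >= 0xE1000 && cp <= 0xEFFFD);
    }

    public static void main(String[] args) {
        System.out.println(isUcsChar(0x2FA1D)); // true  (allowed)
        System.out.println(isUcsChar(0xE01EF)); // false (not a ucschar)
    }
}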



> > Also of note is this URL[1], which is also not a valid IRI because
> > an IRI can only have at most one "#".
> It’s also not a legal URI, because RFC3986 also does not allow more
> than one #. However, in the obsolete RFC2396, it _is_ allowed,
> basically because it enforces no validation on the fragment (which
> is, strictly speaking, not actually part of the URI), and just says
> “any character goes”.  
> 
> So strictly speaking it’s malformed, but my gut feeling is that the
> most graceful way to handle this is to allow it, and simply consider
> the second # part of the fragment id. Perhaps a case for allowing
> different levels of severity in validation? 
> 

I'm including a routine that will auto-encode invalid characters, in
which case the second hash will be converted into %23, although by
default it will still treat a double hash as a fatal error.
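
Roughly along these lines (a hedged sketch with made-up names, not the
actual routine): strict mode rejects the second hash outright, while
lenient mode percent-encodes anything after the first "#" so the whole
tail stays inside the fragment:

// Sketch only; names are hypothetical, not the real rdf4j routine.
public final class FragmentHashSketch {

    static String checkOrEncode(String iri, boolean lenient) {
        int first = iri.indexOf('#');
        if (first < 0 || iri.indexOf('#', first + 1) < 0) {
            return iri; // at most one "#": nothing to do
        }
        if (!lenient) {
            throw new IllegalArgumentException(
                    "more than one '#' in IRI: " + iri);
        }
        // Keep the first "#" as the fragment delimiter and
        // percent-encode any further hashes into the fragment.
        return iri.substring(0, first + 1)
                + iri.substring(first + 1).replace("#", "%23");
    }

    public static void main(String[] args) {
        System.out.println(checkOrEncode("http://example.org/a#b#c", true));
        // prints: http://example.org/a#b%23c
    }
}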

Thanks,
James
