Created attachment 111083: a test case with an instance document that starts on the same line as the data element.

The validator assumes that the instance document always starts on the line after the <data> element. The validator should not make assumptions about the formatting of the XML: an XML/SML document can be valid without any line breaks at all. See the attachment for a test case that exhibits this problem. It is modified from test-resources/acyclic/ValidCycle.xml by removing a line break. Error when running validation:

org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
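To see why a line-based assumption breaks, the following minimal sketch uses the JDK's built-in SAX parser to record the line number that org.xml.sax.Locator reports at the <data> start tag and at the first element inside it. The class name DataLineProbe and the two toy documents are hypothetical, simplified stand-ins for real SML-IF files. With the JDK parser the reported line is typically the one on which the start tag ends, but the Locator Javadoc only promises an approximation, so exact values may differ between parsers. When the instance starts on the same line as <data>, both numbers are equal, so code that assumes the content begins on "data line + 1" misses the start of the instance document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

public class DataLineProbe {

    /** Returns {line of <data> start tag, line of first child start tag} as reported by the Locator. */
    public static int[] probe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // so localName is "data" even for a prefixed smlif:data
        SAXParser parser = factory.newSAXParser();
        int[] result = {-1, -1};
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            private Locator locator;
            private boolean insideData = false;

            @Override
            public void setDocumentLocator(Locator locator) {
                this.locator = locator; // the parser hands us its Locator before parsing starts
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (result[0] < 0 && "data".equals(localName)) {
                    result[0] = locator.getLineNumber(); // approximate, per the Locator Javadoc
                    insideData = true;
                } else if (insideData && result[1] < 0) {
                    result[1] = locator.getLineNumber(); // root of the embedded instance document
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical toy documents: the embedded root <root/> on the same line vs. the next line.
        int[] sameLine = probe("<model>\n<data><root/></data>\n</model>");
        int[] nextLine = probe("<model>\n<data>\n<root/></data>\n</model>");
        System.out.println("same line: data=" + sameLine[0] + " child=" + sameLine[1]);
        System.out.println("next line: data=" + nextLine[0] + " child=" + nextLine[1]);
    }
}
```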
Created attachment 111085: This is a test case where the instance document has to begin TWO lines after the data element.
Created attachment 111095: This is a test case where the instance document has to begin TWO lines after the data element.
The same issue occurs for definition documents, btw. The SMLIF editor also produces this case by default: if you create a new SMLIF file via New > Other... and then use "Add document" to add an existing file, the resulting SMLIF document demonstrates this bug.
adding Henry Thompson to cc at his request
The root of the problem is in the DocumentCacheBuilder. In startElement it sets the line number to getLineNumber()+1, but according to the Javadoc, gLN returns only an approximation, subject to a set of caveats, that is intended for diagnostics. Undeterred, DCB proceeds in getElementSource to read the file and use the line number set in startElement as the first line of the embedded document. (Note that this becomes an n^2 scaling issue, because the file is read sequentially from the beginning for each by-value encapsulated instance/definition document.)

With a SAX parser like this, the only "safe" implementation I know of is to capture all the nodes as they arrive and save them. Of course, if the SAX "constraint" is dropped, other possibilities arise.

For the common case, where the UI fails to insert the newline that DCB assumes always follows the <smlif:data> element (and likely smlif:base64data as well), a simple fix would be to insert one programmatically when the markup is created.
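A minimal sketch of the "capture the nodes as they arrive" approach, assuming a hypothetical handler named DataContentCollector (this is not the DocumentCacheBuilder's actual code). It re-serializes the SAX events for each element nested directly inside a <data> (or prefix:data) element into a string, so no line numbers are consulted and the input is parsed once regardless of how many documents it embeds. Known simplifications: comments and processing instructions are dropped, namespace declarations made on ancestors (for example on the SML-IF root) are not copied onto the captured root, empty elements come out as start/end tag pairs, and escaping is minimal.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: captures each document embedded in <data> by re-serializing SAX events. */
public class DataContentCollector extends DefaultHandler {
    private final List<String> documents = new ArrayList<>();
    private StringBuilder current; // markup of the document being captured, or null
    private int dataDepth = -1;    // depth of the enclosing <data> element, -1 when outside
    private int depth = 0;         // depth of the element currently open

    private static boolean isData(String qName) {
        return qName.equals("data") || qName.endsWith(":data");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (dataDepth < 0 && isData(qName)) {
            dataDepth = depth; // entering <data>; its own tag is not captured
            return;
        }
        if (dataDepth >= 0 && depth == dataDepth + 1) {
            current = new StringBuilder(); // root of a new embedded document
        }
        if (current != null) {
            current.append('<').append(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                current.append(' ').append(atts.getQName(i)).append("=\"")
                       .append(escape(atts.getValue(i))).append('"');
            }
            current.append('>');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null) {
            current.append("</").append(qName).append('>');
            if (depth == dataDepth + 1) { // the embedded document's root just closed
                documents.add(current.toString());
                current = null;
            }
        }
        if (depth == dataDepth) {
            dataDepth = -1; // leaving <data>
        }
        depth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) { // text outside an embedded document (e.g. whitespace in <data>) is skipped
            current.append(escape(new String(ch, start, length)));
        }
    }

    // Escapes characters that are unsafe in text content or double-quoted attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static List<String> collect(String xml) throws Exception {
        DataContentCollector handler = new DataContentCollector();
        // The factory default is not namespace-aware, so xmlns:* declarations
        // arrive as ordinary attributes and are copied through.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.documents;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<model><data><a x=\"1\"><b>t &amp; u</b></a></data>"
                + "<data>\n  <c/>\n</data></model>";
        for (String doc : collect(xml)) {
            System.out.println(doc);
        }
    }
}
```

Running main prints `<a x="1"><b>t &amp; u</b></a>` and `<c></c>`: the result is the same whether or not a line break follows `<data>`, because the position of the content is never derived from line numbers.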