Bug 245403 - validator does not allow instance document to start on the same line as the data element
Status: NEW
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Cosmos
Version: unspecified
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: David Whiteman CLA
Reported: 2008-08-27 11:49 EDT by Hubert Leung CLA
Modified: 2012-01-03 13:47 EST

Attachments
a test case with an instance document that starts on the same line as the data element (3.88 KB, text/xml)
2008-08-27 11:49 EDT, Hubert Leung CLA
This is a test case where the instance document has to begin TWO lines after the data element. (1.06 KB, text/xml)
2008-08-27 12:01 EDT, Hubert Leung CLA
This is a test case where the instance document has to begin TWO lines after the data element. (1.09 KB, text/xml)
2008-08-27 12:59 EDT, Hubert Leung CLA

Description Hubert Leung CLA 2008-08-27 11:49:30 EDT
Created attachment 111083
a test case with an instance document that starts on the same line as the data element

The validator assumes that the instance document always starts on the line after the <data> element.  The validator should not make assumptions about the formatting of the XML.  An XML/SML document can be valid without line breaks.

See attachment for a test case that has this problem.  It is modified from test-resources/acyclic/ValidCycle.xml by removing a line break.  

error when running validation:
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.
Comment 1 Hubert Leung CLA 2008-08-27 12:01:50 EDT
Created attachment 111085
This is a test case where the instance document has to begin TWO lines after the data element.
Comment 2 Hubert Leung CLA 2008-08-27 12:59:05 EDT
Created attachment 111095
This is a test case where the instance document has to begin TWO lines after the data element.
Comment 3 John Arwe CLA 2009-04-28 12:47:31 EDT
The same issue occurs for definition documents, btw.

The SMLIF editor also creates this case by default; in other words, if you create a new SMLIF file via New > Other..., and then "Add document" an existing file, the resulting SMLIF document demonstrates this bug.
Comment 4 John Arwe CLA 2009-04-28 13:21:01 EDT
adding Henry Thompson to cc at his request
Comment 5 John Arwe CLA 2009-04-28 18:47:15 EDT
The root of the problem is in the DocumentCacheBuilder (DCB).  In startElement it sets the line number to getLineNumber()+1, but according to the Javadoc, getLineNumber() is only an approximation, subject to a set of caveats, and is intended for diagnostics only.  Undeterred, DCB proceeds in getElementSource to read the file, treating the starting line number set in startElement as the first line of the embedded document.  (Note that this also becomes an n^2 scaling issue, since the file is read sequentially from the beginning for each by-value encapsulated instance/definition document.)
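
To illustrate the line-number assumption, here is a minimal sketch (not the COSMOS code; LineNumberDemo and startLines are hypothetical names) that records what Locator.getLineNumber() reports at each startElement.  With the parser bundled in the JDK, an embedded root that shares a line with the data element reports the same line as the data element, so "line + 1" points past where the embedded document actually begins.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of COSMOS.
class LineNumberDemo {

    // Maps each element name to the line the Locator reports at its first startElement.
    static Map<String, Integer> startLines(String xml) {
        final Map<String, Integer> lines = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator locator) {
                // Parsers are encouraged, not required, to supply a Locator;
                // this sketch assumes one is supplied (the JDK parser does).
                this.locator = locator;
            }

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                lines.putIfAbsent(qName, locator.getLineNumber());
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Embedded root on the same line as <data>.
        System.out.println(startLines("<data><root/>\n</data>"));
        // Embedded root on the line after <data>.
        System.out.println(startLines("<data>\n<root/>\n</data>"));
    }
}
```

Only the second layout satisfies the "next line" assumption; the first is equally valid XML.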

In a SAX parser like this, the only "safe" implementation I know of is to capture all the nodes as they come by and save them.  Of course if the SAX "constraint" is dropped, other possibilities arise.
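
The "capture the nodes as they come by" approach could look roughly like the sketch below.  It is an illustration under stated assumptions, not the DocumentCacheBuilder implementation: EmbeddedDocCapture and capture are hypothetical names, the parser is not namespace-aware (so qName is matched directly and xmlns attributes arrive as ordinary attributes), and comments, processing instructions, and namespace declarations inherited from ancestors outside the data element are not preserved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: rebuilds each embedded document from SAX events
// in a single pass, without relying on line numbers.
class EmbeddedDocCapture extends DefaultHandler {
    private final String dataElementName;
    private final List<String> documents = new ArrayList<>();
    private StringBuilder current; // non-null while inside an embedded document
    private int depth;             // element depth within the embedded document
    private boolean inData;        // true between <data> and </data>

    EmbeddedDocCapture(String dataElementName) {
        this.dataElementName = dataElementName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null && !inData) {
            inData = qName.equals(dataElementName);
            return;
        }
        if (current == null) { // first element under the data element
            current = new StringBuilder();
            depth = 0;
        }
        depth++;
        current.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            current.append(' ').append(atts.getQName(i)).append("=\"")
                   .append(escape(atts.getValue(i), true)).append('"');
        }
        current.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            if (qName.equals(dataElementName)) inData = false;
            return;
        }
        current.append("</").append(qName).append('>');
        if (--depth == 0) { // embedded root closed
            documents.add(current.toString());
            current = null;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) current.append(escape(new String(ch, start, length), false));
    }

    private static String escape(String s, boolean attribute) {
        String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return attribute ? out.replace("\"", "&quot;") : out;
    }

    // Parses xml once and returns the text of every document found under dataElementName.
    static List<String> capture(String xml, String dataElementName) {
        EmbeddedDocCapture handler = new EmbeddedDocCapture(dataElementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.documents;
    }
}
```

Because every embedded document is collected during the one parse, this also avoids re-reading the file from the beginning for each document.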

A simple fix for the common case of the UI failing to insert the newline that DCB assumes always exists after the <smlif:data> element (and likely for smlif:base64data) would be to insert one programmatically during creation of the markup.
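
As a rough sketch of that workaround, assuming the markup is assembled through the DOM (the editor's actual code path is not shown in this bug, and DataElementBuilder, wrap, and parse are hypothetical names), a newline text node can be added before and after the imported root:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper, not the SMLIF editor's code.
class DataElementBuilder {

    // Creates a data element whose embedded root starts on its own line.
    static Element wrap(Document owner, String dataElementName, Document embedded) {
        Element data = owner.createElement(dataElementName);
        data.appendChild(owner.createTextNode("\n")); // the newline DCB expects
        data.appendChild(owner.importNode(embedded.getDocumentElement(), true));
        data.appendChild(owner.createTextNode("\n"));
        return data;
    }

    // Convenience parser; pass null to get a new empty document.
    static Document parse(String xml) {
        try {
            javax.xml.parsers.DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return xml == null
                    ? builder.newDocument()
                    : builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

This only papers over the editor-generated case; hand-written or third-party SMLIF files without the line break would still fail until the line-number assumption itself is removed.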