245403 – validator does not allow instance document to start on the same line as the data element

Status:	NEW

Alias:	None

Product:	z_Archived
Classification:	Eclipse Foundation
Component:	Cosmos
Version:	unspecified
Hardware:	PC Windows XP

Importance:	P3 normal
Target Milestone:	---
Assignee:	David Whiteman
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2008-08-27 11:49 EDT by Hubert Leung
Modified:	2012-01-03 13:47 EST
CC List:	2 users

See Also:

Attachments
a test case with an instance document that starts on the same line as the data element (3.88 KB, text/xml) 2008-08-27 11:49 EDT, Hubert Leung	no flags	Details
This is a test case where the instance document has to begin TWO lines after the data element. (1.06 KB, text/xml) 2008-08-27 12:01 EDT, Hubert Leung	no flags	Details
This is a test case where the instance document has to begin TWO lines after the data element. (1.09 KB, text/xml) 2008-08-27 12:59 EDT, Hubert Leung	no flags	Details

Description Hubert Leung

2008-08-27 11:49:30 EDT

Created attachment 111083
a test case with an instance document that starts on the same line as the data element

The validator assumes that the instance document always starts on the line after the <data> element. The validator should not make assumptions about the formatting of the XML. An XML/SML document can be valid without any line breaks.

See attachment for a test case that has this problem.  It is modified from test-resources/acyclic/ValidCycle.xml by removing a line break.  

Error when running validation:
org.xml.sax.SAXParseException: The markup in the document following the root element must be well-formed.

Comment 1 Hubert Leung

2008-08-27 12:01:50 EDT

Created attachment 111085
This is a test case where the instance document has to begin TWO lines after the data element.

Comment 2 Hubert Leung

2008-08-27 12:59:05 EDT

Created attachment 111095
This is a test case where the instance document has to begin TWO lines after the data element.

Comment 3 John Arwe

2009-04-28 12:47:31 EDT

The same issue occurs for definition documents, btw.

The SMLIF editor also produces this case by default: if you create a new SMLIF file via New > Other... and then use "Add document" to add an existing file, the resulting SMLIF document exhibits this bug.

Comment 4 John Arwe

2009-04-28 13:21:01 EDT

adding Henry Thompson to cc at his request

Comment 5 John Arwe

2009-04-28 18:47:15 EDT

The root of the problem is in DocumentCacheBuilder (DCB). In startElement it sets the line number to getLineNumber()+1, but according to the Javadoc, getLineNumber() is an approximation with a set of caveats, useful only for diagnostics. Undeterred, DCB proceeds in getElementSource to read the file and treat the starting line number set in startElement as the first line of the embedded document. (Note that this also becomes an O(n^2) scaling issue, as the file is read sequentially from the beginning for each by-value encapsulated instance/definition document.)
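
To make the line-number problem concrete, here is a minimal standalone sketch (not the Cosmos code; the class name LineNumberDemo and the element names are made up for illustration). It records what the SAX Locator reports at startElement for each element. When the embedded root starts on the same line as its wrapper, both report the same line, so "wrapper line + 1" points past the root's start tag.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; it only illustrates Locator behaviour.
class LineNumberDemo {

    /** Returns the Locator line number seen at startElement, keyed by element name. */
    static Map<String, Integer> startLines(String xml) {
        Map<String, Integer> lines = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            private Locator locator;

            @Override
            public void setDocumentLocator(Locator l) {
                locator = l;
            }

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                // The Locator reports where the triggering markup ENDS, and only approximately.
                lines.put(qName, locator.getLineNumber());
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // <root> starts on the same line as <data>, as in the attached test case.
        String xml = "<model>\n<data><root>\n<child/>\n</root></data>\n</model>";
        Map<String, Integer> lines = startLines(xml);
        System.out.println("data line:  " + lines.get("data"));
        System.out.println("root line:  " + lines.get("root"));
        System.out.println("child line: " + lines.get("child"));
    }
}
```

In this input, text extracted from "data line + 1" onward would begin at <child/> and then run into </root>, which would plausibly produce the kind of "markup following the root element" error quoted in the description.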

In a SAX parser like this, the only "safe" implementation I know of is to capture all the nodes as they come by and save them.  Of course if the SAX "constraint" is dropped, other possibilities arise.
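
A rough sketch of the "capture the nodes as they come by" idea follows. It is an assumption-laden illustration, not a patch against DCB: the class EmbeddedDocumentCapture and its extract method are hypothetical, the parser is left non-namespace-aware so qualified names and xmlns attributes pass through as-is, and comments, processing instructions, CDATA boundaries and namespace declarations inherited from ancestors are not preserved.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds each embedded document from SAX events
// instead of re-reading the file by line number.
class EmbeddedDocumentCapture extends DefaultHandler {
    private final String dataElementName;          // qualified name, e.g. "smlif:data"
    private final List<String> documents = new ArrayList<>();
    private StringBuilder current;                 // non-null while inside a data element
    private int depth;                             // element nesting below the data element

    EmbeddedDocumentCapture(String dataElementName) {
        this.dataElementName = dataElementName;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (current == null) {
            if (qName.equals(dataElementName)) {
                current = new StringBuilder();
                depth = 0;
            }
            return;
        }
        depth++;
        current.append('<').append(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            current.append(' ').append(atts.getQName(i)).append("=\"")
                   .append(escape(atts.getValue(i), true)).append('"');
        }
        current.append('>');
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text directly under the data element (depth 0) is only layout whitespace; skip it.
        if (current != null && depth > 0) {
            current.append(escape(new String(ch, start, length), false));
        }
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (current == null) {
            return;
        }
        if (depth == 0) {                          // closing the data element itself
            documents.add(current.toString());
            current = null;
            return;
        }
        current.append("</").append(qName).append('>');
        depth--;
    }

    private static String escape(String s, boolean inAttribute) {
        String out = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return inAttribute ? out.replace("\"", "&quot;") : out;
    }

    /** Parses xml once and returns the serialized content of every data element. */
    static List<String> extract(String xml, String dataElementName) {
        EmbeddedDocumentCapture handler = new EmbeddedDocumentCapture(dataElementName);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return handler.documents;
    }
}
```

Because everything is collected in a single pass, this also avoids re-reading the file from the beginning for each embedded document. The trade-off is that the captured text is a re-serialization rather than a byte-for-byte copy of the original.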

A simple fix for the common case, where the UI fails to insert the newline that DCB assumes always follows the <smlif:data> element (and likely smlif:base64data as well), would be to insert one programmatically when the markup is created.
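
As a sketch of that workaround only (the helper name DataElementWriter.wrap is hypothetical, and how the editor actually builds its markup is not shown in this bug), the wrapping code could emit the line breaks explicitly. The newline before the closing tag is an extra assumption, added for symmetry.

```java
// Hypothetical helper; the real editor code may build the element differently.
class DataElementWriter {

    /** Wraps a serialized document so that it starts on the line after <smlif:data>. */
    static String wrap(String documentXml) {
        return "<smlif:data>\n" + documentXml.trim() + "\n</smlif:data>";
    }

    public static void main(String[] args) {
        System.out.println(wrap("<root><child/></root>"));
    }
}
```

This only masks the problem for files produced by the editor; hand-written or third-party SMLIF files without the line break would still hit the bug.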