Bug 194698 - structured XSD annotations are mangled by formatter.
Summary: structured XSD annotations are mangled by formatter.
Status: VERIFIED FIXED
Alias: None
Product: WTP Source Editing
Classification: WebTools
Component: wst.xml (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P2 normal with 1 vote (vote)
Target Milestone: 3.0.1   Edit
Assignee: Nick Sandonato CLA
QA Contact: David Williams CLA
URL:
Whiteboard:
Keywords: contributed, plan
Depends on:
Blocks: 226463
  Show dependency tree
 
Reported: 2007-06-27 20:06 EDT by Travis CLA
Modified: 2008-08-19 15:22 EDT (History)
5 users (show)

See Also:
for.work.things: review+
valentinbaciu: review+
thatnitind: review+


Attachments
Patch to address observing XSD whitespace rules (13.63 KB, patch)
2008-03-26 14:53 EDT, Nick Sandonato CLA
no flags Details | Diff
update testcase (13.87 KB, patch)
2008-04-18 13:28 EDT, Nick Sandonato CLA
no flags Details | Diff
patch (19.23 KB, patch)
2008-04-23 16:47 EDT, Nick Sandonato CLA
no flags Details | Diff
patch with xsd junits (26.01 KB, patch)
2008-05-02 11:41 EDT, Nick Sandonato CLA
no flags Details | Diff
updatedpatch.txt (28.23 KB, patch)
2008-07-18 18:34 EDT, Amy Wu CLA
bjorn.freeman-benson: iplog+
Details | Diff
patch (29.95 KB, patch)
2008-07-21 13:28 EDT, Nick Sandonato CLA
bjorn.freeman-benson: iplog+
Details | Diff
patch to correct test case (2.26 KB, patch)
2008-07-29 16:10 EDT, Nick Sandonato CLA
bjorn.freeman-benson: iplog+
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Travis CLA 2007-06-27 20:06:49 EDT
I have XML files that have annotations, in which the annotation is an XML document.  When I run the XML formatter, the annotation is formatted without regard to the schema of the annotations.  This means that where my whitespace is significant (in xs:string elements), the value is changed by by the formatter, adding additional whitespace.

I need the formatter to either completely ignore annotations (not attempt to format them) OR to honor the schema of the annotations, and preserve whitespace where it is declared to be significant.   

Examples:

Here are my annotations:  

<element name="Customerfirstname" type="string" minOccurs="0" maxOccurs="1">
<annotation>
<appinfo source="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata">
<pasi:PeopleSoftAttributeTypeMetadata xmlns:pasi="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata">
<pasi:Getter>getCustomerfirstname</pasi:Getter>
<pasi:Setter>setCustomerfirstname</pasi:Setter>
</pasi:PeopleSoftAttributeTypeMetadata>
</appinfo>
</annotation>
</element>

This is what the formatter does with them.  Note the introduction of whitespace. Getter and Setter are xs:string elements.     



			<element name="Customerfirstname" type="string"
				minOccurs="0" maxOccurs="1">
				<annotation>
					<appinfo
						source="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata">
						<pasi:PeopleSoftAttributeTypeMetadata
							xmlns:pasi="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata">
							<pasi:Getter>
								getCustomerfirstname
							</pasi:Getter>
							<pasi:Setter>
								setCustomerfirstname
							</pasi:Setter>
						</pasi:PeopleSoftAttributeTypeMetadata>
					</appinfo>
				</annotation>
			</element>
Comment 1 Nitin Dahyabhai CLA 2007-07-06 03:26:40 EDT
Travis, could you attach a sample XML file and schema for us to test a fix against?
Comment 2 David Carver CLA 2007-10-05 16:46:22 EDT
(In reply to comment #1)
> Travis, could you attach a sample XML file and schema for us to test a fix
> against?
> 

Nitind, you can download a good set of schemas to use and can generate XML Instances from these by downloading the files from:

http://www.starstandard.org/index.php?n=SIGXMLSTAR5.XMLSchemas

To use this, pick something like ProcessRetailDeliveryReporting.xsd, and generate a sample instance.  Correct the errors that appear (there should only be a hand full of them). Next, Ctrl+Shift+F to format the file.  The reformatting will cause validation errors to occur on any field that has be split apart which uses xsd:normalizedString which doesn't allow for spaces to occur in front or at the end of the elements that use that field.

The formatting should be intelligent enough to recognize this if a grammar is attached.
Comment 3 David Carver CLA 2007-11-18 02:13:56 EST
(In reply to comment #2)
> To use this, pick something like ProcessRetailDeliveryReporting.xsd, and
> generate a sample instance.  Correct the errors that appear (there should only
> be a hand full of them). Next, Ctrl+Shift+F to format the file.  The
> reformatting will cause validation errors to occur on any field that has be
> split apart which uses xsd:normalizedString which doesn't allow for spaces to
> occur in front or at the end of the elements that use that field.
> 
> The formatting should be intelligent enough to recognize this if a grammar is
> attached.

Clarification on this, xsd:normalized string doesn't allow tabs, linefeeds, or carriage returns.  xsd:token doesn't allow any of those, plus leading and trailing spaces.  Regardless, it would be nice if the XML formatter took in to consideration the grammar constraints that an XML instance has when doing any type of formatting.


Comment 4 Amy Wu CLA 2007-12-19 22:54:53 EST
Have you tried checking the "Preserve whitespace in tags with PCDATA content" formatting preference?

Preferences > Web and XML > XML Files > Source
Comment 5 David Carver CLA 2007-12-20 00:19:14 EST
(In reply to comment #4)
> Have you tried checking the "Preserve whitespace in tags with PCDATA content"
> formatting preference?
> 
> Preferences > Web and XML > XML Files > Source
> 

For my particular case that doesn't work.  If an element has xsd:token specified as it's data type, it still adds CRLFs to the file.  Basically the current Formatter is not Grammar aware.  If an XML instance has a grammar attached to it, it would be nice to have the formatter be aware of the restrictions that the grammar puts on the formatting of the XML.
Comment 6 David Carver CLA 2007-12-20 09:48:34 EST
(In reply to comment #5)
> (In reply to comment #4)
> > Have you tried checking the "Preserve whitespace in tags with PCDATA content"
> > formatting preference?
> > 
> > Preferences > Web and XML > XML Files > Source
> > 
> 
> For my particular case that doesn't work.  If an element has xsd:token
> specified as it's data type, it still adds CRLFs to the file.  Basically the
> current Formatter is not Grammar aware.  If an XML instance has a grammar
> attached to it, it would be nice to have the formatter be aware of the
> restrictions that the grammar puts on the formatting of the XML.
> 

A further clarification on why this is needed.  The current Formatter will make previously valid XML documents that rely on a schemas, possibly become invalid.  This is because it introduces formatting into elements that have restrictions on what type of data can be included and how that data should look.
Comment 7 Amy Wu CLA 2008-01-29 15:27:05 EST
Reassigning formatting bugs to Nick
Comment 8 Nick Sandonato CLA 2008-03-26 14:53:26 EDT
Created attachment 93662 [details]
Patch to address observing XSD whitespace rules

This patch includes code to observe whitespace rules defined by XSD types when formatting XML documents.

Also included is a new JUnit test case that tests the various whitespace rules (collapse, preserve, replace).
Comment 9 Nick Sandonato CLA 2008-04-18 13:28:50 EDT
Created attachment 96627 [details]
update testcase

Patch updated a test case and also handles the scenario where the XSD element declaration has no type. This was causing some problems when formatting certain nodes in XSDs.
Comment 10 Nick Sandonato CLA 2008-04-23 16:47:42 EDT
Created attachment 97316 [details]
patch

Added a fix for Bug 226463 as it required the content model property to retrieve a whitespace facet before it would work correctly.

Will also mark this as a blocker to Bug 226463.
Comment 11 Valentin Baciu CLA 2008-04-29 17:21:57 EDT
Hi Nick, as per our conversation please have a look at the org.eclipse.xsd.XSDSimpleTypeDefinition.getEffectiveWhiteSpaceFacet() method. A JUnit test for the new XSD content model adapter property would not hurt either :-) If you decide to write one, the home for it would be org.eclipse.wst.xsd.core.tests.
Comment 12 Nick Sandonato CLA 2008-05-01 14:17:46 EDT
Pushing back to 3.0 RC1 due to time.
Comment 13 Nick Sandonato CLA 2008-05-02 11:41:34 EDT
Created attachment 98454 [details]
patch with xsd junits

Simplified calculating the whitespace facet using getEffectiveWhitespaceFacet as suggested by Valentin.  Also added a junit test to the XSD core tests plugin to test the dataType getProperty for whitespace facets.
Comment 14 David Carver CLA 2008-05-20 09:23:19 EDT
Nitin, did you mean 3.0 RC2 instead of 3.0 M2? :)
Comment 15 Nitin Dahyabhai CLA 2008-05-20 14:44:00 EDT
Yes
Comment 16 David Williams CLA 2008-05-23 10:57:50 EDT
This is part of a mass move of bugs targeted to RC2 to re-target to RC3, since we are passed RC2 except for blocking bugs. Please correct, if I have erred. 
Comment 17 Nitin Dahyabhai CLA 2008-06-04 05:19:38 EDT
Deferring to 3.0.1.

Amy and Valentin, please review these changes early in 3.0.1.  Apparently this one slipped through the cracks for both of you.
Comment 18 Amy Wu CLA 2008-07-18 18:34:38 EDT
Created attachment 107874 [details]
updatedpatch.txt

By the time I had a chance to review this patch, it was out of date thanks to some other recent changes to the formatter.  I've attached an updated patch that should  now apply cleanly to the current code stream.

While reviewing this patch, I did notice 2 issues:
1. If a file was already formatted, it was still being marked as dirty.  I noticed this was because of the new method replaceSpaces().  I added an additional check in that method so that if the spaces are already there, there's no need for a replace edit.

2. For xsd:token, whitespace is always supposed to be trimmed, but thanks to the recent fix of the "clear all blank lines" preference, the whitespace is not trimmed when the preference is unchecked.  I added a new whitespace mode, IGNOREANDTRIM for this case.  When in this mode, the "clear all blank lines" preference is ignored and the whitespace is always trimmed.

Other than that, the patch looked good, Nick.  The unit test especially helped in flagging the new "clear all blank lines" issue.  Since I did muck around with the code though, could you please review my review/patch to make sure I didn't mess anything up?
Comment 19 Valentin Baciu CLA 2008-07-21 11:39:31 EDT
Hi Nick, overall the XSD core patch looks good. Thanks for adding a test. 

Here are some minor observations:
- there's a duplicate import for org.eclipse.wst.xml.core.internal.contentmodel.CMNodeList;
- you may want to add an the XML declaration to the test schema XSDWhitespace.xsd
- I generated an XML instance from the schema (Generate->XML, checked the generate optional elements/attributes), then tried to format that file. When I hit Ctrl+Shift+F repeatedly I observe a new line introduced after the text in 

	<extended-collapse>extended-collapse</extended-collapse>

Sometimes it looks like that, and after the next format it looks like this:

	<extended-collapse>extended-collapse
	</extended-collapse>

Can you please confirm this last issue? 
Comment 20 Nick Sandonato CLA 2008-07-21 13:28:01 EDT
Created attachment 107974 [details]
patch

This patch includes everything from the previous patches including some of the cleanup from comment 19. I also fixed the behavior noticed caused by the end-tag trying to format itself after the text content had already been formatted. The previous patch should be obsolete now. Thanks for the help in cleaning up these last few cases.
Comment 21 Valentin Baciu CLA 2008-07-21 14:51:17 EDT
Thanks Nick, the last patch looks good.

Just my closing $0.02: if there is not already a preference for the formatter being driven by the schema (whitespace facets, etc), I would think that it may be worthwhile having one. The formatter is reaching here into the XML processor's territory and some users may not like that done without the option to turn it off if they so desire.
Comment 22 Amy Wu CLA 2008-07-23 13:02:39 EDT
latest patch looks good to me too
Comment 23 Nick Sandonato CLA 2008-07-23 13:07:42 EDT
Requesting component lead review for 3.0.1.
Comment 24 Nitin Dahyabhai CLA 2008-07-24 00:22:12 EDT
Applied
Comment 25 Nick Sandonato CLA 2008-07-29 16:08:43 EDT
Reopening as the testcase for this bug fails.

The scenario is that normalizedStrings replace each whitespace character with a single space. The submitted test file uses \r\n as a line delimiter, so this would be replaced by two spaces when formatted. If a machine is set to replace the line delimiters with the system's default, this could cause the expected results to differ from the actual results.
Comment 26 Nick Sandonato CLA 2008-07-29 16:10:50 EDT
Created attachment 108694 [details]
patch to correct test case

This patch is to correct the test case by normalizing the test case's document content line delimiters to always be \n so that the normalizedString replaces the correct number of whitespace characters and the expected results match the actual.
Comment 27 Nitin Dahyabhai CLA 2008-07-30 07:29:44 EDT
test updated
Comment 28 Nick Sandonato CLA 2008-08-19 15:22:16 EDT
Verified in 3.0.1-20080818032401.