Community
Participate
Working Groups
I have XML files that have annotations, in which the annotation is an XML document. When I run the XML formatter, the annotation is formatted without regard to the schema of the annotations. This means that where my whitespace is significant (in xs:string elements), the value is changed by by the formatter, adding additional whitespace. I need the formatter to either completely ignore annotations (not attempt to format them) OR to honor the schema of the annotations, and preserve whitespace where it is declared to be significant. Examples: Here are my annotations: <element name="Customerfirstname" type="string" minOccurs="0" maxOccurs="1"> <annotation> <appinfo source="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata"> <pasi:PeopleSoftAttributeTypeMetadata xmlns:pasi="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata"> <pasi:Getter>getCustomerfirstname</pasi:Getter> <pasi:Setter>setCustomerfirstname</pasi:Setter> </pasi:PeopleSoftAttributeTypeMetadata> </appinfo> </annotation> </element> This is what the formatter does with them. Note the introduction of whitespace. Getter and Setter are xs:string elements. <element name="Customerfirstname" type="string" minOccurs="0" maxOccurs="1"> <annotation> <appinfo source="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata"> <pasi:PeopleSoftAttributeTypeMetadata xmlns:pasi="http://www.ibm.com/xmlns/prod/websphere/j2ca/peoplesoft/metadata"> <pasi:Getter> getCustomerfirstname </pasi:Getter> <pasi:Setter> setCustomerfirstname </pasi:Setter> </pasi:PeopleSoftAttributeTypeMetadata> </appinfo> </annotation> </element>
Travis, could you attach a sample XML file and schema for us to test a fix against?
(In reply to comment #1) > Travis, could you attach a sample XML file and schema for us to test a fix > against? > Nitind, you can download a good set of schemas to use and can generate XML Instances from these by downloading the files from: http://www.starstandard.org/index.php?n=SIGXMLSTAR5.XMLSchemas To use this, pick something like ProcessRetailDeliveryReporting.xsd, and generate a sample instance. Correct the errors that appear (there should only be a hand full of them). Next, Ctrl+Shift+F to format the file. The reformatting will cause validation errors to occur on any field that has be split apart which uses xsd:normalizedString which doesn't allow for spaces to occur in front or at the end of the elements that use that field. The formatting should be intelligent enough to recognize this if a grammar is attached.
(In reply to comment #2) > To use this, pick something like ProcessRetailDeliveryReporting.xsd, and > generate a sample instance. Correct the errors that appear (there should only > be a hand full of them). Next, Ctrl+Shift+F to format the file. The > reformatting will cause validation errors to occur on any field that has be > split apart which uses xsd:normalizedString which doesn't allow for spaces to > occur in front or at the end of the elements that use that field. > > The formatting should be intelligent enough to recognize this if a grammar is > attached. Clarification on this, xsd:normalized string doesn't allow tabs, linefeeds, or carriage returns. xsd:token doesn't allow any of those, plus leading and trailing spaces. Regardless, it would be nice if the XML formatter took in to consideration the grammar constraints that an XML instance has when doing any type of formatting.
Have you tried checking the "Preserve whitespace in tags with PCDATA content" formatting preference? Preferences > Web and XML > XML Files > Source
(In reply to comment #4) > Have you tried checking the "Preserve whitespace in tags with PCDATA content" > formatting preference? > > Preferences > Web and XML > XML Files > Source > For my particular case that doesn't work. If an element has xsd:token specified as it's data type, it still adds CRLFs to the file. Basically the current Formatter is not Grammar aware. If an XML instance has a grammar attached to it, it would be nice to have the formatter be aware of the restrictions that the grammar puts on the formatting of the XML.
(In reply to comment #5) > (In reply to comment #4) > > Have you tried checking the "Preserve whitespace in tags with PCDATA content" > > formatting preference? > > > > Preferences > Web and XML > XML Files > Source > > > > For my particular case that doesn't work. If an element has xsd:token > specified as it's data type, it still adds CRLFs to the file. Basically the > current Formatter is not Grammar aware. If an XML instance has a grammar > attached to it, it would be nice to have the formatter be aware of the > restrictions that the grammar puts on the formatting of the XML. > A further clarification on why this is needed. The current Formatter will make previously valid XML documents that rely on a schemas, possibly become invalid. This is because it introduces formatting into elements that have restrictions on what type of data can be included and how that data should look.
Reassigning formatting bugs to Nick
Created attachment 93662 [details] Patch to address observing XSD whitespace rules This patch includes code to observe whitespace rules defined by XSD types when formatting XML documents. Also included is a new JUnit test case that tests the various whitespace rules (collapse, preserve, replace).
Created attachment 96627 [details] update testcase Patch updated a test case and also handles the scenario where the XSD element declaration has no type. This was causing some problems when formatting certain nodes in XSDs.
Created attachment 97316 [details] patch Added a fix for Bug 226463 as it required the content model property to retrieve a whitespace facet before it would work correctly. Will also mark this as a blocker to Bug 226463.
Hi Nick, as per our conversation please have a look at the org.eclipse.xsd.XSDSimpleTypeDefinition.getEffectiveWhiteSpaceFacet() method. A JUnit test for the new XSD content model adapter property would not hurt either :-) If you decide to write one, the home for it would be org.eclipse.wst.xsd.core.tests.
Pushing back to 3.0 RC1 due to time.
Created attachment 98454 [details] patch with xsd junits Simplified calculating the whitespace facet using getEffectiveWhitespaceFacet as suggested by Valentin. Also added a junit test to the XSD core tests plugin to test the dataType getProperty for whitespace facets.
Nitin, did you mean 3.0 RC2 instead of 3.0 M2? :)
Yes
This is part of a mass move of bugs targeted to RC2 to re-target to RC3, since we are passed RC2 except for blocking bugs. Please correct, if I have erred.
Deferring to 3.0.1. Amy and Valentin, please review these changes early in 3.0.1. Apparently this one slipped through the cracks for both of you.
Created attachment 107874 [details] updatedpatch.txt By the time I had a chance to review this patch, it was out of date thanks to some other recent changes to the formatter. I've attached an updated patch that should now apply cleanly to the current code stream. While reviewing this patch, I did notice 2 issues: 1. If a file was already formatted, it was still being marked as dirty. I noticed this was because of the new method replaceSpaces(). I added an additional check in that method so that if the spaces are already there, there's no need for a replace edit. 2. For xsd:token, whitespace is always supposed to be trimmed, but thanks to the recent fix of the "clear all blank lines" preference, the whitespace is not trimmed when the preference is unchecked. I added a new whitespace mode, IGNOREANDTRIM for this case. When in this mode, the "clear all blank lines" preference is ignored and the whitespace is always trimmed. Other than that, the patch looked good, Nick. The unit test especially helped in flagging the new "clear all blank lines" issue. Since I did muck around with the code though, could you please review my review/patch to make sure I didn't mess anything up?
Hi Nick, overall the XSD core patch looks good. Thanks for adding a test. Here are some minor observations: - there's a duplicate import for org.eclipse.wst.xml.core.internal.contentmodel.CMNodeList; - you may want to add an the XML declaration to the test schema XSDWhitespace.xsd - I generated an XML instance from the schema (Generate->XML, checked the generate optional elements/attributes), then tried to format that file. When I hit Ctrl+Shift+F repeatedly I observe a new line introduced after the text in <extended-collapse>extended-collapse</extended-collapse> Sometimes it looks like that, and after the next format it looks like this: <extended-collapse>extended-collapse </extended-collapse> Can you please confirm this last issue?
Created attachment 107974 [details] patch This patch includes everything from the previous patches including some of the cleanup from comment 19. I also fixed the behavior noticed caused by the end-tag trying to format itself after the text content had already been formatted. The previous patch should be obsolete now. Thanks for the help in cleaning up these last few cases.
Thanks Nick, the last patch looks good. Just my closing $0.02: if there is not already a preference for the formatter being driven by the schema (whitespace facets, etc), I would think that it may be worthwhile having one. The formatter is reaching here into the XML processor's territory and some users may not like that done without the option to turn it off if they so desire.
latest patch looks good to me too
Requesting component lead review for 3.0.1.
Applied
Reopening as the testcase for this bug fails. The scenario is that normalizedStrings replace each whitespace character with a single space. The submitted test file uses \r\n as a line delimiter, so this would be replaced by two spaces when formatted. If a machine is set to replace the line delimiters with the system's default, this could cause the expected results to differ from the actual results.
Created attachment 108694 [details] patch to correct test case This patch is to correct the test case by normalizing the test case's document content line delimiters to always be \n so that the normalizedString replaces the correct number of whitespace characters and the expected results match the actual.
test updated
Verified in 3.0.1-20080818032401.