Bug 176187 - SML-IF import locator failure should be a warning, not an error
Summary: SML-IF import locator failure should be a warning, not an error
Status: ASSIGNED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Cosmos (show other bugs)
Version: unspecified   Edit
Hardware: All All
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: Jimmy Mohsin CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-03-02 10:37 EST by David Whiteman CLA
Modified: 2012-01-03 13:53 EST (History)
5 users (show)

See Also:


Attachments
changed files for this issue (19.18 KB, application/x-zip-compressed)
2008-10-21 08:32 EDT, Ramesh CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Whiteman CLA 2007-03-02 10:37:13 EST
When the SML-IF import encounters a locator element specifying a remote document that does not exist, it outputs a fatal error message.  We should instead make it a warning, and continue with the import.
Comment 1 Valentina Popescu CLA 2007-04-26 13:46:23 EDT
Cannot be contained in i3; moving to i4
Comment 2 John Arwe CLA 2008-08-28 08:47:24 EDT
This is a spec compliance issue.  3/2008 spec (and current editor's draft) cover this in SMLIF 5.2.2, see final paragraph.

5.2.2 Referenced Documents

Documents that are to be referenced rather than embedded MUST be included as follows:

   1.      If the document is a definition document, the location of the document MUST be included as the content of a model/definitions/document/locator element.
   2.      If the document is an instance document, the location of the document MUST be included as the content of a model/instances/document/locator element. 

SML-IF specifies one way that MAY be used to provide the location of the referenced document, the documentURI element.

An SML-IF consumer MAY choose to locate a referenced document. If an SML-IF consumer chooses not to locate a referenced document or if it attempts to locate the referenced document and this attempt fails, then the SML-IF consumer MUST treat the referenced document as if it is not part of the interchange model. If either of these conditions occurs, the SML-IF consumer SHOULD make its invoker aware of this condition. 
Comment 3 David Whiteman CLA 2008-08-28 09:30:30 EDT
Note that this failure occurs with the import tool, and not the validator.  Therefore, this is a lower priority, as spec compliance issues in the validator itself are more critical to fix.
Comment 4 David Whiteman CLA 2008-10-15 18:43:08 EDT
Assigning to Ramesh to work on after he completes his current work.
Comment 5 Ramesh CLA 2008-10-16 10:37:30 EDT
Hi David,
The 'issue description' doesn't say which SML-IF document could be used to reproduce this issue, anyway as discussed (about this issue) in our previous RM meeting, I am using others/remote-document.xml (i.e. contained in 'validation.tests' project src\test-resources\others\ folder). 

Following are my observations:

I believe the above SML-IF file is a valid one because it confirms to the SML-IF schema but when I tried to import it, it resulted in a SAXException (URL for remote document null indicated in locator is not valid), when I debugged the code I see we are throwing an error in repository-plugin SMLIFFileHandler.java at line:272. The reason is it is expecting/processing an "xlink href" attribute value which is coming null

When I referred SML-IF schema definition (i.e. http://www.w3.org/Submission/sml-if/ I believe this is the proper/current spec which we need to confirm to, please correct me if I need to refer a different version of it) I found following excerpt relating to 'locator' element

"a locator element is used to contain the reference. Syntactically, the content of a locator can be a documentURI element defined by SML-IF or anything else understood by the consumer. Typically it is a URI, an XLink [12], or a Web Services Addressing endpoint reference [9]."

With that I consider following are the proper cases (with out considering 'open content')
1. <locator xlink:href="url"/>
2. <locator>
      <documentURI>url</documentURI>
   </locator>

As of now I am not considering ambiguous cases like like
<locator xlink:href="url">
   <documentURI>url</documentURI>
</locator>

Note: In SMLIFFileHandler.java currently documentURI element is not considered/processed, I will change to handler to look for it to capture url.

Actually I haven't yet got to the point to actual problem (i.e. reproduced the issue). Probably after checking for nullness in the SMLIFFileHanlder.java (at the above mentioned line) and process accordingly then I think actual issue might get reproduced.

Thanks and regards
Ramesh Pokala
Comment 6 David Whiteman CLA 2008-10-16 11:02:20 EDT
Ramesh, we have implemented the SML 1.1 and SML-IF 1.1 specs.  Your link points to the SML-IF 1.0 spec.  Please use these for reference instead:

http://www.w3.org/TR/sml-if/
http://www.w3.org/TR/sml/

After you look at the spec in light of the current issue, let me know if that changes your findings.
Comment 7 Ramesh CLA 2008-10-17 09:00:48 EDT
(In reply to comment #6)

> After you look at the spec in light of the current issue, let me know if that
> changes your findings.
> 

David, I found 2 excerpts from the sml-if 1.1 specs relating to 'locator' element

1. "If the document is being referenced rather than embedded, a locator element is used to contain the reference. The content of a locator can be a documentURI element defined by SML-IF or anything else understood by the SML-IF consumer."

2. "5.2.2 Referenced Documents
Documents that are to be referenced rather than embedded MUST be included as follows:

If the document is a definition document, the location of the document MUST be included as the content of a model/definitions/document/locator element.

If the document is an instance document, the location of the document MUST be included as the content of a model/instances/document/locator element. 

SML-IF specifies one way that MAY be used to provide the location of the referenced document, the documentURI element.

An SML-IF consumer MAY choose to locate a referenced document. If an SML-IF consumer chooses not to locate a referenced document or if it attempts to locate the referenced document and this attempt fails, then the SML-IF consumer MUST treat the referenced document as if it is not part of the interchange model. If either of these conditions occurs, the SML-IF consumer SHOULD make its invoker aware of this condition."


--> With my current knowledge on SML-IF what I understood from the above 2 extracts is that, a reference can be either part of model/*/document/locator or model/*/document/locator/documentURI.

I feel following use-cases are correct
1. <locator>url</locator>
2. <locator>
      <documentURI>url</documentURI>
   </locator>

But not sure about below (and etc)
1. <locator xlink:href="url"/>
2. <locator>
       <documentURI>url1</documentURI>
       url2
   </locator>
3. <locator xlink:href="url1">
       <documentURI>url2</documentURI>
       url3
   </locator>

Let me know your suggestions on this, mean while I will update handler class to process the correct usecases.

Thanks and regards
Ramesh Pokala
Comment 8 John Arwe CLA 2008-10-17 14:17:35 EDT
1. <locator>url</locator>

   Should fail SMLIF schema validation, since mixed=false.

2. <locator>
      <documentURI>url</documentURI>
   </locator>

   Valid

1. <locator xlink:href="url"/>

   Syntactically valid (because of anyAttribute), but not required for spec
   compliance.

2. <locator>
       <documentURI>url1</documentURI>
       url2
   </locator>

   Should fail SMLIF schema validation, since mixed=false.

3. <locator xlink:href="url1">
       <documentURI>url2</documentURI>
       url3
   </locator>

   Should fail SMLIF schema validation, since mixed=false.
Comment 9 Ramesh CLA 2008-10-21 08:32:21 EDT
Created attachment 115684 [details]
changed files for this issue
Comment 10 Ramesh CLA 2008-10-21 08:33:02 EDT
(In reply to comment #8)
> 2. <locator>
>       <documentURI>url</documentURI>
>    </locator>
>    Valid

Thanks John, for pointing out the "mixed" attribute of SML-IF 1.1 specs, I  initially overlooked "mixed" attribute. Now I updated the files to process (only) the above valid scenario during import process.

Hi David,

I have updated following files to provide the fix, I request you to review the changes and let me know your comments if any on the attached files.

File changes in brief:
1. org.eclipse.cosmos.rm.smlif/src/org/eclipse/cosmos/rm/internal/smlif/common/messages.properties (added "importWarningTitle" resource)
2.org.eclipse.cosmos.rm.smlif/src/org/eclipse/cosmos/rm/internal/smlif/common/SMLMessages.java (added "importWarningTitle" constant for the above change)
3.org.eclipse.cosmos.rm.repository/src-core-impl/org/eclipse/cosmos/rm/internal/repository/operations/ImportOperation.java (modified importFromFileSystemFile(...) signature to return list-of-warnings encountered during import i.e. List<RepositoryWarning> object)
4.org.eclipse.cosmos.rm.repository/src-core-impl/org/eclipse/cosmos/rm/internal/repository/operations/FileImportOperation.java (collects the warnings if any during the Import from ImportOperation.java and throws it to the UI to handle)
5.org.eclipse.cosmos.rm.repository/src-core-impl/org/eclipse/cosmos/rm/internal/repository/operations/SMLIFFileHandler.java (added addLocatorError(...), getWarnings(), startElement(...) to comment LOCATOR_ELEMENT handling and added DOCUMENTURL_ELEMENT handling, endElement(...) to comment LOCATOR_ELEMENT handling and added DOCUMENTURL_ELEMENT handling)
6.org.eclipse.cosmos.rm.repository/src-core-impl/org/eclipse/cosmos/rm/internal/repository/core/FileSystemSMLRepository.java (updated addDocument(...) method to add a check to look for '?' symbol in file names to avoid file-creation failure --> is this change correct?)

New file added: 
1.org.eclipse.cosmos.rm.repository/src-core/org/eclipse/cosmos/rm/provisional/repository/exception/RepositoryWarnings.java  --> Class to contain/store warnings came across during import, which is thrown by ImportOperation.java (at the end of import) handled by ImportFromSMLIFWizardPage.java to display the warnings if any

While reproducing the above issue, I came across few other issues which I have pointed out below, they are also rectified/fixed with the changes.

-From CA Network the others/remote-document.xml is not getting downloaded from internet (just for my internal purpose I have added system properties inside SMLIFFileHandler.java retrieveRemoteDocument(...) file to reproduce/fix the issue)
-SMLFileHandler.java was not handling <documentURI> element of <locator> element which is corrected now
-In FileSystemSMLRepository.java, addDocument(...) resulted in a exeption while creating the xml file using downloaded document(becuase of '?' etc characters in the file to be created), which is corrected to add a check (but with this we loose the ?name1=value1&name2=value2 etc extra URL information in case if we need them during export)

Thanks and regards
Ramesh Pokala
Comment 11 Ramesh CLA 2008-10-21 09:08:18 EDT
(In reply to comment #10)
> While reproducing the above issue, I came across few other issues which I have
> pointed out below, they are also rectified/fixed with the changes.
> -From CA Network the others/remote-document.xml is not getting downloaded from
> internet (just for my internal purpose I have added system properties inside
> SMLIFFileHandler.java retrieveRemoteDocument(...) file to reproduce/fix the
> issue)
This change contained in the mentioned method is not required for this issue fix, but somehow the remote-document is not getting downloaded from inside CA Network, is there any way we can write java-code to plug the proxy configuration parameters by retrieving them from System
I referred http://java.sun.com/j2se/1.5.0/docs/api/index.html?java/net/ProxySelector.html but it didn't work, instead of getting CA proxy server parameters it is getting DIRECT connection
> -In FileSystemSMLRepository.java, addDocument(...) resulted in a exeption while
> creating the xml file using downloaded document(becuase of '?' etc characters
> in the file to be created), which is corrected to add a check (but with this we
> loose the ?name1=value1&name2=value2 etc extra URL information in case if we
> need them during export)
For instance, in others/remote-document.xml there is a remote-URL "http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml?root=Technology_Project&amp;view=co" present, during the import we download/collect the contents of this file from internet and at the end of import we try create a file using the above URL i.e.  <target folder>+"viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml?root=Technology_Project&amp;view=co" (this resulted in exception), I now updated to create file neglecting its query-parameters (i.e. "?root=Technology_Project&amp;view=co") and create filename using <target folder>+"viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml", my question is do we need the neglected queryparameter during export to generate the output similar to imported SML-IF file?
> Thanks and regards
> Ramesh Pokala

Comment 12 David Whiteman CLA 2008-10-21 09:20:57 EDT
(In reply to comment #11)
> This change contained in the mentioned method is not required for this issue
> fix, but somehow the remote-document is not getting downloaded from inside CA
> Network, is there any way we can write java-code to plug the proxy
> configuration parameters by retrieving them from System
> I referred
> http://java.sun.com/j2se/1.5.0/docs/api/index.html?java/net/ProxySelector.html
> but it didn't work, instead of getting CA proxy server parameters it is getting
> DIRECT connection

I'm not sure how to handle the proxy information; that is something you might want to research on the Internet, but it might be something we can't really address at this point.

> For instance, in others/remote-document.xml there is a remote-URL
> "http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml?root=Technology_Project&amp;view=co"
> present, during the import we download/collect the contents of this file from
> internet and at the end of import we try create a file using the above URL i.e.
>  <target
> folder>+"viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml?root=Technology_Project&amp;view=co"
> (this resulted in exception), I now updated to create file neglecting its
> query-parameters (i.e. "?root=Technology_Project&amp;view=co") and create
> filename using <target
> folder>+"viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml",
> my question is do we need the neglected queryparameter during export to
> generate the output similar to imported SML-IF file?

I would say for now that we keep the code you wrote to drop the query parameters, and that we create a defect report that we need to preserve that information in the metadata, but I don't think it's a high priority defect.

Comment 13 David Whiteman CLA 2008-10-21 10:55:09 EDT
from John:

The portion of a URI following the ? is the query component (RFC 3986).  The query component does need to be correctly handled as part of URI processing.  It probably has to be escaped in this context in order to get the right bits coming out.  The rules on escaping in the various layers of the stack are confusing enough to me that I don't have a particularly tight grip on them unfortunately.
Comment 14 Jimmy Mohsin CLA 2008-11-04 11:53:31 EST
Candidate Whiteman,

Methinks this RM bug is thine property...

Jimmy
Comment 15 David Whiteman CLA 2008-11-04 16:51:00 EST
Ramesh, do we have a defect for the missing data after "?".  Can we just change the fix to escape the "?" into &#63; which is the HTML version?  We would just need to make sure to convert it back on export.
Comment 16 Ramesh CLA 2008-11-06 02:12:44 EST
(In reply to comment #15)
> Ramesh, do we have a defect for the missing data after "?".  Can we just change
> the fix to escape the "?" into &#63; which is the HTML version?  We would just
> need to make sure to convert it back on export.
> 
Hi David, sorry for the late reply, I haven't opened any issue for the above identified problem
Actually we can't even use "&#63;" in place of "?" because "&" is also one of the invalid characters.
Note: The 9 filename invalid characters at least in Windows are  \ / : * ? " < > |

In our case (I assume) we might need to handle all those except "/" (i.e. because we create folders for those strings that are delimited-by/prior-to last "/" character).

I have started going through the RFC 3986 specs ( http://www.ietf.org/rfc/rfc3986.txt) whether it can help me in solving the identified problem. Meanwhile if you think I can open a new bugzilla for this and work separately, I can do so, please suggest.
-Ramesh Pokala
Comment 17 Ramesh CLA 2008-11-06 05:48:18 EST
Hi David,
I have gone through the RFC 3986 specs (not completely), it talks much about different sections of a URI (i.e. syntax, relative/absolute URI's, reserved/unreserved characters, when to encode things like that and etc) but as such I haven't find any specific information that we require (at most of the places it says it is about the implementations how they process etc).

Anyway I tried using java.net.URLEncoder.encode(<<queryPartOfURL>>, "UTF-8") it did encode the special characters but the problem is that the resulted file is not getting opened by any editor though it looks like the file has the required content). 
I tested this with others\remote-document.xml file to encode query portion of http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.cosmos/tests/resource-modeling/org.eclipse.cosmos.rm.validation.tests/src/test-resources/others/sampleLocatorFile.xml?root=Technology_Project&amp;view=co (i.e. sampleLocatorFile.xml?root=Technology_Project&amp;view=co), it is encoded as   "sampleLocatorFile.xml%3Froot%3DTechnology_Project%26amp%3Bview%3Dco".

I feel it would be better we write our own encode and decode methods to encode special characters with our own replacement strings (something like '&' to '__amp__' or '%26' etc) so that they can be checked and decoded during export process. Plese let me know your ideas/suggestions.
Thanks
-Ramesh
Comment 18 David Whiteman CLA 2008-11-06 06:39:56 EST
(In reply to comment #17)
> I feel it would be better we write our own encode and decode methods to encode
> special characters with our own replacement strings (something like '&' to
> '__amp__' or '%26' etc) so that they can be checked and decoded during export
> process. Plese let me know your ideas/suggestions.
> Thanks
> -Ramesh

Thanks for your thorough investigation, Ramesh.  I would be fine with the above solution. 

Comment 19 David Whiteman CLA 2008-11-06 06:44:15 EST
(In reply to comment #16)
> Meanwhile if you think I can open a new bugzilla for this
> and work separately, I can do so, please suggest.
> -Ramesh Pokala
> 

I think that would be best, considering encoding the special characters in a URL is a different problem than whether we issue a warning vs. error for a URL not found.
Comment 20 John Arwe CLA 2008-11-06 12:27:13 EST
(In reply to comment #11)

I resisted answering this question directly so as not to pollute the bug w/ more chatter, but Ramesh prevailed upon me to see that it might be more useful to others than I thought.

There is at least one existing bug open that apparently happens on the Sun JVM but not the IBM JVM.  You might want to search the open bugs and see if there is an intersect.  

The other thing I can think of would be to make a local copy of the <locator>-containing smlif file and change the URI in the locator from http... to file (or localhost) and point to your local copy of the file that way; it's not perfect but it might let you distinguish between JVM issues and your company's security sw.  You might also talk to others at your company involved in external projects like COSMOS, who most likely would have had similar issues to what you are seeing and may have found a bypass. 

Comment 21 Ramesh CLA 2008-11-07 02:03:09 EST
> I think that would be best, considering encoding the special characters in a
> URL is a different problem than whether we issue a warning vs. error for a URL
> not found.
> 
David, I opened bug #254563 for the above problem