Bug 207078 - new path representation
Summary: new path representation
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 5.0
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact: Jonah Graham CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-10-22 15:04 EDT by Chris Recoskie CLA
Modified: 2020-09-04 15:24 EDT
CC List: 17 users

See Also:


Attachments
Proposed interface. Requires Java 5. (22.66 KB, patch)
2007-10-22 15:14 EDT, Chris Recoskie CLA
no flags
updated patch (21.42 KB, patch)
2007-10-26 15:27 EDT, Chris Recoskie CLA
no flags
Updated patch. (145.77 KB, patch)
2007-12-13 12:34 EST, Chris Recoskie CLA
recoskie: review+

Description Chris Recoskie CLA 2007-10-22 15:04:58 EDT
As we have been looking at enabling CDT with EFS support, we've noticed a few issues.  Some notes are here:  http://wiki.eclipse.org/PTP/designs/remote/EFS.  In summary, although URIs are useful for accessing a file on a remote system, they are not well suited to path manipulation.

We also want code that is largely agnostic about where a path comes from, so that we don't need duplicate APIs for dealing with both IPaths for local files and URIs for remote files.

We have determined that there is a need for a new path representation which bridges local (native) paths and URIs.

In order to provide interoperability with EFS this new representation will have to be able to convert to a valid URI, but this should only be done when interacting directly with EFS itself, otherwise information may be lost.

Requirements:
    * Track the OS the original path was created for, and be able to extract the original path.
    * Convert the path to another OS
    * Convert the path to be relative to another machine (most likely with a different root directory and potentially on a different OS).
    * Distinguish between local, remote, and local-relative-to-remote paths.
    * Provide a toURI() method
    * Provide utility functions for path manipulation (append, get a segment, get the file extension, etc.) similar to those in IPath.
    
I have a draft of an interface, IUniversalPath, which fulfils these requirements.  Greg, Jason, and I have iterated on this a couple of times, and now I'd like to get some more feedback on it before we start using it.  I will attach the interface to this bug for review.  In the meantime I'm working on a concrete implementation (UniversalHierarchicalPath).
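
As an illustration of the first and fifth requirements above, here is a minimal sketch of a path value that remembers which OS it was written for and can still produce a URI. It is not the IUniversalPath interface attached to this bug: the class name OsTaggedPath, its Style enum, its method names, and the "ssh" scheme used in main are all assumptions made for this example, and only absolute paths are handled.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is NOT the IUniversalPath interface attached to this bug. */
final class OsTaggedPath {
    /** Path syntax of the machine the path was originally written for. */
    enum Style { WINDOWS, UNIX }

    private final Style style;
    private final String host;           // null means the local machine
    private final String device;         // e.g. "C:" for Windows, otherwise null
    private final List<String> segments; // path segments without separators

    private OsTaggedPath(Style style, String host, String device, List<String> segments) {
        this.style = style;
        this.host = host;
        this.device = device;
        this.segments = segments;
    }

    /** Parses an absolute path written in the given style (relative paths are out of scope). */
    static OsTaggedPath parse(Style style, String host, String raw) {
        String device = null;
        String rest = raw;
        if (style == Style.WINDOWS && raw.length() >= 2 && raw.charAt(1) == ':') {
            device = raw.substring(0, 2);
            rest = raw.substring(2);
        }
        List<String> segments = new ArrayList<>();
        // Windows accepts both '\' and '/' as separators; Unix only '/'.
        for (String s : rest.split(style == Style.WINDOWS ? "[\\\\/]" : "/")) {
            if (!s.isEmpty()) {
                segments.add(s);
            }
        }
        return new OsTaggedPath(style, host, device, segments);
    }

    Style getStyle() {
        return style;
    }

    /** Returns a new path with one more trailing segment; this object is unchanged. */
    OsTaggedPath append(String segment) {
        List<String> extended = new ArrayList<>(segments);
        extended.add(segment);
        return new OsTaggedPath(style, host, device, extended);
    }

    /** Rebuilds the path in the syntax of the OS it was created for. */
    String toOriginalString() {
        if (style == Style.WINDOWS) {
            return (device == null ? "" : device) + "\\" + String.join("\\", segments);
        }
        return "/" + String.join("/", segments);
    }

    /** Converts to a URI; the device, if any, becomes the first path segment. */
    URI toURI(String scheme) {
        String path = "/" + (device == null ? "" : device + "/") + String.join("/", segments);
        try {
            return new URI(scheme, host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        OsTaggedPath p = parse(Style.WINDOWS, "buildhost", "C:\\proj\\src").append("main.c");
        System.out.println(p.toOriginalString());
        System.out.println(p.toURI("ssh"));
    }
}
```

The point of the sketch is that the OS style and device travel with the path, so the original string can be rebuilt after manipulation, while the URI is produced only on demand.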
Comment 1 Chris Recoskie CLA 2007-10-22 15:14:13 EDT
Created attachment 80897 [details]
Proposed interface.  Requires Java 5.
Comment 2 Doug Schaefer CLA 2007-10-22 15:15:46 EDT
Do we need a whole new interface, or can we get by with a utility that converts URIs to IPaths?

Also, whatever you do, remember that parent nodes in the URI don't necessarily map to parent nodes in the file system. In particular, folders in the virtual file system may or may not map to folders in the file system, and even if they do map to real folders, they may contain files that the real folders don't. That's why I'd prefer to hide the real file system as much as possible.
Comment 3 Chris Recoskie CLA 2007-10-22 15:20:48 EDT
(In reply to comment #2)
> Do we need a whole new interface, or can we get by with a utility that converts
> URIs to IPaths?

No, because an IPath will lose information such as what machine the URI points to, for example.
Comment 4 Sergey Prigogin CLA 2007-10-22 15:36:58 EDT
Why doesn't IUniversalPath extend IPath or URI? URI was probably envisioned by its creators as something universal that can eventually replace all other path representations. IUniversalPath seems to fragment the field even more instead of bringing unification.
Comment 5 Chris Recoskie CLA 2007-10-22 15:40:35 EDT
(In reply to comment #4)
> Why doesn't IUniversalPath extend IPath or URI? URI was probably envisioned by
> its creators as something universal that can eventually replace all other path
> representations. IUniversalPath seems to fragment the field even more instead
> of bringing unification.

Implementing IPath would mean we are directly substitutable for any IPath, which we are not.  Clients that use IPaths are typically expecting local paths.

URI is a final class so unfortunately you can't extend it.
Comment 6 Doug Schaefer CLA 2007-10-22 15:55:14 EDT
I'm still confused. Can you give an example of where an IUniversalPath would be used?
Comment 7 Anton Leherbauer CLA 2007-10-23 08:38:35 EDT
(In reply to comment #0)
> As we have been looking at enabling CDT with EFS support, we've noticed a few
> issues.  Some notes are here:  http://wiki.eclipse.org/PTP/designs/remote/EFS. 

On the Wiki it is stated that

> There is no device field in a URI. I.e., it's not legal to have c: in a URI. 

That's not true. The path component may contain ':'.

I also tried the example code on the Wiki and in my case it printed:

file:/c:/a/b/c
Comment 8 Greg Watson CLA 2007-10-24 08:37:12 EDT
(In reply to comment #7)
> (In reply to comment #0)
> > As we have been looking at enabling CDT with EFS support, we've noticed a few
> > issues.  Some notes are here:  http://wiki.eclipse.org/PTP/designs/remote/EFS. 
> 
> On the Wiki it is stated that
> 
> > There is no device field in a URI. I.e., it's not legal to have c: in a URI. 
> 
> That's not true. The path component may contain ':'.
> 
> I also tried the example code on the Wiki and in my case it printed:
> 
> file:/c:/a/b/c
> 

It's legal to have a ':' in the first segment only if the path is absolute, so file:c:/a/b/c is not legal. In any case, I think the point is that we need to preserve the device and path as it exists on the remote machine. The URI loses this distinction, since there is no way to distinguish this from the path /c:/a/b/c (assuming ':' is legal in a directory name).
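
The loss of distinction described above can be reproduced with java.net.URI. The following is a small sketch, not code from the wiki or the patch; the class and helper names are made up for the illustration, and it assumes the common convention of encoding a Windows drive letter as the first segment of an absolute URI path.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustration only: two different native paths collapse to the same file URI. */
final class UriAmbiguityDemo {
    /** Encodes an absolute Windows path such as C:\a\b\c with the device as first segment. */
    static URI fromWindowsPath(String windowsPath) {
        return fileUri("/" + windowsPath.replace('\\', '/'));
    }

    /** An absolute Unix path is already in URI path form. */
    static URI fromUnixPath(String unixPath) {
        return fileUri(unixPath);
    }

    private static URI fileUri(String path) {
        try {
            return new URI("file", null, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI windows = fromWindowsPath("C:\\a\\b\\c"); // device C:, directories a/b/c
        URI unix = fromUnixPath("/C:/a/b/c");         // a root directory literally named "C:"
        System.out.println(windows + " vs " + unix + " -> equal: " + windows.equals(unix));
    }
}
```

Once the two paths are URIs, nothing in the URI itself says which native form it came from.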
Comment 9 Greg Watson CLA 2007-10-24 08:50:48 EDT
(In reply to comment #6)
> I'm still confused. Can you give an example of where an IUniversalPath would be
> used?
> 

One example would be where my project is located on a remote Windows machine, and I'm going to be building on this machine. My local machine (running Eclipse) is Unix. Eclipse is computing the build dependencies locally, but issuing remote commands to the build host (or constructing makefiles that are executed remotely). Eclipse must perform path operations locally, but these paths represent locations on the remote machine. If IPath is used, then the path will be assumed to be a Unix path even though it is actually a Windows path (since IPath only represents a local path). I can't convert this path to a string to send as part of the remote build command, since I have no way of reconstructing the original Windows path, and it may also change the semantics of path operations.
Comment 10 Doug Schaefer CLA 2007-10-24 10:00:18 EDT
(In reply to comment #8)
> It's legal to have a ':' in the first segment only if the path is absolute, so
> file:c:/a/b/c is not legal. In any case, I think the point is that we need to
> preserve the device and path as it exists on the remote machine. The URI loses
> this distinction, since there is no way to distinguish this from the path
> /c:/a/b/c (assuming ':' is legal in a directory name).

Internet Explorer uses the following URI for a file on my file system:

   file:///C:/Documents%20and%20Settings/dschaefer/Desktop/CDT_ESE.pdf

This seems to be the standard URI format for Windows paths.
Comment 11 Doug Schaefer CLA 2007-10-24 10:08:50 EDT
(In reply to comment #9)
> One example would be where my project is located on a remote Windows machine,
> and I'm going to be building on this machine. My local machine (running
> Eclipse) is Unix. Eclipse is computing the build dependencies locally, but
> issuing remote commands to the build host (or constructing makefiles that are
> executed remotely). Eclipse must perform path operations locally, but these
> paths represent locations on the remote machine. If IPath is used, then the
> path will be assumed to be a Unix path even though it is actually a Windows
> path (since IPath only represents a local path). I can't convert this path to a
> string to send as part of the remote build command, since I have no way of
> reconstructing the original Windows path, and it may also change the semantics
> of path operations.

What kind of path operations are you thinking about? If all you're doing is converting URIs to file system path locations, why couldn't you simply convert them to a String? If you are doing some other operation, then you really have to ask the FileSystem to do it, since you can't make assumptions about how the parent/child relationships in paths are actually implemented.
Comment 12 Greg Watson CLA 2007-10-24 10:19:37 EDT
(In reply to comment #10)
> (In reply to comment #8)
> > It's legal to have a ':' in the first segment only if the path is absolute, so
> > file:c:/a/b/c is not legal. In any case, I think the point is that we need to
> > preserve the device and path as it exists on the remote machine. The URI loses
> > this distinction, since there is no way to distinguish this from the path
> > /c:/a/b/c (assuming ':' is legal in a directory name).
> 
> Internet Explorer uses the following URI for a file on my file system:
> 
>    file:///C:/Documents%20and%20Settings/dschaefer/Desktop/CDT_ESE.pdf
> 
> This seems to be the standard URI format for Windows paths.
> 

The path /C:/Documents and Settings/dschaefer/Desktop/CDT_ESE.pdf on my Mac uses the identical URI in Safari.
Comment 13 Chris Recoskie CLA 2007-10-24 10:54:44 EDT
(In reply to comment #8)
> It's legal to have a ':' in the first segment only if the path is absolute, so
> file:c:/a/b/c is not legal.

The javadoc for the constructor says:

"If a path is given then it is appended. Any character not in the unreserved,
punct, escaped, or other categories, and not equal to the slash character ('/')
or the commercial-at character ('@'), is quoted."

punct: The characters in the string ",;:$&+="

The BNF listed in Appendix A of RFC 2396 also says that this is legal.  The
path part can be absolute or an "opaque_part", which can contain colons.  So it
would seem the colon can appear as an un-escaped literal in the path segment.

Legal doesn't mean it behaves "correctly" though.  The complication is when you
try to construct a URI from a String.  This runs a parser according to the
grammar listed in RFC 2396.  If you have a colon, then it falls under the
"opaque_part" production in the grammar, and doesn't get parsed as a path.

Here are some examples:

Example #1:
============

import java.net.URI;
import java.net.URISyntaxException;


public class test {

        /**
         * @param args
         */
        public static void main(String[] args) {
                try {
                        // No '/' follows "file:", so the URI is parsed as opaque
                        // and has no path component.
                        URI uri = new URI("file:c:/a/b/c");

                        System.out.println("Path: " + uri.getPath());
                        // prints:  "Path: null"
                        System.out.flush();
                } catch (URISyntaxException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }



        }

}

Example #2:
===========

import java.net.URI;
import java.net.URISyntaxException;


public class test {

        /**
         * @param args
         */
        public static void main(String[] args) {
                try {
                        // The text after "//" is parsed as the authority, so "c:"
                        // does not end up in the path.
                        URI uri = new URI("file://c:/a/b/c");

                        System.out.println("Path: " + uri.getPath());
                        // prints: "Path: /a/b/c"
                        System.out.flush();
                } catch (URISyntaxException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }



        }

}


The above cases are not what you'd expect.  Microsoft has their own URI
implementation (which has been in the news recently due to its security
vulnerabilities), and this likely accounts for the different behaviour you see
in your browser.

If you create the URI using one of the alternate constructors, you get other
problems...

Example #3:

import java.net.URI;
import java.net.URISyntaxException;


public class test {

        /**
         * @param args
         */
        public static void main(String[] args) {
                try {
                        URI uri = new URI("file", null, "C:/a/b/c", null);

                        System.out.println("URI = " + uri.toString());
                        System.out.println("Path: " + uri.getPath());
                        System.out.flush();
                } catch (URISyntaxException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }



        }

}

When you try to run this you get:

java.net.URISyntaxException: Relative path in absolute URI: file:C:/a/b/c
        at java.net.URI.checkPath(URI.java:1787)
        at java.net.URI.<init>(URI.java:662)
        at java.net.URI.<init>(URI.java:764)
        at test.main(test.java:12)

So, ok, if we force the path to be absolute...

Example #4:

import java.net.URI;
import java.net.URISyntaxException;


public class test {

        /**
         * @param args
         */
        public static void main(String[] args) {
                try {
                        URI uri = new URI("file", null, "/C:/a/b/c", null);

                        System.out.println("URI = " + uri.toString());
                        System.out.println("Path: " + uri.getPath());
                        System.out.flush();
                } catch (URISyntaxException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                }



        }

}

Output:

URI = file:/C:/a/b/c
Path: /C:/a/b/c

So the last case works... but the following is still a problem:

> In any case, I think the point is that we need to
> preserve the device and path as it exists on the remote machine. The URI loses
> this distinction, since there is no way to distinguish this from the path
> /c:/a/b/c (assuming ':' is legal in a directory name).
> 

This is a valid point regardless of the URI implementation.
Comment 14 Doug Schaefer CLA 2007-10-24 11:18:12 EDT
(In reply to comment #11)
> What kind of path operations are you thinking about? If all you're doing is
> converting URIs to file system path locations, why couldn't you simply convert
> them to a String? If you are doing some other operation, then you really have
> to ask the FileSystem to do it, since you can't make assumptions about how the
> parent/child relationships in paths are actually implemented.

My question still stands, though. I assume you are going to ask the file system to generate the path. Why can't it just be a String?
Comment 15 Leo Treggiari CLA 2007-10-24 18:24:30 EDT
Here are some thoughts from someone who:

-  Hasn't been writing code for Eclipse for 1.5 years
-  Thinks he remembers how IPath's and IResource's work
-  Has only a conceptual knowledge of EFS

If I say something that is just plain wrong, let me know.

I'll try to explain what I think we need in CDT.  This is not a detailed proposal and not well thought out, like Chris' is.

The basic idea is that we need a way for a CDT component (editor, builder, indexer, debugger, ...) to be able to ask:

I'm here, tell me how to (most efficiently) get to this file that I need.

Yes, I know that using EFS it might not be an actual physical file, but I'm going to ignore that for now.

What does "here" mean?  I think we need to be able to define "locations" that are of interest to CDT components.  Two obvious "locations" are the Workspace directory and the Project directory.  IResource handles them, but that's not enough.  IPath is a little more general, but when it gives you a relative path, the immediate question is "relative to what?"

Therefore, here are some other locations:
-  The BuilderWorkingDirectory: the directory in which the build is run
-  The DebuggerExecutableDirectory: the directory which the debugger considers to be the root directory of the debuggee executable(s)
-  The DebuggerWorkingDirectory: the directory which the debugger sets as the working directory of the debuggee
-  The IndexerSourceDirectory: the directory which the indexer considers to be the base source directory of the files it is indexing
-  The EditorSourceDirectory: the directory which the editor considers to be the base source directory of the files it is editing
-  UserSourceDirectory: the place considered by the user to be where his sources are located - I guess a project may need to define multiple source directories
-  etc

Projects contain references to many files.  It is best (value judgement...) when these references are relative to a "logical" location.  If they are, then a project can be moved from one physical location to another and the logical locations can be re-mapped.

For support for remote projects, we want these locations to support multiple file systems (Windows, Linux, CygWin, other...) and to identify which one they belong to.

For the project file references, the user would need to be responsible for mapping the UserSourceDirectories to physical locations.  For the CDT component locations, it could be the responsibility of the Configuration (with help from the tool-chain) to set the defaults for the component locations.  

The CDT components can be distributed and the sources can be distributed.

When a CDT component asks "I'm here, where is this file I need?", "here" is a logical location or relative to a logical location.  The IUniversalPath (or whatever it is called) of the source is a logical location or relative to a logical location.  The answer to the question would be some path description that could be used with some file service provider (maybe EFS) to get the contents of the file.  The routine that answers the question would probably need to consult with file system specific path resolvers - e.g., code that knows how to build paths on Windows, Linux, or other.

This is a very "broad brush" proposal.  Hopefully it is helpful in some way.

Leo
Comment 16 Chris Recoskie CLA 2007-10-26 09:34:51 EDT
 (In reply to comment #14)
> (In reply to comment #11)
> > What kind of path operations are you thinking about? If all you're doing is
> > converting URIs to file system path locations, why couldn't you simply convert
> > them to a String? If you are doing some other operation, then you really have
> > to ask the FileSystem to do it, since you can't make assumptions about how the
> > parent/child relationships in paths are actually implemented.
> 
> My question still stands, though. I assume you are going to ask the file system
> to generate the path. Why can't it just be a String?


Strings are difficult to manipulate.  It's the same reason people use IPath rather than String.  I don't think it's practical to say that everywhere in CDT that we use IPath that we're going to switch to using Strings.  The amount of path manipulation code in Managed Build alone that would have to now parse and manipulate Strings would be boggling.

As far as asking the FileSystem to do it, IFileSystem has no path manipulation facilities.  You can get an IFileStore from the IFileSystem (even if it doesn't exist) and pull the URI from it, but in order to get the IFileStore object in the first place you need to know the URI or IPath, so really you have a chicken-and-egg problem.  Alternatively, you could navigate the hierarchy to the first existing parent, and then create a handle to each element in the path as you drill down by just using its name, but that's really clunky IMO, and certainly the type of thing you'd want to abstract away with something like IUniversalPath.

The idea with the IUniversalPath stuff is that you can have a different implementation of it if your URI behaves differently.  I think the default implementation I'm working on for hierarchical URIs is going to cover 99% of the use cases though.  Any sensible filesystem is likely to be using hierarchical URIs, because modern filesystems are hierarchical (and even the crazy old non-hierarchical IBM filesystems can be accessed hierarchically now).  If you really have a crazy filesystem that requires that you navigate the IFileSystem hierarchy to figure out what your crazy URI will be, you can create a new implementation of IUniversalPath that does that.
Comment 17 Chris Recoskie CLA 2007-10-26 10:07:37 EDT
 (In reply to comment #15)
> The basic idea is that we need a way for a CDT component (editor, builder,
> indexer, debugger, ...) to be able to ask:
> 
> I'm here, tell me how to (most efficiently) get to this file that I need.
> 

Leo,

Take a look at http://wiki.eclipse.org/images/d/dd/Service_Model_For_Remote_Projects.pdf.

I think that addresses part of what you are talking about.  It's admittedly not 100% fleshed out yet and we're planning some changes, but you can get the gist of it.  Each service provider is theoretically going to have its own set of configuration settings that it cares about.  Basically what you're talking about is each "local" provider (local in this case means the operation runs locally... the files themselves might be local or remote) needs to have a base directory.

I had figured on the service model stuff being something that would live in PTP as opposed to CDT, but that's open to debate.

This probably merits a separate bugzilla.

> The answer to the question would be some path description that could be used with some file service provider (maybe EFS) to get the contents of the file.  The routine that answers the question would probably need to consult with file system specific path resolvers - e.g., code that knows how to build paths on Windows, Linux, or other.

My thought would be that there would be a UniversalPathFactory that you would use when creating paths.  It would know which IUniversalPath implementation is appropriate for the IFileSystem that you're using.
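
A factory of this kind could, for example, be selected by URI scheme, since EFS file systems are themselves registered per scheme. The sketch below only illustrates that lookup and is not the UniversalPathFactory design: SketchPath, SketchPathFactory, PathFactoryRegistrySketch, and the "rse" scheme registration are all hypothetical names chosen for the example.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of choosing a path implementation by URI scheme. */
final class PathFactoryRegistrySketch {
    /** Stand-in for a path object; a real one would offer path manipulation. */
    interface SketchPath {
        String describe();
    }

    /** Stand-in for a per-file-system factory. */
    interface SketchPathFactory {
        SketchPath create(URI location);
    }

    private final Map<String, SketchPathFactory> factoriesByScheme = new HashMap<>();
    private final SketchPathFactory fallback;

    PathFactoryRegistrySketch(SketchPathFactory fallback) {
        this.fallback = fallback;
    }

    /** Registers a factory for one scheme; schemes are compared case-insensitively. */
    void register(String scheme, SketchPathFactory factory) {
        factoriesByScheme.put(scheme.toLowerCase(Locale.ROOT), factory);
    }

    /** Uses the factory registered for the URI's scheme, or the fallback if none matches. */
    SketchPath create(URI location) {
        String scheme = location.getScheme();
        SketchPathFactory factory =
                scheme == null ? null : factoriesByScheme.get(scheme.toLowerCase(Locale.ROOT));
        return (factory != null ? factory : fallback).create(location);
    }

    /** Builds a registry with a generic hierarchical fallback and one scheme-specific factory. */
    static PathFactoryRegistrySketch example() {
        PathFactoryRegistrySketch registry =
                new PathFactoryRegistrySketch(uri -> () -> "hierarchical:" + uri.getPath());
        registry.register("rse", uri -> () -> "rse:" + uri.getHost() + uri.getPath());
        return registry;
    }

    public static void main(String[] args) {
        PathFactoryRegistrySketch registry = example();
        System.out.println(registry.create(URI.create("rse://buildhost/work/a.c")).describe());
        System.out.println(registry.create(URI.create("file:/home/me/a.c")).describe());
    }
}
```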
Comment 18 Chris Recoskie CLA 2007-10-26 15:27:12 EDT
Created attachment 81292 [details]
updated patch
Comment 19 Leo Treggiari CLA 2007-10-26 16:11:12 EDT
Continuing on with the concept of "locations", I think it would be useful for a "service provider" to have a "location" similar to what resource/file would have.  An example is, if the code that is generating the makefile for a project build is running where Eclipse is running and the source files and make are running on different machine(s), then the code in CDT can take these two "locations" and decide how the source files need to be referenced in the makefile.  If the "service provider" is generating the makefile, then I'm concerned that a lot of the project data would need to be sent to the "service provider".

I don't know how the processing of the other potential service providers (e.g., indexer, debugger) would be split.  Does it make sense in all cases for the code that determines the paths the service provider needs to reach its resources/files to live in the Eclipse code rather than in the service provider?

Leo

Comment 20 Chris Recoskie CLA 2007-12-13 12:34:49 EST
Created attachment 85201 [details]
Updated patch.

I am attaching an updated patch for people to review.  This patch includes:

* updates to the IUniversalPath interface
* supplied implementation of IUniversalPath (UniversalHierarchicalPath).  A lot of the implementation is borrowed from the platform's Path class.
* JUnit test suite for UniversalHierarchicalPath.  Tests are included for both local paths and URI paths.
* IPathFactory interface to facilitate creation of proper path objects for a given filesystem
* RSEPathFactory
* PathFactoryManager and associated extension point for managing contributed path factories

I intend to check this in next week.  I am currently working in my private workspace on migrating the CModel and clients to using these interfaces.
Comment 21 Markus Schorn CLA 2007-12-17 09:04:15 EST
I second most of the concerns raised in this bug report, and I don't feel that they have been addressed yet. The proposed interface looks like an attempt to solve 3 different problems:
(a) Naming the location for a resource.
(b) Bridging between resources and files external to the workspace.
(c) Accessing a file via 2 different file-systems (local and remote).

(a) Certainly both Windows and Unix file paths can be encoded as URIs. So this
    problem is solved by using URIs instead of paths. Together with EFS we have
    the ability to provide different file systems with different sets of URIs.
    The nice thing here is that we have direct support from the platform: each
    Eclipse resource is associated with a URI via IResource.getLocationURI(), and
    the platform provides the abstraction to operate on the files independently
    of the file system.

(b) CDT has the problem that it works with both Eclipse resources and files
    that are external to the workspace. The ITranslationUnit bridges between
    those in that it can be based on a resource or simply on an absolute path.
    ITranslationUnits are too heavyweight to be used everywhere, so in many
    cases IPaths are used to denote either a 'full path' for a resource or a
    location for an external file, which is confusing. I can see a need
    for a ResourceOrExternFile object that may provide a resource and always
    provides a URI for its location.

(c) Accessing a resource or external file via two different file systems is
    needed only for specific operations. In my opinion it is clumsy to
    require the use of a special object throughout CDT in order to be able to
    translate URIs to ones suitable for a remote machine.
    In any case you need to provide a mechanism that allows you to map
    a URI suitable for the local machine to a URI suitable for a remote
    machine (possibly also the reverse). But that's it: if you have a way to
    compute the mapping, you can use it wherever you need to. This then works
    for any Eclipse project.
    Within CDT there could be some components that (optionally) work with such
    a mapping. E.g. when creating a makefile for a remote build one might want
    to convert the local machine URI to the remote machine URI before writing
    the makefile. The build-output error parsers would need to make the reverse
    conversion.
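
One simple form such a mapping could take is a prefix rewrite between a local root URI and a remote root URI. The sketch below is an assumption about what that might look like, built only on java.net.URI; the class name, the example roots, and the "ssh" scheme are invented for the illustration, and both roots are expected to end in '/'.

```java
import java.net.URI;

/** Hypothetical sketch: maps URIs under one root to URIs under another, and back. */
final class UriPrefixMappingSketch {
    private final URI localRoot;
    private final URI remoteRoot;

    UriPrefixMappingSketch(URI localRoot, URI remoteRoot) {
        this.localRoot = localRoot;
        this.remoteRoot = remoteRoot;
    }

    /** Local location to remote location, e.g. before writing a makefile. */
    URI toRemote(URI local) {
        return rebase(localRoot, remoteRoot, local);
    }

    /** Remote location to local location, e.g. when parsing build output. */
    URI toLocal(URI remote) {
        return rebase(remoteRoot, localRoot, remote);
    }

    private static URI rebase(URI from, URI to, URI location) {
        URI relative = from.relativize(location);
        // relativize() returns its argument unchanged when it is not under 'from'.
        if (relative.isAbsolute()) {
            throw new IllegalArgumentException(location + " is not under " + from);
        }
        return to.resolve(relative);
    }

    public static void main(String[] args) {
        UriPrefixMappingSketch mapping = new UriPrefixMappingSketch(
                URI.create("file:/home/me/workspace/proj/"),
                URI.create("ssh://buildhost/work/proj/"));
        URI remote = mapping.toRemote(URI.create("file:/home/me/workspace/proj/src/main.c"));
        System.out.println(remote);
        System.out.println(mapping.toLocal(remote));
    }
}
```

A real mapping would need more than one prefix pair and a way to discover them; this only shows the conversion step itself.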
    
Comment 22 Chris Recoskie CLA 2007-12-17 14:47:10 EST
 (In reply to comment #21)
> I second most of the concerns raised in this bug report, and I don't feel that
> they have been addressed yet.

Can you elaborate?  I thought I had addressed everything so far.

>The proposed interface looks like an attempt to solve 3
> different problems:
> (a) Naming the location for a resource.
> (b) Bridging between resources and files external to the workspace.
> (c) Accessing a file via 2 different file-systems (local and remote).
> 
> (a) Certainly both Windows and Unix file paths can be encoded as URIs. So this
> problem is solved by using URIs instead of paths. Together with EFS we have
> the ability to provide different file systems with different sets of URIs.
> The nice thing here is that we have direct support from the platform: each
> Eclipse resource is associated with a URI via IResource.getLocationURI(), and
> the platform provides the abstraction to operate on the files independently
> of the file system.

I agree with all that.  But I still don't see how you can get away without having something that can do path manipulation.

Granted, some parts of CDT don't care about manipulating paths, and just want to know "how can I get to the file".  The core itself doesn't do a whole lot of path manipulation (although there is some...).  For those areas that only care about getting to the file, just using URIs would work.  But for other areas you need something more. 

Since in a lot of cases IPaths are passed around from the core areas to other areas in CDT that then do the manipulations upon them, my thought was the refactoring would be less intrusive if we tried to more or less directly replace IPath itself rather than try to do something more drastic.   But, if people feel strongly about it, maybe the core sticks to URIs, but in the end I don't see how we can do something like makefile generation without having something like IUniversalPath somewhere.

> (b) CDT has the problem that it works with both Eclipse resources and files
> that are external to the workspace. The ITranslationUnit bridges between
> those in that it can be based on a resource or simply on an absolute path.
> ITranslationUnits are too heavyweight to be used everywhere, so in many
> cases IPaths are used to denote either a 'full path' for a resource or a
> location for an external file, which is confusing. I can see a need
> for a ResourceOrExternFile object that may provide a resource and always
> provides a URI for its location.
> 

Ok, well we agree on that.  That's part of what I've been trying to accomplish.

> (c) Accessing a resource or external file via two different file systems is
> needed for specific operations, only. In my opinion it is clumsy to
> require the use of a special object throughout CDT in order to be able to
> translate URIs to ones suitable for a remote machine.

I've been arguing that the data kept in IUniversalPath should be minimal.  I think you still need to store the hostname with the path, otherwise you won't be able to look up a mapping for how one machine's paths map to another's, but I don't think IUniversalPath should know anything about remote beyond that.

> In any case you need to provide a mechanism that allows you to map
> a URI suitable for the local machine to a URI suitable for a remote
> machine (possibly also the reverse). But that's it: if you have a way to
> compute the mapping, you can use it wherever you need to. This then works
> for any Eclipse project.
> Within CDT there could be some components that (optionally) work with such
> a mapping. E.g. when creating a makefile for a remote build one might want
> to convert the local machine URI to the remote machine URI before writing
> the makefile. The build-output error parsers would need to make the reverse
> conversion.
> 

I agree with this.  We basically have come to the same conclusion.  See http://wiki.eclipse.org/PTP/designs/remote#Path_Mapping

We intend that IUniversalPath handles only the very lowest level of the conversion process, i.e. give it a root path and it can resolve itself against that path.  It's up to the tooling above it to figure out what mappings, if any, exist and use them to tell IUniversalPath to resolve itself against a specific path.  Trying to put all the logic in IUniversalPath would make it too bloated, IMO.
Comment 23 Markus Schorn CLA 2007-12-18 04:14:13 EST
(In reply to comment #22)
Chris, thanks for your reply; it clarified quite a bit for me. I left out all the stuff where we seem to agree, to concentrate on the points I don't yet understand.

> ...
> Can you elaborate?  I thought I had addressed everything so far.
Introducing a proprietary interface for representing file-locations rather than using URIs needs a very good reason. I don't buy your arguments for that, yet.

> I agree with all that.  But I still don't see how you can get away without
> having something that can do path manipulation.
> Granted, some parts of CDT don't care about manipulating paths, and just want
> to know "how can I get to the file".  The core itself doesn't do a whole lot of
> path manipulation (although there is some...).  For those areas that only care
> about getting to the file, just using URIs would work.  But for other areas you
> need something more. 
> Since in a lot of cases IPaths are passed around from the core areas to other
> areas in CDT that then do the manipulations upon them, my thought was the
> refactoring would be less intrusive if we tried to more or less directly
> replace IPath itself rather than try to do something more drastic.   But, if
> people feel strongly about it, maybe the core sticks to URIs, but in the end I
> don't see how we can do something like makefile generation without having
> something like IUniversalPath somewhere.

So if we replace IPath (where it denotes a file location and not a workspace path) with URIs, you are concerned that we cannot do the path manipulations we used to do.
URIs provide the following manipulations:
  * normalize: removes unnecessary '..' and '.'.
  * resolve: absolute URI for a relative URI with a base (absolute) URI. 
  * relativize: relative URI for an absolute URI with a base (absolute) URI.
What are the other manipulations you need to do?
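
For reference, a minimal sketch of the three java.net.URI operations listed above. The class name, method names and example locations are hypothetical and only serve to show the calls side by side.

```java
import java.net.URI;

final class UriManipulationSketch {

    /** normalize: drops redundant "." and ".." segments from the path. */
    static URI clean(URI uri) {
        return uri.normalize();
    }

    /** resolve: turns a relative reference into an absolute URI against a base. */
    static URI toAbsolute(URI base, String relativeReference) {
        return base.resolve(relativeReference);
    }

    /** relativize: expresses an absolute URI relative to a base URI. */
    static URI toRelative(URI base, URI absolute) {
        return base.relativize(absolute);
    }

    public static void main(String[] args) {
        // Hypothetical project location; the trailing '/' marks it as a directory.
        URI base = URI.create("file:///home/chris/ws/project/");

        URI messy = URI.create("file:///home/chris/ws/./project/../project/src/main.c");
        System.out.println(clean(messy).getPath());

        URI absolute = toAbsolute(base, "src/main.c");
        System.out.println(absolute.getPath());

        System.out.println(toRelative(base, absolute));
    }
}
```

Note that resolve and relativize only behave like directory operations when the base URI ends in '/'; without it, the last segment of the base is treated as a file name and replaced.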

> I've been arguing that the data kept in IUniversalPath should be minimal.  I
> think you still need to store the hostname with the path, otherwise you won't
> be able to look up a mapping for how one machine's paths map to another's, but
> I don't think IUniversalPath should know anything about remote beyond that.

This confuses me. The file location for an eclipse resource is denoted by a URI. It can be either a local file or use an arbitrary file system. Either way the URI contains enough information for EFS (on the local machine) to operate upon it. Beyond that, before the location (URI) is mapped to one suitable for a remote machine, it is of no interest from which other machine the file can also be accessed. (Note that the restriction that this must be a single machine is artificial, especially when you are using NFS.)
The system that performs the mapping needs to know a lot more than just the name of the remote machine, so I don't see why it is necessary to obtain exactly this piece of information from the representation of the location. For example, in case of NFS it'd need to know about the mount points (a), in case of http it needs to know where the web-server stores the data (b).
(a) file:///net/remote-host/export1/myfile.h -> file:///dev/hdd0/myfile.h
    file:///~/ws/myfile.h -> file:///users/chris/ws/myfile.h
(b) http://remote-host/somedir/myfile.h -> file:///web-root/somedir/myfile.h
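
The mappings in (a) and (b) can be sketched as a simple prefix table. This is only an illustration under the assumption that a mapping is a list of (local prefix, remote prefix) pairs; PrefixUriMapper and its methods are hypothetical names, not an existing Eclipse or CDT API, and the '~' case from (a) would need extra knowledge about the user's home directory that is not modelled here.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class PrefixUriMapper {
    // Hypothetical mapping table: local prefix -> remote prefix, checked in insertion order.
    private final Map<URI, URI> prefixes = new LinkedHashMap<>();

    /** Registers a mapping; both prefixes should end in '/' so they act as directories. */
    PrefixUriMapper add(String localPrefix, String remotePrefix) {
        prefixes.put(URI.create(localPrefix), URI.create(remotePrefix));
        return this;
    }

    /** Maps an absolute local URI to the remote side, or returns empty if no prefix applies. */
    Optional<URI> map(URI local) {
        if (!local.isAbsolute()) {
            return Optional.empty();
        }
        for (Map.Entry<URI, URI> entry : prefixes.entrySet()) {
            URI relative = entry.getKey().relativize(local);
            // relativize hands back its argument unchanged when the prefix does not apply,
            // so a result without a scheme means the prefix matched.
            if (!relative.isAbsolute()) {
                return Optional.of(entry.getValue().resolve(relative));
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        PrefixUriMapper mapper = new PrefixUriMapper()
                .add("file:///net/remote-host/export1/", "file:///dev/hdd0/")
                .add("http://remote-host/", "file:///web-root/");

        System.out.println(mapper.map(URI.create("file:///net/remote-host/export1/myfile.h")));
        System.out.println(mapper.map(URI.create("http://remote-host/somedir/myfile.h")));
        System.out.println(mapper.map(URI.create("ftp://elsewhere/myfile.h")));
    }
}
```

The mapper needs no host name from the URI itself in case (a); everything it knows comes from the table, which is the point being made about mount points and web roots.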
Comment 24 Chris Recoskie CLA 2007-12-19 15:03:04 EST
 (In reply to comment #23)
> So if we replace IPath (where it denotes a file-location and not a workspace-
> path) with URIs you are concerned that we cannot do the path manipulations we
> used to do.
> URIs provide the following manipulations:
> * normalize: removes unnecessary '..' and '.'.
> * resolve: absolute URI for a relative URI with a base (absolute) URI.
> * relativize: relative URI for an absolute URI with a base (absolute) URI.
> What are the other manipulations you need to do?

Yes, I am aware of these.  These are useful for converting from one machine's path to another (if you know how the paths map between the two).

Off the top of my head, managed build uses these IPath methods a lot:
* append (e.g. append relative paths for object files onto the output path.  This is the API used the most.)
* removeLastSegment (to strip off the file name)
* getFileExtension (to determine what tool to use on a file, for example)
* isPrefixOf (calculate paths relative to one another... although obviously URI.relativize works for this too)
* removeFirstSegments (relative paths again)
* segmentCount (used when iterating over segments, e.g. to determine how many segments to remove when making a relative path)
* removeFileExtension/addFileExtension (creating path to corresponding output file or dependency file)

Other APIs are used here and there to a lesser degree.
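
To make the comparison concrete, here is a rough sketch of how a few of the operations above (append, removeLastSegment, getFileExtension, swapping the extension) could be written against java.net.URI alone. UriPathOps and its methods are hypothetical helpers, not part of IPath, EFS or the proposed patch, and they assume hierarchical URIs without query or fragment whose segment names are already valid, encoded relative references.

```java
import java.net.URI;

final class UriPathOps {

    /** Like IPath.append: adds a relative path below a directory URI. */
    static URI append(URI dir, String relativePath) {
        String text = dir.toString();
        // resolve() replaces the last segment unless the base ends in '/'.
        URI base = text.endsWith("/") ? dir : URI.create(text + "/");
        return base.resolve(relativePath);
    }

    /** Roughly IPath.removeLastSegments(1) for a file URI: yields the containing directory. */
    static URI removeLastSegment(URI file) {
        return file.resolve(".");
    }

    /** Returns the extension of the last segment, or null if it has none. */
    static String fileExtension(URI uri) {
        String name = lastSegment(uri);
        int dot = name.lastIndexOf('.');
        return dot < 0 ? null : name.substring(dot + 1);
    }

    /** Replaces (or adds) the extension, e.g. main.c -> main.o. */
    static URI withExtension(URI file, String newExtension) {
        String name = lastSegment(file);
        int dot = name.lastIndexOf('.');
        String stem = dot < 0 ? name : name.substring(0, dot);
        return file.resolve(stem + "." + newExtension);
    }

    private static String lastSegment(URI uri) {
        String path = uri.getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        URI outputDir = URI.create("file:///out/Debug");
        URI object = append(outputDir, "src/main.o");
        System.out.println(object.getPath());
        System.out.println(removeLastSegment(object).getPath());
        System.out.println(fileExtension(object));
        System.out.println(withExtension(object, "d").getPath());
    }
}
```

Each helper is short, but none of them comes with URI out of the box, and every caller has to get the trailing-slash and encoding rules right.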

> This confuses me. The file location for an eclipse resource is denoted by an
> URI. It can be either a local file or use an arbitraty file-system. In any way
> the URI contains enough information, such that EFS (on the local machine) can
> operate upon it. Other than that before mapping the location (URI) to one
> suitable for a remote machine, it is of no interest from which machine the file
> can additionally be accessed. (Note that the restriction that this must be a
> single machine is artificial, especially when you are using NFS)
> The system that performs the mapping needs to know a lot more than just the name
> of the remote machine, so I don't see why it is necessary to obtain exactly this
> piece of information from the representation of the location. For example, in
> case of NFS it'd need to know about the mount points (a), in case of http it
> needs to know where the web-server stores the data (b).
> (a) file:///net/remote-host/export1/myfile.h -> file:///dev/hdd0/myfile.h
> file:///~/ws/myfile.h -> file:///users/chris/ws/myfile.h
> (b) http://remote-host/somedir/myfile.h -> file:///web-root/somedir/myfile.h

The name of the host was just to ask the path mapper what path to map to.  If this maps to an IUniversalPath or a URI, then all the location and connection info would be embedded in the resulting path (host, authority, query, etc...).  So what I was trying to say was more that I didn't intend for there to be a specific API to get any more information about the connection out other than the name of the machine.
Comment 25 Markus Schorn CLA 2007-12-20 07:45:21 EST
(In reply to comment #24)
> Yes, I am aware of these.  These are useful for converting from one machine's
> path to another (if you know how the paths map between the two).
> Off the top of my head, managed build uses these IPath methods a lot:
> * append (e.g. append relative paths for object files onto the output path. 
> This is the API used the most.)
> ....
You can directly work with relative URIs: new URI(<relativePath>), URI.resolve(..) and URI.relativize(URI) allow you to do that. In addition to that it is important to understand that you have access to the path of the URI at any time: new Path(uri.getPath()) will do that for you. From there you have access to the segments.

> > This confuses me. The file location for an eclipse resource is denoted by an
> > URI. It can be either a local file or use an arbitraty file-system. In any way
> > the URI contains enough information, such that EFS (on the local machine) can
> > operate upon it. Other than that before mapping the location (URI) to one
> > suitable for a remote machine, it is of no interest from which machine the file
> > can additionally be accessed. (Note that the restriction that this must be a
> > single machine is artificial, especially when you are using NFS)
> > The system that performs the mapping needs to know a lot more than just the name
> > of the remote machine, so I don't see why it is necessary to obtain exactly this
> > piece of information from the representation of the location. For example, in
> > case of NFS it'd need to know about the mount points (a), in case of http it
> > needs to know where the web-server stores the data (b).
> > (a) file:///net/remote-host/export1/myfile.h -> file:///dev/hdd0/myfile.h
> > file:///~/ws/myfile.h -> file:///users/chris/ws/myfile.h
> > (b) http://remote-host/somedir/myfile.h -> file:///web-root/somedir/myfile.h
> The name of the host was just to ask the path mapper what path to map to.  If
> this maps to an IUniversalPath or a URI, then all the location and connection
> info would be embedded in the resulting path (host, authority, query, etc...). 
> So what I was trying to say was more that I didn't intend for there to be a
> specific API to get any more information about the connection out other than
> the name of the machine. 
I guess I don't understand why you need to store the name of the machine. I'll try to elaborate with the example of a remote build that creates a makefile suitable for use on a remote machine:

The primary URI for a resource is provided by the resources plugin. It basically defines a mechanism for accessing the file from the local machine (through EFS this could be via the local file system (which includes NFS), via a web server, ftp, etc.). Usually the very same mechanism does not work on the remote machine, so I map the URI to a URI suitable for the remote machine. Let's call it the mapped URI. To successfully create a makefile for the remote machine, I expect the mapped URI to be a URI for the local file system of the remote machine (scheme: file). Obviously the mapped URI cannot be used on the local machine, but whoever mapped the URI (in my example the remote build) did not map the URI for local use. So it will be clear from the context that the mapped URI is suitable for a specific remote host. As soon as the URIs are mapped, the actual creation of the makefile is independent of the remote host (it creates a makefile for some 'local' file system). This allows for creating makefiles for different remote machines with different URI mappings.

Comment 26 Markus Schorn CLA 2007-12-20 08:05:53 EST
(In reply to comment #13)
I want to comment on the examples you gave:
Example #1:
The uri is absolute so you need to use a leading slash:

code:
    URI uri = new URI("file:/c:/a/b/c");
    System.out.println("UriPath: " + uri.getPath());
    System.out.println("Path:    " + new Path(uri.getPath()).toString());
prints:
    UriPath: /c:/a/b/c
    Path:    c:/a/b/c

Example #2:
Using two leading slashes means that you want to specify a host. Use 3 of them
and you are ok:

code:
    URI uri = new URI("file:///c:/a/b/c");
    System.out.println("UriPath: " + uri.getPath());
    System.out.println("Path:    " + new Path(uri.getPath()).toString());
prints:
    UriPath: /c:/a/b/c
    Path:    c:/a/b/c
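
The effect of the number of slashes can be checked with java.net.URI alone. The sketch below assumes the two-slash form looked like file://c:/a/b/c (the original examples from comment #13 are not reproduced here); FileUriSlashes and describe are hypothetical names.

```java
import java.net.URI;

final class FileUriSlashes {

    /** Shows which part of the text ends up as authority and which as path. */
    static String describe(String text) {
        URI uri = URI.create(text);
        return "authority=" + uri.getAuthority() + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // One slash: no authority, the device stays in the path.
        System.out.println(describe("file:/c:/a/b/c"));
        // Two slashes: "c:" is parsed as the authority and drops out of the path.
        System.out.println(describe("file://c:/a/b/c"));
        // Three slashes: empty authority, the device stays in the path.
        System.out.println(describe("file:///c:/a/b/c"));
    }
}
```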

Example #3:
as in Example #1 you need to use a leading slash.

> Example #4:
> So the last case works... but the following is still a problem:
> > In any case, I think the point is that we need to
> > preserve the device and path as it exists on the remote machine. The URI loses
> > this distinction, since there is no way to distinguish this from the path
> > /c:/a/b/c (assuming ':' is legal in a directory name).
> > 
> This is a valid point no matter what the URI implementation

However ':' is not valid in a directory name!
Comment 27 Greg Watson CLA 2007-12-20 09:32:07 EST
(In reply to comment #26)

> > Example #4:
> > So the last case works... but the following is still a problem:
> > > In any case, I think the point is that we need to
> > > preserve the device and path as it exists on the remote machine. The URI loses
> > > this distinction, since there is no way to distinguish this from the path
> > > /c:/a/b/c (assuming ':' is legal in a directory name).
> > > 
> > This is a valid point no matter what the URI implementation
> 
> However ':' is not valid in a directory name!

I'm sorry, but you're wrong. ':' is legal on Linux and Unix (e.g. MacOS X).

> 

Comment 28 Greg Watson CLA 2007-12-20 09:35:59 EST
(In reply to comment #25)
> (In reply to comment #24)
> > Yes, I am aware of these.  These are useful for converting from one machine's
> > path to another (if you know how the paths map between the two).
> > Off the top of my head, managed build uses these IPath methods a lot:
> > * append (e.g. append relative paths for object files onto the output path. 
> > This is the API used the most.)
> > ....
> You can directly work with relative URIs: new URI(<relativePath>),
> URI.resolve(..) and URI.relativize(URI) allow you to do that. In addition to
> that it is important to understand that you have access to the path of the URI
> at any time: new Path(uri.getPath()) will do that for you. From there you have
> access to the segments.

This was already addressed in comment #9. By converting URIs to Strings to local Path objects, you lose any semantic information about the path, and can only manipulate the Path using the semantics of the local operating system. This is not going to be sufficient when the local and remote operating systems are different and have different path semantics.
Comment 29 Markus Schorn CLA 2007-12-20 10:52:01 EST
> > However ':' is not valid in a directory name!
> I'm sorry, but you're wrong. ':' is legal on Linux and Unix (e.g. MacOS X).
> > 
You are right, thanks for clarifying. So although it's more or less an academic use-case (the colon in the directory name) I have to conclude that the computation of the string-representation as it is needed for operating system calls (and thus external programs) depends on the operating system.
At the same time it is still true, that you can perform hierarchy related operations on URIs independent of the capability to create this string-rep.
Comment 30 Markus Schorn CLA 2007-12-20 11:11:06 EST
(In reply to comment #28)
> > > Yes, I am aware of these.  These are useful for converting from one machine's
> > > path to another (if you know how the paths map between the two).
> > > Off the top of my head, managed build uses these IPath methods a lot:
> > > * append (e.g. append relative paths for object files onto the output path. 
> > > This is the API used the most.)
> > > ....
> > You can directly work with relative URIs: new URI(<relativePath>),
> > URI.resolve(..) and URI.relativize(URI) allow you to do that. In addition to
> > that it is important to understand that you have access to the path of the URI
> > at any time: new Path(uri.getPath()) will do that for you. From there you have
> > access to the segments.
> This was already addressed in comment #9. By converting URI's to Strings to
> local Path objects, you lose any semantic information about the path, and can
> only manipulate the Path using the semantics of the local operating system.
> This is not going to be sufficient when the local and remote operating systems
> are different and have different path semantics.

Hmm, if I understood this right so far there is no intention to have the two locations (as seen from local host and as seen from the remote host) represented in a single object (this would not work with multiple remote hosts).
Rather than that, a subsystem working for a remote host (e.g. remote build) will have to deal with a mapping from the URI (or IUniversalPath) suitable for the local host to the URI (or IUniversalPath) suitable for the remote host. The algorithm to create the string representations suitable for the operating system can best be provided by the same extension that provides the mapping itself. You can even make the extension such that it directly provides a mapping from URIs for the local machine to the string representation suitable for the remote machine.
In the example of comment #9, I would compute the build-dependencies in terms of URIs for the local-machine and convert those to string-reps for the remote machine to write to the makefile or execute remote commands. 
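
As a rough illustration of such an OS-dependent string representation, the sketch below renders the path of an already mapped URI for a target operating system. NativePathRenderer, TargetOs and render are hypothetical names, and the Windows rule (a leading "/x:" denotes a device) is an assumption made for this example only.

```java
import java.net.URI;

final class NativePathRenderer {

    /** Hypothetical description of the machine the string is meant for. */
    enum TargetOs { WINDOWS, UNIX }

    /** Renders the URI's path the way the target operating system expects it. */
    static String render(URI uri, TargetOs os) {
        String path = uri.getPath();
        if (os == TargetOs.WINDOWS) {
            // "/c:/a/b/c" -> "c:\a\b\c": drop the slash in front of the device letter.
            if (path.matches("^/[A-Za-z]:.*")) {
                path = path.substring(1);
            }
            return path.replace('/', '\\');
        }
        // On Unix-like systems "c:" is just a directory name, so the path is used as-is.
        return path;
    }

    public static void main(String[] args) {
        URI mapped = URI.create("file:///c:/a/b/c");
        System.out.println(render(mapped, TargetOs.WINDOWS));
        System.out.println(render(mapped, TargetOs.UNIX));
    }
}
```

The same URI yields two different strings, which is why the target operating system has to be known by whoever produces the string, whether that is the mapping extension or the path object itself.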
Comment 31 John Camelon CLA 2007-12-20 14:46:00 EST
Forgive me if I am out of the loop: 

Instead of trying to roll your own path interface, would it not make sense to talk to the platform team about trying to incorporate your needs within IFileSystem?  Since the EFS interfaces take IPath, it is conceivable that the file system could manipulate the paths for you.  

I have some concern that if you try to build your own scaffolding, so will other people, and eventually no one will be able to interoperate on raw EFS. This is a product requirement (currently) with my current IBM product.  
Comment 32 Chris Recoskie CLA 2007-12-20 15:09:23 EST
 (In reply to comment #31)
> Forgive me if I am out of the loop:
> 
> Instead of trying to roll your own path interface, would it not make sense to
> talk to the platform team about trying to incorporate your needs within
> IFileSystem?  Since the EFS interfaces take IPath, it is conceivable that the
> file system could manipulate the paths for you.
> 
> I have some concern that if you try and build your own scaffholding, so will
> other people, and eventually no one will be able to interoperate on raw EFS.
> This is a product requirement (currently) with my current IBM product.

I had asked John Arthorne to take a look a while back.  I'm adding him to the cc: list as a bit of a prod :-)
Comment 33 Doug Schaefer CLA 2007-12-21 11:07:40 EST
(In reply to comment #31)
> Forgive me if I am out of the loop: 
> Instead of trying to roll your own path interface, would it not make sense to
> talk to the platform team about trying to incorporate your needs within
> IFileSystem?  Since the EFS interfaces take IPath, it is conceivable that the
> file system could manipulate the paths for you.  
> I have some concern that if you try and build your own scaffholding, so will
> other people, and eventually no one will be able to interoperate on raw EFS.
> This is a product requirement (currently) with my current IBM product.  

I'll be starting my flexible file system EFS thingy next week. The flexibility it does provide will ensure that the only way to determine the real paths on the file system will be using IFileSystem. The only valid path manipulation has to be through EFS.
Comment 34 John Arthorne CLA 2007-12-21 11:57:24 EST
I can sympathize with the need for a different path representation. IPath is only useful in a very limited domain - representing simple file system paths on the local machine.  URI can represent a path on an arbitrary file system, but the java.net.URI class is not particularly useful - URI objects are memory pigs, are costly to create/manipulate, and have almost no convenience methods for manipulating them.  

IFileStore is the principal abstraction used within the resources plugin. URI is used in EFS/resources for two purposes: as a serializable representation for storing in various data files on disk, and as a neutral interface for passing path objects across API boundaries. By neutral, I mean clients can pass around and store URI objects without needing to know anything about EFS or any other Eclipse API. It's a format suitable for passing between Eclipse-aware APIs, and third party libraries that have no dependency on Eclipse. The general approach in resources and EFS is to take in and hand out URI objects in the API, but internally immediately convert them to IFileStore instances for ease of manipulation.

The main drawback I see with your IUniversalPath is that it seems quite limited in scope. I.e., the only thing it attempts to do above IPath is represent either a Unix or Windows local file system path. EFS attempts to be much more general - able to represent arbitrary backing file systems or file access protocols, such as WebDAV, FTP, files in a CVS repository, etc. Or, it can be used as a wrapper to perform arbitrary transformations on a local file system. Your IUniversalPath will likely work well for the simple remote management/deployment scenarios you are interested in, but doesn't unleash the full potential of being able to interface with arbitrary kinds of file systems.
Comment 35 Chris Recoskie CLA 2008-01-04 13:09:12 EST
 (In reply to comment #31)
> I have some concern that if you try and build your own scaffholding, so will
> other people, and eventually no one will be able to interoperate on raw EFS.
> This is a product requirement (currently) with my current IBM product.

Once we have an IUniversalPath implementation that talks directly to IFileStore (it's planned, I just haven't written it yet), I think that a lot of those concerns become moot.  We will always be able to fall back on that implementation because it will always work for any arbitrary EFS provider.  So, if there's not a path factory for a given filesystem, we can just use that one.  Really UniversalHierarchicalPath is just an optimization, because if you know the path is hierarchical, it's far less expensive to manipulate it in terms of its string segments than it is to traverse and manipulate the IFileStore hierarchy.
Comment 36 Doug Schaefer CLA 2008-01-04 13:27:53 EST
(In reply to comment #35)
>  Really UniversalHierarchicalPath is just an optimization, because if you know
> the path is hierarchical, it's far less expensive to manipulate it in terms of
> its string segments than it is to traverse and manipulate the IFileStore
> hierarchy.

Well, with the Flexible File System, you never know what the path is on the actual file system. You need to take the add/exclude mappings into account.

And from what I've seen, and implemented in the CDT new project wizard, traversing the IFileStore is really easy. IFileStore.getChild(String) gets you a child. IFileStore.getChild(IPath path) gives a child down the hierarchy.
Comment 37 John Arthorne CLA 2008-01-04 13:29:44 EST
After talking with Chris and Greg, it seems the best way to think of this API is as a complement to URI and IFileStore, rather than something that competes with or replaces them.  The goal seems to be just to create a class that has rich path manipulation API on top of what you see on IFileStore and URI.  URI is still a suitable representation at the level of API boundaries, such as when calling external libraries or into the base platform API, and something like IFileStore is needed when you actually want to manipulate the concrete file the path represents (copy, delete, create, etc). But within various parts of the CDT implementation it sounds like there is a need for a richer utility class for manipulating remote paths.
Comment 38 Chris Recoskie CLA 2008-01-10 11:35:17 EST
Well, we need to get moving on things.  Rather than debate things endlessly, I'll stick to using URI as much as possible.  I'll relegate the IUniversalPath stuff to a utility class which can be called when needed.  If certain operations are easy enough to perform directly on IFileStore then that method can be used where appropriate... let the tool fit the job.
Comment 39 Doug Schaefer CLA 2008-01-10 12:47:51 EST
Sounds like a good plan. Thanks Chris!