Community
Participate
Working Groups
EFS is a great addition to Eclipse, it provides the base technology for remote projects / remote build support like outlined e.g. here: http://wiki.eclipse.org/index.php/PTP/designs/remote/cdt as well as in some commercial additions to Eclipse. One problem with the current solution is, that remote C/C++ projects on UNIX often use symbolic links extensively, and these are not understood by current eclipse. What's making things bad, is that 3rd parties don't even have a chance to develop their own plug-ins which would decorate IResources with information about symbolic links since they don't get this information out of EFS even if they do their own implementation of an EFS provider. I thus propose adding EFS.ATTRIBUTE_SYMLINK = 1 << 5; such that commercial implementations of an EFS provider can collect information whether a given IFileInfo object is a symbolic link or not, and pass this information up to the higher Eclipse levels. This single addition to the API can be considered as reserving the value 1 << 5 for the exclusive use of eclipse, meaning symbolic links even if there is no actual implementation behind it in the first place. Higher levels can be vendor-supplied decorators for instance, or at a later time the IResource APIs could be extended to deal with symbolic links. API Doc could be specified that ATTRIBUTE_SYMLINK = "true" always means the IFileInfo is a symlink indeed, whereas a value of "false" could mean that it is either not a symlink or no information could be retrieved. Vendor-supplied EFS providers for remote projects could choose to fill in the attribute appropriately. Fixing this issue would also be the first step towards fixing bug 44107. In a second step, API should be added to EFS that allows actually reading the symbolic link target (readlink() operation), but I believe there is not such a pressing need for this as for the information whether a resource actually is a symbolic link or not. This further extension would actually require adding a string attribute to the IFileInfo API.
Created attachment 56827 [details] Patch to add symbolic link support to EFS API, with Linux impl Attached patch adds the proposed API for an EFS.ATTRIBUTE_SYMLINK attribute, as well as support for extended String attributes on IFileInfo, and also contains an implementation for reading symbolic links on Linux. Note that due to using stat() instead of lstat(), this patch brings modified functionality ONLY in case of broken symbolic links (that do not point to any valid link target). Changing this to use lstat() instead would change existing functionality (though I'm not aware of any written-down API contract with respect to this), so this was not considered.
Created attachment 56828 [details] Improved patch with more elegant way adding linkTarget info Attached improved patch adds a slight modification that makes creating IFileInfo objects with extended attributes more elegant.
This is a reasonable request. Assigning to myself to review the patch in more detail.
Created attachment 57518 [details] Small cosmetic and doc fixes over previous patch Martin, overall this looks good. My main question is, why not fill in the symlink target value directly within the fetchFileInfo native method, rather than adding a new internalReadlink method. That would avoid the extra JNI call to fetch the file info. It would also save us from changing the native library signature, requiring recompiling on every platform. The only thing I can think of is that you're concerned about adding the extra field to FileInfo that won't be needed in most cases. However, we don't hang onto these FileInfo objects in memory for long, so I don't see this as a big concern.
I should also say sorry for the delay in looking at this. The PMC has asked that all new 3.3 API be added by M5 if possible, so I would like to get this in soon.
Having had time to further think about this, I'm still pretty convinced that defining a constant to identify symbolic links (EFS.ATTRIBUTE_SYMLINK) is the right thing to do. Plus, there should be another constant for identifying special files like sockets, named pipes, device nodes etc. to ensure that Eclipse Workspaces / Editors don't try to do normal file operations with these. EFS.ATTRIBUTE_SPECIAL seems to be the right thing to do for these. With respect to actually reading the symbolic link target (readlink operation) and storing the contents in an extended attribute, I'm not so sure whether the proposed solution in my patch is the right thing to do. Although it is memory efficient and seems right for the concrete scenario of symbolic links, I guess I'd also like to see API where contributors can fill in extended attributes lazily (the current implementation requires extended attributes to be filled in at the time the IFileInfo node is created).
(In reply to comment #4) > Martin, overall this looks good. My main question is, why not fill in the > symlink target value directly within the fetchFileInfo native method, rather > than adding a new internalReadlink method. The reason was indeed to try and be memory efficient, i.e. avoiding an extra field in FileInfo that would not be used most of the time. By deriving from the "standard" FileInfo implementation we can have specific FileInfo objects which hold the extra attribute only if needed. I was not aware that FileInfo objects are not kept in memory long. So if memory is not an issue, the extra step can be avoided indeed, and the normal FileInfo object can even hold a String[] or List<String> attribute for extended attributes which would be null in most cases but filled in for extended attributes. Note, though, that filling such an attribute properly from JNI may not be very easy. Or would you think about a single extended String attribute, being easier to fill in but not as flexible? My previous concern of supporting lazy extended attributes (for attributes less fequently needed, and expensive to retrieve - e.g. ACL's) would also not be an issue in this case, sine we don't need any variants of FileInfo and in case an implementor wants an attribute to be retrieved lazily he can e.g. make the attribute existent but with a null value such that an extra LazyAttributeManager can retrieve it when needed.
I'd prefer to keep focused on the sym link support and not get into supporting an open ended set of extended attributes right now. However, it's good to design the API now in a way that would allow adding that in the future. I suggest adding a setStringAttribute method to IFileInfo/FileInfo. It's easy to call this method from JNI to set the value (as we do for the file name). Right now the implementation can just have an extra String slot for the link target, and ignore any unknown attribute passed to setStringAttribute. The specific storage is an implementation detail, so we can easily move to a String[] in the future without impacting anyone. We should also specify for now that we don't support writing the link target attribute via putInfo... the best place I can think to specify this is on the attribute constant's javadoc.
I can see your points (thought about the javadoc addition regarding putInfo as well), and I'm fine with your suggestions. Couple of minor questions: - So IFileInfo will get the new capability to store a single String attribute. The meaning of this String attribute will depend on the EFS.ATTRIBUTE_* status, and will initially be defined only for ATTRIBUTE_SYMLINK. I'm basically fine with this, but should we also consider other options to this, e.g. setLinkTarget(String) / String getLinkTarget() //more narrow sense setExtendedAttribute(Object) / Object getExtendedAttribute() //wider sense - What about the version of the native lib (localfile_1_0_0.dll / liblocalfile_1_0_0.so). Since it'll get new meaning and require new Java API on UNIX, version number will need to move up at least for UNIX while windows will stay exactly the same... should the native lib loader try loading localfile_1_1_0 with a fallback to localfile_1_0_0 for Windows? Any other suggestions? - Regarding the implementation and meaning of ATTRIBUTE_SYMLINK. Due to the meaning of the stat() call in the current implementation, which transitively follows symbolic links, and in order to keep existing functionality intact as much as possible, I'd suggest the following semantics: * symlink pointing to [...] pointing to valid file in the end --> ATTRIBUTE_SYMLINK | ATTRIBUTE_EXISTS * symlink pointing to [...] pointing to valid directory in the end --> ATTRIBUTE_SYMLINK | ATTRIBUTE_DIRECTORY | ATTRIBUTE_EXISTS * symlink pointing to [...] pointing to nowhere in the end (broken symlink) --> ATTRIBUTE_SYMLINK So I suggest that an existing but broken symlink (which points to nowhere) should NOT set ATTRIBUTE_EXISTS in order to be able to tell it apart from a symlink that points to a valid file. This may be a change in functionality compared to what's there right now (I think that stat() returns "exists" even for broken symbolic links; I'm not sure what java.io.File.exists() does in this case). In each of the three cases, the StringAttribute should carry the symbolic link target of the first link in the chain (nontransitive, such that clients can follow the chain if they want). Being able to tell a symlink that points to a valid file apart from a normal valid file is important in the case where the file should be deleted (should the real file be deleted or just the symbolic link). If we want this functionality, I think the implementation needs to change compared to what I submitted with the original patch. Naturally, test cases should cover the three scenarios outlined above. If you're not ok with the change of functionality for broken symbolic links, we'd need a new ATTRIBUTE_FILE to indicate valid files, or something similar.
Created attachment 57703 [details] New patch setting link target through FileStore.setStringAttribute(String) Attached patch sets the link target by calling FileStore.setStringAttribute(String) as proposed, and thus works without changing the JNI method signature. It also contains some doc improvements in EFS#ATTR_SYMLINK, and correctly sets the FileSystem.attributes(). It remains to be clarified what version number the native library should get, and what other OS's besides Linux should get symlink support (Vista does provide symlinks now too!) I'm still pondering whether introducing IFileStore.getSymbolicLinkTarget() and IFileStore.isSymbolicLink() might be a better idea than IFileStore.getStringAttribute(). I'm going to write some test cases for the new functionality now.
Created attachment 57737 [details] Final patch tested on Linux Attached is a final patch tested on Linux through the unit tests attached to bug 172062. It turns out that setting the link target through FileInfo.setStringAttribute(String) is problematic since JNI native code can only create Strings as UTF-8. This is fine on Linux (where UTF-8 is the default encoding) but may be problematic on other Platforms. The problem is, that JNI native code doesn't have access to most Java classes for converting charsets since the OSGi non-standard class loaders are not accessible from JNI. One way around this might be to add static public String FileInfo.fromPlatformBytes(byte[] chars) { return new String(chars); } since the FileInfo object and class are known to be available in JNI. Another option is to allow void FileInfo.setStringAttribute(byte[] platformBytes) in addition to the variant taking a String. The other thing is that I still feel that String IFileInfo.getSymbolicLinkTarget() void FileInfo.setSymbolicLinkTarget() might be more appropriate and easier to document than getStringAttribute(). And, the version number of the plugin as well as the native library needs to be clarified (I suppose it should be liblocalfile_1_1_0.so and similar on all platforms).
One more word regarding the impact of this change, compared to Eclipse-3.2: -) As long as referenced files are no symbolic links, there is no impact at all -) For each symbolic link queried by the native library on Linux, *) Additional calls to lstat() and readlink() are made *) An additional String is added to IFileInfo, requiring charset conversion These extra calls might make performance and memory usage suffer a little bit in large workspaces where lots (most) files are not physically present but linked by symbolic links. Such workspaces are not uncommon in some CM scenarios (e.g. "shadowing" a read-only workspace with symbolic links, such that few files can be locally changed by replacing them with actual files). I do not expect the performance penalty very large though, since Eclipse-3.2 also did a stat() call which needs to transitively follow symbolic links anyways, though inside the OS. Therefore, provided that the OS does some useful kind of caching, the extra lstat() and readlink() operations should not have much extra impact. The extra memory needed should also be little since IFileInfo objects are not held on for long. Apart from the (small) performance penalty, the unit tests from bug 172062 show that the change is 100% API compatible. The benefits of the change are -) 3rd parties now have an API that allows them support symbolic links on remote file systems -) On localfile, upper layers in Eclipse can now know whether a given file (resource) is a symbolic link or not, and what the link target is. This can be used to fix some known bugs (e.g. associated with bug 44107).
Oh dear, I wrote a long comment this morning but somehow I must have closed my browser before submitting it... In short: - Overall the patch looks good, except I think it should be getStringAttribute(int), to support a generic set of string-based attributes. I think you had this earlier but I think you perhaps misinterpreted my earlier comment. - Since the signature of the native library is now completely unchanged, I'd like to keep the version number the same. No sense in recompiling on all platforms just to up the version number. - I wrote up some info on how to integrate with test suites, but it looks like you figured it out.
Created attachment 57750 [details] Changed to getStringAttribute(int), and setStringAttribute(int, String)
Fix released. Thanks for the excellent contribution!
Created attachment 57792 [details] Fix for platform default encoding, and minor javadoc fixes Thanks for accepting this. Attached patch further improves it, by fixing the issue of always using UTF-8 for setting the link target from native libs. It now uses the Platform default encoding (was not an OSGi class loader issue after all). It also fixes some minor javadoc issues and corrects the result of IFileSystem.attributes() for HP-UX and QNX when native libs are provided. Is there a chance for getting a native lib fragment for solaris-sparc officially checked in and registered with the builders, when we contribute it?
The more I think about this, the more I'm unhappy with getStringAttribute(int) and its (set) counterpart. Since the window for API changes is closing and I really find this important, here's some reasoning and a suggestion: * String attributes are different than boolean attributes, because there is no point in doing a bit-or with them. A String attribute simply exists or not (using the common semantics that returning a null value means non-existence), and multiple String attributes always need to be checked individually. * Because there is no point in bit-or of the "key" constant, the "key" constant doesn't need to be an integer. It can just as easily be a String or any Object. Using a String has the big advantage that multiple extenders of the API don't stomp on each other when they use the common practice of reverse dns notation e.g. LINK_TARGET = "org.eclipse.core.resources.linkTarget" With integer constants, no extender ever knows whether it is safe to use any particular constant value. * Using a String instead of an int as the key also doesn't cost any performance penalty or memory overhead. It's just a (32-bit) pointer to a final object instead of a (32-bit) integer. Given that always the same String object will be used as the constant, even equals() isn't expensive since const==other. * Because String attributes are different than the boolean attributes, I'd also suggest reflecting this in the name of the key constant. In fact, the String attributes are more like Properties so I'd suggest using PROPERTY_LINK_TARGET * With respect to the individual String attributes (or Properties), I'm undecided whether just String properties should be supported (providing type safety) or any Object should be allowed (providing slightly more flexibility at the cost of having to cast). I think I slightly prefer Strings. That all being said, I think that the following change doesn't cost any Performance or maintenance penalty (since it is conceptually the same we have now, with the only difference of using String keys instead of int), but makes the API much more usable and fit for the future: class EFS { public static final String PROPERTY_LINK_TARGET = "org.eclipse.core.resources.filesystem.linkTarget"; //$NON-NLS-1$ } interface IFileInfo { public abstract String getProperty(String key); public abstract String[] propertyKeys(); } class FileInfo { private Map properties = null; private static final String[] NO_PROPERTIES = new String[0]; public synchronized String getProperty(String key) { if (this.properties==null) return null; return (String)properties.get(key); } public synchronized String setProperty(String key, String value) { //few entries expected -> use TreeMap or new ObjectMap(2) if (this.properties==null) this.properties=new TreeMap(); return (String)this.properties.put(key, value); } public synchronized String[] propertyKeys() { if (this.properties==null) return NO_PROPERTIES; String[] result = new String(this.properties.size()); Iterator it = this.properties.keySet().iterator(); for(int i=0, i<result.length; i++) { result[i] = (String)it.next(); } return result; } } One can dream up really cool things with this API, e.g. Properties for the user and group name, full UNIX permission string, file type, ACLs, locking info on databases, and other administrative stuff that might be supported on arbitrary file systems. Such properties could be supported by a generic UI displaying them. The IFileStore.copy() and move() operations could support copying between file systems of different type while maintaining those attributes and Properties that are the least common denominator (intersections) of the Properties defined by the two file systems. Tons of options at no cost or risk I can see - isn't that tempting? If you want to accept this, I volunteer contributing a patch for updating the Javadoc and the JNI code as well as testing on Linux.
I have to think a bit more, but my initial reaction is I don't like this direction. The fundamental difference in what you suggest is that the space of property keys is open-ended, and keys can be defined by particular file system implementations rather than by EFS. The problem I see with this is that the API loses its value as a generic file system API. I.e., if file system X defines a new property key, then clients who want to use that key are now tied to one particular file system. The whole power I see in EFS is that the particular file system is mostly interchangeable from the client's point of view - there is no direct coupling between an EFS client and any particular file system. Custom properties for particular file systems goes against this. One could argue that "sym link", or even attributes like "hidden" are not widely implementable, and so already tie clients to particular file sysems. However, these concepts are at least given a definite semantic meaning at the EFS API level, so it's possible to slide in a new file system that supports the same properties without the client caring. With property keys defined within a particular file system, this is completely impossible. Say a particular file system defines a "com.myOrg.myFilesystem.accessControlList" property. Another file system comes along that also has a notion of ACL, but it can't use the key defined by the other file system because it would couple them to that file system. If they defined their own ACL property, then clients couldn't make use of the two file systems interchaneably. My point is that the commonality must be defined at the EFS level for it to be of real value to clients, in which case integer keys make sense because EFS controls the space of allowable properties. My other concern is that by bundling an open-ended set of extra information into IFileInfo, it will over time add cost to fetchFileInfo/putFileInfo. Currently, IFileInfo mostly contains information that can be obtained from a single native OS call (such as a Unix stat call). If fetchFileInfo begins making other calls, it punishes all clients that are not interested in fetching the extra attributes. In thinking about this, I went back to your sym-link implementation and realized you're adding an extra lstat call to each file system access... I think this will hurt performance... isn't it possible to only do the lstat for files that we know are sym links (this info is contained in the basic stat call)? I'd rather do an extra stat for the sym link case instead of an extra stat for every single file.
I fully agree that the power of EFS is in its abstraction from common concepts. For that reason, I really like the way that ATTRIBUTE_READ_ONLY is implemented for instance. But I don't think that my proposal breaks this direction. 1. Interoperability, abstraction and well-defined keys Custom Property keys don't break your ability to declare well-defined Property keys right in EFS. Just like with the ints, you have an "official" space of Property keys (declared in EFS, with prefix org.eclipse...) that should have an abstract well-defined meaning and that you have full control over. I'm just against forced limitation to this. If clients want to do more (for themselves, well knowing that their stuff may not be interoperable) they should be able to do it. Plus, clients A,B,C can agree on some common keys for themselves. They have the CHANCE of (limited) interoperability. Which is certainly better than nothing, as long as no "official" keys are declared. I could imagine that once some clients come up with useful "private" keys for themselves, these keys could go into the "official" space after some time. 2. Heavy-weight IFileInfo No client is forced to fill any properties into IFileInfo. Every client should be encouraged to fill in "what is right for him" or "what can be done fast enough". That may mean nothing at all on some clients, not even the "well-defined" keys. Additional API for lazy retrieval of properties may be dreamed up later. But on an Sftp connection, for instance, there are properties like UNIX access rights, user ID, group ID that you just get for free. Clients should have the chance to advertise what they have. 3. The extra lstat call Unfortunately, doing stat() does _not_ tell me whether the original node was a symlink or not. It's fully transparent, "looking through" the link. Therefore, I have to do lstat(). But note that in case we're dealing with a normal file, lstat() alone is now sufficient and the extra stat() call is not needed. Therefore, there is no performance penalty at all in case we're not dealing with a symlink.
You're right about the stat/lstat issue.. I misread the #else statement. Let me think about it a bit more...
Regarding comment 16, compiled native contributions for other platforms would be most welcome. Just attach the compiled library to a bug report.
After thinking on it, I still don't like the idea of an API with an open-ended set of attributes. I'm going to stay with the current API.
Let me give some concrete examples. 1. Target Management / RSE APIs My current considerations come from bug 170916. We currently have our own APIs where we allow extenders to contribute plug-ins for remote file systems. Ours are more open-ended than EFS, since the goal is to make every aspect of a remote file system visible (users plug in the tools for this). In addition to that, we wrap our API in an EFS provider further up the chain. This requires us to have TWO APIs for approximately the same thing - ours and EFS. This has been criticised by most people who looked at it (e.g. http://dev.eclipse.org/mhonarc/lists/dsdp-tm-dev/msg00255.html). If EFS would provide the generic kinds of properties I proposed, we could throw away our (old, grown-over-time, not-as-well-designed-as-EFS) APIs and live with EFS perfectly. 2. Other Remote File System Integrations When I remember right, one goal of EFS was to support remote workspaces. With a backing store as diverse as remote UNIX boxes, databases, mainframes or whatever. One aspect of remote file systems is the latency. Every transfer costs time. So while one can argue that EFS just supports the most common, everywhere-available kinds of attributes, and ask extenders to provide info on alternate attributes through alternate channels, this may be a performance issue. Since alternate channels mean additional round trip to the remote side. With my proposed extension, file system providers can give exactly the information they get in a single round-trip to the server and pass it up the chain for further processing. I guess my main concern is that I can't see why an "int" key for a property set should be better than a String key into the same property set. I, for one, can only see advantages of the second approach while the first one locks in. I've mentioned before that I like the concept of EFS being an "abstract" file system with common properties known and understandable by everyone. I just don't like locking in the API. So, what would you think about a compromise: make the API use String keys, but write Javadoc that only the String keys declared in EFS are recommended in order to support interoperability, and other keys are discouraged?
Ah, I didn't realize you had in mind concrete usage in RSE - I was thinking of this as a philosophical distinction between String and int keys. Can you create a new bug report and provide a patch?
Going to create the new bug + patch tomorrow (ran out of time today). I did investigate the RSE APIs, and found we'll not be able to get rid of them completely because of the following: - RSE API allows filters for directory contents retrieval in the API: IFileService.getFiles(progress, parent, filter). These filters can be evaluated at the remote side for performance, if there is an agent remotely. - RSE API has some hi-level getters like IFileService.isCaseSensitive(), IFileService.getUserHome(), IFileService.getRoots() I don't see a point in implementing such functionality in EFS right now [does anyone else?]. What I'm investigating, though, is a very light-weight wrapper such that an existing EFS provider can be used as RSE IFileService easily. That way, providers will implement the remote access only once, and be able to contribute it to both RSE or Platform. For this, the generic properties as described earlier will certainly make sense. In terms of passing arbitrary properties from a provider up the food chain, I noticed that something like this is always possible: IFileStore store = fs.getStore(myURI); if (store instanceof MyFileStore) { MyProperty prop = ((MyFileStore)store).getMyProperty(); } but such code only works if provider and consumer come from the same implementor so it's not interoperable and I prefer the solution I proposed earlier: IFileStore store = fs.getStore(myURI); IFileInfo info = store.fetchInfo(); Object prop = info.getProperty("com.acme.property"); if (prop!=null) { //... }
I think it's now too late for further changes to the API.
(In reply to comment #25) I finally managed to create new bug 176051 to follow up asking for generic file properties. I also attached a patch there. For our use (Target Management / RSE), I think it would be a big plus if we were able to have an IFileService implementation that delegates to EFS for most of its work. It would help bringing technologies together under a single consistent interface (EFS).