I found out that if I copy a folder into an archive file and then expand it, it is empty. I see the same behaviour with both the latest dev driver and the RSE-SDK-I20070726-0600.zip driver. It happens for both DStore Windows and DStore Linux; there is no problem for Local Windows. But a very strange thing is that my JUnit testcase running the same scenario worked.
************************************************
TM 2.0.1 Testing installation : eclipse-platform-3.3RC3
RSE install : RSE-SDK-I20070726-0600.zip
java.runtime : Sun 1.5.0_06-b05
os.name : Windows XP 5.1, Service Pack 2
------------------------------------------------
Can you check:
1. If the folder actually is in the archive (and is just not shown), or it was never copied into the archive (I would assume it is just not shown, since your unit test is OK).
2. If the behavior is a regression or already failed with 2.0.0.1 (I would assume it is a regression, since it seems an obvious thing).
3. If it was OK in 2.0.0.1, also check whether it worked in RSE I20070719-1300. If yes, then this is likely a side-effect similar to bug 197784, which would have been caused by the fix for bug 197089 or bug 197484.
It would be ideal if you could come up with a unit test that shows the failure and runs OK after the issue has been fixed. Such a unit test could be a key door opener for lots of other unit tests related to the SystemView and filtering -- areas where we've often had issues, and especially regressions. Starting to get unit test coverage for those kinds of things would be a major step forward.
I did some investigation, and here is what I found:
- It is not a regression. I think this problem was introduced when UniversalFileSystemMiner#handleCopyBatch() was introduced (it already existed in the 1.1 version).
- The reason my JUnit testcase for a similar scenario worked is that I used the API IFileServiceSubSystem#copy() instead of IFileServiceSubSystem#copyBatch(). The copy operation in SystemView invokes the copyBatch() operation even when copying only one folder/file; it seems to me copy() is only used if you are trying to overwrite a file/folder. So I will add testcases using copyBatch() as well.

The problem in UniversalFileSystemMiner#handleCopyBatch() is the following: if the target folder is an archive file or virtual folder, it calls ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names) to add the list of source files/folders passed from the client. Unlike ISystemArchiveHandler#add(File file, String virtualPath, String name), which gathers all descendants of the input file if it is a directory, ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names) assumes the "files" parameter already contains all the files/folders we need to add to the archive file. So in my scenario it contained only the folder I wanted to copy, but not its descendants. The reason it worked for copy() (in the JUnit testcase) is that UniversalFileSystemMiner#handleCopy() was called, which in turn called ISystemArchiveHandler#add(File file, String virtualPath, String name), and that add() method makes sure to include all descendants of the "file" if it is a directory. This scenario works for Local because LocalFileService#copyBatch() simply loops through all the input sources and calls LocalFileService#copy().
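The difference between the two add() semantics described above can be sketched as follows. This is a minimal, hypothetical illustration (the class and method names are not from the RSE code base): a "flat" batch API archives exactly the entries it is handed, while a "recursive" single-file API gathers a directory's descendants first.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

public class AddSemantics {
    // Flat: what ISystemArchiveHandler#add(File[],...) assumes it receives --
    // a list that already contains every entry to archive.
    static List<String> flatEntries(File[] files) {
        List<String> names = new ArrayList<String>();
        for (File f : files) {
            names.add(f.getName());
        }
        return names;
    }

    // Recursive: what ISystemArchiveHandler#add(File,...) effectively does
    // for a directory -- it gathers every descendant before archiving.
    static void collect(File f, List<String> names) {
        names.add(f.getName());
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File c : children) {
                    collect(c, names);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("src").toFile();
        new File(dir, "a.txt").createNewFile();
        // Passing only the folder itself mimics the failing handleCopyBatch():
        System.out.println(flatEntries(new File[] { dir }).size());
        // The recursive walk picks up the folder plus its child:
        List<String> all = new ArrayList<String>();
        collect(dir, all);
        System.out.println(all.size());
    }
}
```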
From what I've seen, the simplest fix for this problem is to use the same approach as LocalFileService -- let UniversalFileSystemMiner#handleCopyBatch() loop through the input sources and call UniversalFileSystemMiner#handleCopy(). But adding a file/folder to an archive is very expensive: it needs to extract the contents of the archive and then reconstruct it, so we want to reduce those operations to a minimum. The proposed fix is therefore the following: for each source file passed by the client, check whether it is a directory. If yes, use code similar to the archive handler's to get a complete list of its descendants (we need to copy SystemZipHandler#listAllFiles() into a private method in UniversalFileSystemMiner, since this method is not part of the ISystemArchiveHandler interface). We gather all those descendant files/folders plus the source files into an array, and then call ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names).
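The expansion step proposed above could look roughly like this. This is a sketch under assumptions, not the actual patch -- the method names are illustrative, and listAllFiles() here only approximates what SystemZipHandler#listAllFiles() does: expand each directory source into its full descendant list, then hand everything to add(File[],...) in one call so the archive is reconstructed only once.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class BatchExpansion {
    // Depth-first walk collecting every descendant of a directory,
    // roughly what SystemZipHandler#listAllFiles() provides.
    static void listAllFiles(File dir, List<File> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a readable directory
        }
        for (File child : children) {
            out.add(child);
            if (child.isDirectory()) {
                listAllFiles(child, out);
            }
        }
    }

    // Build the complete array expected by add(File[], virtualPath, names):
    // each source itself, plus all descendants of directory sources.
    static File[] expandSources(File[] sources) {
        List<File> all = new ArrayList<File>();
        for (File src : sources) {
            all.add(src);
            if (src.isDirectory()) {
                listAllFiles(src, all);
            }
        }
        return all.toArray(new File[all.size()]);
    }
}
```

Note this naive walk does not guard against cyclic symbolic links; that concern is raised further down in the discussion.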
Just one more thing I want to add here, to make sure I do not forget to also look into it: if I already have a folder with the same name in the archive file (the folder is empty), and I then copy this folder and paste it into the archive file, I get the Duplication Name Collision dialog. Overwrite is chosen, and I click OK to overwrite. When I then expand the newly copied folder, it is still empty. This happens in Windows Local as well. It seems something is wrong in the algorithm for replacing a file/folder in an archive file.
I just tried the move, and it actually worked. The reason is that move actually uses "copy" and then "delete", and "copy" works. There is no "moveBatch".
Created attachment 74891 [details]
fix for bug 198046

This fix makes the following changes in the copyBatch() method of UniversalFileSystemMiner: for each source file passed by the client, check whether it is a directory. If yes, use code similar to the archive handler's to get a complete list of its descendants. We gather all those descendant files/folders plus the source files into an array, and then call ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names). This fix is made on the dstore server side.
A couple of questions / comments:

1. Is a similar fix also needed for Local?

2. listAllFiles() method:

2.1. I first thought that by having a HashSet parameter, you expose internal details of the implementation to the caller. That's not beautiful. I later noticed that you use the HashSet in order to do the symlink checks. The Javadoc should document the parameters better, even for a private method. Especially when Collections are part of a method's API, you MUST document in the Javadoc what the contents of the collection are assumed to be, e.g.
    @param found HashSet<File> being populated and returned

2.2. Your check against symbolic links cannot work like this. You need to call java.io.File.getCanonicalPath() and put that into the HashSet; without it, you'd never detect the symbolic link target. And you need to do that check only for directories, not for files -- getCanonicalPath() is expensive, so it should be avoided where possible.

2.3. Because of these two issues, I think you'd be better off using a nested static class to collect the files -- such a class could even be overridden to collect only particular patterns of files, if desired at some point:

    Collection allFileNames = new FileCollector().getFiles(parentDir);

    private static class FileCollector {
        private Set fCanonicalDirsVisited = new HashSet();
        private List fResults = new ArrayList();

        public Collection getFiles(String parentDir) {
            File parentFile = new File(parentDir);
            if (parentFile.canRead()) {
                if (parentFile.isDirectory()) {
                    internalGetFiles(parentFile);
                } else {
                    fResults.add(parentDir);
                }
            }
            return fResults;
        }

        // Note: getCanonicalPath() throws IOException; handling omitted in this sketch.
        protected void internalGetFiles(File parentDir) throws IOException {
            String cp = parentDir.getCanonicalPath();
            if (!fCanonicalDirsVisited.contains(cp)) {
                ...
            }
        }
    }

2.4. I was also wondering what happens to empty directories in your "Add all to zip" recursion code. Perhaps the recursive adding of directories should be implemented in the Archive Handler rather than the client?
- Archive types might differ in whether they can add empty directories or not.

3. At the place where you construct the list of names to be added to the archive, you convert back and forth between Files and Strings... is that really necessary? Try to think about what you need, and don't convert so much.

4. When you do thisName.replace('\\', '/'), be aware that on UNIX the backslash is a valid character in a file name. There may be other parts of our code which cannot handle that special case, but wherever we touch and change the code anyway, we should try to be aware of it. Not honoring such special cases could be how viruses, trojans and other malicious software creep into a remote system!
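Point 4 above is easy to reproduce. This tiny, hypothetical example (not from the patch) shows how a blind separator replacement silently turns a legal UNIX file name containing a backslash into a two-segment virtual path inside the archive:

```java
public class BackslashCaveat {
    public static void main(String[] args) {
        // On UNIX, '\' is a legal character in a file name, so this is ONE
        // file, literally named: notes\draft.txt
        String unixName = "notes\\draft.txt";

        // Blindly normalizing separators renames it into what looks like a
        // folder "notes" containing "draft.txt" inside the archive.
        String mangled = unixName.replace('\\', '/');
        System.out.println(mangled);
    }
}
```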
Reading comment #3 again, I think the right solution is for copyBatch() to iterate over the list of items to be added, calling ISystemArchiveHandler#add(File file, String virtualPath, String name) for each item in turn to have them added recursively. The rationale is that depending on the kind of archive handler, it may be able to add symbolic links as symbolic links rather than following them, and properly add special devices, empty directories and other peculiarities of the file system. We should really leave all that to the archive handler rather than doing our own job of recursing the sources. Or would that be an issue when the sources and the archive to add them to are on different hosts? Perhaps that case would need special consideration, having to do the recursion inside the copyBatch() method after all. Symbolic links and hard links are another thing to consider: should they be followed unless forming a cycle, or should they never be followed? See bug 44107 comment 10 for a list of possible options.
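The iteration proposed in this comment can be sketched as follows. The interface below is a cut-down, hypothetical stand-in for ISystemArchiveHandler (only the one method needed here), not the real RSE API:

```java
import java.io.File;

public class CopyBatchSketch {
    // Stand-in for ISystemArchiveHandler, reduced to the single-file add().
    interface ArchiveHandler {
        boolean add(File file, String virtualPath, String name);
    }

    // Let the archive handler own the recursion: one add(File,...) call per
    // source, so the handler decides how to treat symlinks, empty
    // directories, special devices and other file-system peculiarities.
    static boolean handleCopyBatch(File[] sources, ArchiveHandler handler,
            String virtualPath) {
        for (File src : sources) {
            if (!handler.add(src, virtualPath, src.getName())) {
                return false; // stop on the first failure
            }
        }
        return true;
    }
}
```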
This is regarding comment #8. I agree that the more proper way to resolve this problem is for copyBatch() to iterate over the list of items to be added, calling ISystemArchiveHandler#add(File file, String virtualPath, String name) for each item in turn to have them added recursively. This approach is similar to copyBatch() calling copy() to handle each file/folder in the list passed by the client (of course, it eliminates some checking). LocalFileService does not have the issue because it uses a similar approach (copyBatch() calls copy() for each file in the list). The reason I did not use this approach before was performance: the implementation of the add() method which takes a single file object (e.g., in SystemZipHandler) reconstructs the entire archive file, which is very expensive when dealing with a large archive. I think the root of the problem is that the add(File[]...) method cannot distinguish between a list of files that has already been expanded by the add(File...) method and a list passed directly by the client. When we can make an API change, we should add an add(File[]...) API which takes a list of File objects, expands each File object in the list, and then calls the right method to actually construct the archive file just once. And copy() could be changed to call this new API with a list containing only one File object, instead of calling add(File...) (the one expecting a single File object). For now, I will change the code according to comment #8, but will open a bug to track the API change. Please comment on this proposal. Thanks.
The proposal sounds good. I'd hope that in reality the list of items in copyBatch() is not too large, so the performance issue should not be that bad. In fact you could do a hybrid approach, where you sort the items passed into copyBatch() such that the plain files are processed by calling add(File[]) while the folders are iterated over by calling add(File). Can you ensure that the expansion code in add() for archives does not die on cyclic symbolic links, as your attached patch would have? Also, please check that the right thing is done when adding a folder from host A into an archive on host B.
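The hybrid approach suggested here can be sketched like this. Again, the handler interface is a cut-down stand-in for ISystemArchiveHandler, not the real API: plain files are batched into one add(File[],...) call (one archive rebuild), while each directory goes through add(File,...) so the handler recurses itself.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class HybridCopyBatch {
    // Stand-in for ISystemArchiveHandler with the two overloads discussed.
    interface ArchiveHandler {
        boolean add(File file, String virtualPath, String name);
        boolean add(File[] files, String virtualPath, String[] names);
    }

    static boolean copyBatch(File[] sources, ArchiveHandler handler,
            String vpath) {
        List<File> plainFiles = new ArrayList<File>();
        for (File src : sources) {
            if (src.isDirectory()) {
                // Directories one by one: the handler expands them itself,
                // at the cost of one archive rebuild per directory.
                if (!handler.add(src, vpath, src.getName())) {
                    return false;
                }
            } else {
                plainFiles.add(src); // defer plain files for one batch call
            }
        }
        if (plainFiles.isEmpty()) {
            return true;
        }
        File[] files = plainFiles.toArray(new File[plainFiles.size()]);
        String[] names = new String[files.length];
        for (int i = 0; i < files.length; i++) {
            names[i] = files[i].getName();
        }
        // All plain files in a single batch call: one archive rebuild.
        return handler.add(files, vpath, names);
    }
}
```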
The scenario I can think of for a large number of folders is that a user could use this copyBatch() function to actually construct an archive file: for example, create myzip.zip, then select all the folders of interest and copy/paste them into myzip.zip. The add() inside the zip handler will have the same problem as my added code, since I basically copied that method from the zip handler into UniversalFileSystemMiner. The reason is that it is not part of the ISystemArchiveHandler interface, and I could not call it from UniversalFileSystemMiner without casting the archive handler. I will take a look at this method inside each archive handler implementation and make the necessary changes according to your comment.
Thanks. I'd suggest filing a separate bug for that, e.g. "Adding a folder to an archive hangs on cyclic symbolic link" so it's documented (also for 2.0.1 release notes). If you can, also add a unit test for that case.
Created attachment 75376 [details]
Updated fix according to Martin's comment

I updated the patch according to Martin's comment -- call ISystemArchiveHandler#add(File...) for directories one by one, and call ISystemArchiveHandler#add(File[]...) for all the plain files in a batch. I need to use an ArrayList to hold the files since I don't know the number of files ahead of time, and then convert it back to a File[], which is what ISystemArchiveHandler#add(File[]...) expects. This fix is on the DStore server side. The JUnit tests for copyBatch() have been run against this fix on both Windows and Linux DStore servers.
I've committed the fix to CVS since I need to work on the updated file on a different bug.
Created attachment 75426 [details]
Patch to improve copyBatch() code

The attached patch improves the code as follows:
1.) The File[] srcFiles and String[] srcNames arrays are unnecessary, since only the current srcFile / srcName is ever looked at. By getting rid of the arrays, the JDT compiler also revealed another flaw in the code -- when srcType is invalid, it was unclear what the code would do. Added an else branch to bail out in that case.
2.) if (result == false) should be written more briefly: if (!result)
3.) Converting an ArrayList into an array should be done as follows, which is well-known and often-used boilerplate code -- thus faster to read and faster in execution than iterating the loop ourselves:
    Foo[] fooArray = (Foo[]) fooList.toArray(new Foo[fooList.size()]);
Please review -- I'm committing the patch.
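For completeness, the toArray boilerplate from point 3.) in runnable form (list contents here are made up for illustration): passing a correctly-sized array lets toArray() fill and return it directly, with no hand-written copy loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayIdiom {
    public static void main(String[] args) {
        List<String> names = new ArrayList<String>(
                Arrays.asList("a.txt", "b.txt"));
        // The pre-1.5 idiom from the review: size the target array up front
        // so toArray() can fill it in one pass.
        String[] arr = (String[]) names.toArray(new String[names.size()]);
        System.out.println(arr.length);
    }
}
```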
I chose the "+" sign beside the review request. Thanks, Martin, for showing those code improvements.
There is a typo in the code which was not caught by the code review; it was caught by re-running the JUnit testcases. Instead of:
    String[] resultNames = (String[])nonDirectoryArrayList.toArray(new String[nonDirectoryNamesArrayList.size()]);
it should be:
    String[] resultNames = (String[])nonDirectoryNamesArrayList.toArray(new String[nonDirectoryNamesArrayList.size()]);
I will update it in UniversalFileSystemMiner#handleCopyBatch().
Verified that this problem is fixed in the 2.0.1 RC1 driver. Checked Windows Local, Windows DStore, and Linux DStore.
The JUnit tests for archive handling (around 39 testcases) have been run against the RC1 driver for Windows Local, Windows DStore, and Linux DStore.