Bug 198046 - [dstore] Cannot copy a folder into an archive file
Summary: [dstore] Cannot copy a folder into an archive file
Status: CLOSED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: 2.0.1   Edit
Assignee: Xuan Chen CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-07-26 19:16 EDT by Xuan Chen CLA
Modified: 2007-09-13 17:11 EDT (History)
0 users

See Also:
xuanchen: review+


Attachments
fix for bug 198046 (4.33 KB, patch)
2007-07-29 22:51 EDT, Xuan Chen CLA
no flags Details | Diff
Updated fix according to Martin's comment. (4.13 KB, patch)
2007-08-04 11:42 EDT, Xuan Chen CLA
no flags Details | Diff
Patch to improve copyBatch() code (4.02 KB, patch)
2007-08-06 09:06 EDT, Martin Oberhuber CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Xuan Chen CLA 2007-07-26 19:16:02 EDT
I found out if I copy a folder into an archive file, expand it, it is empty.
I have the same behaviour in both lastest dev driver and RSE-SDK-I20070726-0600.zip driver.
It happens for both DStore Windows and DStore Linux.
No problem for Local Windows.

But a very strange thing is that my JUnit testcase for running the same scenario worked.

************************************************
TM 2.0.1 Testing
installation : eclipse-platform-3.3RC3 
RSE install  : RSE-SDK-I20070726-0600.zip
java.runtime : Sun 1.5.0_06-b05
os.name:     : Windows XP 5.1, Service Pack 2
------------------------------------------------
Comment 1 Martin Oberhuber CLA 2007-07-27 04:18:48 EDT
Can you check

  1. If the folder actually is in the archive (and just not shown) or it has
     never been copied into the archive (I would assume it is just not shown
     since your unit test is ok)

  2. If the behavior is a regression or already failed with 2.0.0.1 (I would 
     assume it is a regression since it seems an obvious thing)

  3. If it was OK in 2.0.0.1, also check whether it worked in RSE I20070719-1300
     If yes, then this is likely a side-effect similar to bug 197784 which would
     have been caused by the fix for bug 197089 or bug 197484.
Comment 2 Martin Oberhuber CLA 2007-07-27 04:20:29 EDT
It would be ideal if you could come up with a unit test that shows the failure, and runs OK after the issue has been fixed.

Such a unit test could be a key door opener for lots of other unit tests related to the SystemView and filtering -- areas where we've often had issues and especially regressions. Starting to get unit test coverage for those kind of things would be a big, major step forward.
Comment 3 Xuan Chen CLA 2007-07-27 15:36:20 EDT
I did some investigation, and the following is what I found:

.  It is not a regression.  I think this problem was introduced when UniversalFileSystemMiner#handleCopyBatch() was introduced (which was already existed on 1.1 version).  
. The reason that my JUnit testcase for similar scenario worked is that I used API IFileServiceSubSystem#copy(), instead of IFileServiceSubSystem#copyBatch().  But the copy operation in SystemView invokes copyBatch() operation even only copy one folder/file.  It seems to me copy()is only used if you trying to overwrite a file/folder.
So I will add testcases using copyBatch() as well.

The problem for UniversalFileSystemMiner#handleCopyBatch() is the following:

If the target folder is an archive file or virtual folder, it will call ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names) to add the list of source files/folders passed from the client.  
Unlike ISystemArchiveHandler#add(File file, String virtualPath, String name), which gets all the list of the input file's decendent if it is a directory,
ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names) assumes this "files" parameters  contains all the files/folders we need to add to the archive file.  So in my scenario, it was only the folder I want to copy, but not all its decendents.


The reason it worked for copy() (in the JUnit testcase) was that UniversalFileSystemMiner#handleCopy() was called, and it in turn called ISystemArchiveHandler#add(File file, String virtualPath, String name).  And this add() method make sure to include all the decendents of the "file" if "file" is a directory.

This scenario works for Local since LocalFileService#copyBatch() simply loops through all the input source, and call LocalFileService#copy().

From what I've seen, the simpliest fix for this problem is to use the same approach as LocalFileService -- let the UniversalFileSystemMiner#handleCopyBatch() loop through the input sources, and call UniversalFileSystemMiner#handleCopy().
But adding a file/folder into an archive is very expensive.  It needs to extract the contents of an archive, and then reconstruct it.  We want to reduce those operations to minimum.

So the proposed way to fix it is the following:

For each source files passed by client, checking if it is a directory.
If yes, use the similar code from Archive handler to get a complete list of its decendents (we need to copy the code SystemZipHandler#listAllFiles() to a private method in UniversalFileSystemMiner since this method is not part of the ISystemArchiveHandler interface).  We gather the all those decendent file/folders plus the source files into an array, and then call ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names).
Comment 4 Xuan Chen CLA 2007-07-27 15:45:43 EDT
Just one more thing I want to add here to make sure I did not forget to also look into it:

If I already has a folder with the same name in the archive file (the folder is empty), and then I copy this folder, and paste it into the archive file.
I got the Duplication Name Collision dialog.  Overwrite is chosen, and then I click OK to overwrite it.  Then I expanded newly copied folder.  It is still empty.  

This happens in Windows Local as well.

It seems something wrong for the algorithm for processing of replacing file/folder in an archive file.
Comment 5 Xuan Chen CLA 2007-07-27 23:49:53 EDT
I just tried the move, and it actually worked.

The reason is that move actually used "copy" and then "delete", and "copy" works.  There is no "moveBatch".
Comment 6 Xuan Chen CLA 2007-07-29 22:51:01 EDT
Created attachment 74891 [details]
fix for bug 198046

This fix makes the following changes in the copyBatch() method of UniversalFileSystemMiner:

For each source files passed by client, checking if it is a directory.
If yes, use the similar code from Archive handler to get a complete list of its
decendents. We gather the all those decendent
file/folders plus the source files into an array, and then call
ISystemArchiveHandler#add(File[] files, String virtualPath, String[] names).

This fix is made in the dstore server side.
Comment 7 Martin Oberhuber CLA 2007-07-30 05:40:22 EDT
Couple of questions / comments:

1. Is a similar fix also needed for Local?
2. listAllFiles() method:
   2.1. I first thought that by having a HashSet parameter, you expose some
        internal details of the implementation to the caller. That's not
        beautiful. I later noticed that you use the HashSet in order to
        do the symlink checks. The Javadoc should document the parameters
        better, even if it's a private method. Especially when Collections 
        are part of an API for a method, you MUST document in the Javadoc
        what type the contents of the collection are assumed e.g.
          @param found HashSet<File> being populated and returned
   2.2. Your check against symbolic links can not work like this. You need to
        do java.io.File.getCanonicalPath() and stuff that one into a HashSet,
        without it you'd never detect the symbolic link target.
        And you need to do that check only for directories, but not for files --
        getCanonicalPath() is expensive so it should be avoided if possible.
   2.3. Because of these two issues I think you'd be better of doing a nested
        static class in order to collect the files -- such a class could
        even be overridden to only get particular patterns of files if
        one desired at some point:

        Collection allFileNames = new FileCollector().getFiles(parentDir);

        private static class FileCollector {
            private Set fCanonicalDirsVisited = new HashSet();
            private List fResults = new ArrayList();
            public Collection getFiles(String parentDir) {
                File parentFile = new File(parentDir);
                if (parentFile.canRead()) {
                   if (parentFile.isDirectory()) {
                      internalGetFiles(parentFile);
                   } else {
                      results.add(parentDir);
                   }
                }
                return fResults;
            }
            protected void internalGetFiles(File parentDir) {
                String cp = parentDir.getCanonicalPath();
                if (!fCanonicalDirsVisited.contains(cp)) {
                    ...
                }
            }
        }

   2.4. I was also wondering what happened to empty directories in your
        "Add all to zip" recursion code. Perhaps the recursive adding of
        directories should be implemented in the Archive Handler rather
        than the client? - Archive types might differ whether they can
        add empty directories or not.

3. At the place where you construct the list of names to be added to the
   archive, you convert back and forth between Files and Strings ... is 
   that really necessary? - Try to think about what you need and don't 
   convert so much.

4. When you do thisName.replace ('\\','/') be aware that on UNIX, the 
   backslash is a valid character in a file name. There might be other
   parts of our code which can not handle that special case, but wherever
   we touch and change the code anyways, we should try and be aware of it.
   Not honoring such special cases could be where viruses, trojans and other
   malicious software creeps into a remote system!
Comment 8 Martin Oberhuber CLA 2007-07-30 10:15:24 EDT
Reading comment #3 again, I think the right solution is that copyBatch() iterates over the list of items to be added, and calls 

   ISystemArchiveHandler#add(File file, String virtualPath, String name)

for each of the items in turn to have them added recursively. Rationale is that depending on the kind or archive handler, it may be able to add symbolic links as symbolic links rather than following them, properly add special devices, empty directories and other peculiarities of the file system. We should really leave all that stuff to the archive handler rather than doing our own job trying to recurse the sources.

Or would that be an issue when the sources and the archive to add the stuff into are on different hosts? - Perhaps that case would need special consideration, having to do the recursion inside the copyBatch method after all?

Symbolic links and hard links are another thing to consider: should they be followed unless forming a cycle, or should they never be followed? See Bug 44107 comment 10 for a list of possible options.
Comment 9 Xuan Chen CLA 2007-07-30 11:31:16 EDT
This is regarding to Comment #8.

I agreed that the more proper way to resolve this problem is that copyBatch()
iterates over the list of items to be added, and calls 

   ISystemArchiveHandler#add(File file, String virtualPath, String name)

for each of the items in turn to have them added recursively. 

This approach is similar to copyBatch calling copy() to handle each of the file/folder in the list passed by client (of course, it will elminate some checkings).  LocalFileService does not have the issue because it uses the similar approach (copyBatch call to copy() for each files in the list).

The reason I did not use this approach before is for performance consideration, since the implemation of add() method (e.g., SystemZipHanlder) which takes a single file object will reconstruct the entire archive file, and it is very expensive when dealing with large archive.

I think the root of the problem is that the add(File[] ...) methods could not distinguishs between a list of files which already been expanded by add(File ...) method or just the list passed by client.  When we could make the API change, we should add an add(File[] ...) API which take a list of File object, expand all the File object in the list, and then call the right method to really construct the archive file just once.  And copy() could be changed to just calling this new API by passing a list containing only one File object, instead of call add(File ...) (the one expect single File object).

For now, I will change the code according to comment #8, but will open a bug to keep track the API change. 

Please comment on this proposal.  Thanks.
Comment 10 Martin Oberhuber CLA 2007-07-30 11:38:45 EDT
Proposal sounds good. I'd hope in reality the list of items in copyBatch should not be too large so the performance issue should not be that bad. 

In fact you could do a hybrid approach, where you sort the items passed into copyBatch such that the "files" are processed by calling add(File[]) but the folders are iterated over by calling add(File).

Can you ensure that the expansion code in add() for archives does not die on cyclic symbolic links, as your attached patch would have died?

Also, please check that the right thing is done when adding a folder from host A into an archive on host B.
Comment 11 Xuan Chen CLA 2007-07-30 12:00:22 EDT
The scenario I could think of for large number of folders is that user could use this copyBatch function to actually construct an archive file.

For example, create a myzip.zip, and then select all the folders interested, and copy/paste into myzip.zip.

The add() inside the zip handler will have the same problem as my added code, since I basically copy this method from zip handler to UniversalFileSystemMiner.  The reason is that it is not part of interface for ISystemArchiveHandler, and I could not call it from UniversalFileSystemMiner without casting the archive handler.

I will take a look at this method inside each archive handler implemention, and make the necessary changes according to your comment.

Comment 12 Martin Oberhuber CLA 2007-07-30 12:04:47 EDT
Thanks.
I'd suggest filing a separate bug for that, e.g. "Adding a folder to an archive hangs on cyclic symbolic link" so it's documented (also for 2.0.1 release notes). If you can, also add a unit test for that case.
Comment 13 Xuan Chen CLA 2007-08-04 11:42:44 EDT
Created attachment 75376 [details]
Updated fix according to Martin's comment.

I updated the patch according to Martin's comment -- call ISystmArchiveHandler#add(File ...) for directory one by one, and call ISystmArchiveHandler#add(File[]...) for all the files in batch.

I need to use an ArrayList to contain all the files since I don't know the # of files ahead of time, and then I need to convert it back to File[] since it is what the ISystmArchiveHandler#add(File[]...) expects.

This fix in on DStore server side.

JUnit testing for copyBatch() has been run on this fix in both Windows and Linux DStore servers.
Comment 14 Xuan Chen CLA 2007-08-04 11:47:12 EDT
I've committed the fix to CVS since I need to work on the updated file on a different bug.
Comment 15 Martin Oberhuber CLA 2007-08-06 09:06:05 EDT
Created attachment 75426 [details]
Patch to improve copyBatch() code

Attached patch improves the code as follows:

1.) File[] srcFiles, String[] srcNames are arrays are unnecessary, since only
    the current srcFile / srcName is ever looked at. By getting rid of the 
    arrays, the JDT Compiler also shows another flaw in the code -- when 
    srcType is invalid, it was unclear what the code would have done. Added
    an else... branch to bail out in that case.

2.) if(result == false) should be written more briefly:
    if(!result)

3.) Converting an ArrayList into an Array should be done as follows, which
    is well-known and often used boilerplate code -- thus faster to read
    and faster in execution than iterating the loop ourselves:

    Foo[] fooArray = (Foo[])fooList.toArray(new Foo[fooList.size()]);

Please review -- I'm committing the patch.
Comment 16 Xuan Chen CLA 2007-08-06 12:03:43 EDT
I chosed the "+" sign beside the review request.  
Thanks Martin, for showing those code improvements.
Comment 17 Xuan Chen CLA 2007-08-09 17:16:42 EDT
There is a typo in the code which was not caught by the code review.  It was caught by re-running the JUnit testcases:

Instead of:
String[] resultNames = (String[])nonDirectoryArrayList.toArray(new String[nonDirectoryNamesArrayList.size()]);

Should be:
String[] resultNames = (String[])nonDirectoryNamesArrayList.toArray(new String[nonDirectoryNamesArrayList.size()]);

I will update it in the UniversalFileSystemMiner#handleCopyBatch().
Comment 18 Xuan Chen CLA 2007-09-13 14:23:06 EDT
Verified that this problem is fixed in 2.0.1 RC1 driver.

checked Windows Local, Windows DStore and Linux DStore.
Comment 19 Xuan Chen CLA 2007-09-13 17:11:48 EDT
JUnit test for archive handling (around 39 testcases) has been run against RC1 driver for Windows Local, Windows DStore, and Linux DStore.