Bug 221177 - Generic way of obtaining name of remote file.
Summary: Generic way of obtaining name of remote file.
Status: CLOSED FIXED
Alias: None
Product: ECF
Classification: RT
Component: ecf.filetransfer
Version: unspecified
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: ecf.core-inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed, helpwanted
Depends on:
Blocks:
 
Reported: 2008-03-03 11:53 EST by Thomas Hallgren CLA
Modified: 2008-05-18 19:59 EDT
2 users

See Also:


Attachments
Patch introducing the getRemoteFileName() method (6.22 KB, patch)
2008-03-03 18:38 EST, Thomas Hallgren CLA

Description Thomas Hallgren CLA 2008-03-03 11:53:20 EST
When obtaining a file from a remote source it is often good to know the intended filename. It might be used in progress reporting and sometimes also as a source of information when naming the resulting local file.

The URL itself is sometimes very cryptic, with a path in some numeric form (a calculated UUID, perhaps). In some cases the path doesn't contain a name at all (it ends with download.php, for instance) and the actual path is hidden in one of the query parameters. When using HTTP, the returned "Content-Disposition" response header field is often a much better source for the filename than the actual URL.

Using ECF, I cannot obtain this header field, and that's OK: there are a lot of transfer implementations where response headers are not applicable. I would, however, like an IIncomingFileTransfer.getRemoteFileName() method. The HTTP transfer could use the Content-Disposition header, and other file transfer implementations could make a best-effort guess using whatever algorithm is appropriate for them. Below is the code that Buckminster uses to extract the file name from the Content-Disposition header.


   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   // ... members of the enclosing class:

   /**
    * This regular expression is a simple Content-Disposition header parser.
    * The Content-Disposition grammar is quite complex; this is heavily simplified.
    * It should be refactored in future versions using the proper grammar.
    */
   private final static Pattern s_contentDispositionPattern = Pattern.compile(
           ".*;\\s*filename\\s*=\\s*(\"(?:[^\"\\\\]|\\\\.)*\"|[^;\"\\s]+)\\s*(?:;|$)");

   private static String parseContentDisposition(String contentDisposition)
   {
       // Content-Disposition syntax: attachment|inline[;filename="<filename>"]
       // Try to extract the filename from it (and strip quotes if they're there)
       if (contentDisposition == null)
           return null;

       String filename = null;
       Matcher m = s_contentDispositionPattern.matcher(contentDisposition);
       if (m.matches()) {
           filename = m.group(1);
           if (filename.startsWith("\"") && filename.endsWith("\"")) {
               // Drop the surrounding quotes and unescape backslash-escaped characters
               filename = filename.substring(1, filename.length()-1).replaceAll("\\\\(.)", "$1");
           }
       }
       return filename;
   }
Comment 1 Scott Lewis CLA 2008-03-03 15:05:47 EST
Thomas,

Currently, the org.eclipse.ecf.provider.filetransfer bundle (http implementation based upon URLConnection) is using the CDC 1.0/Foundation 1.0 execution environment, which does not have the Pattern class, Matcher class or associated String matching/substitution method (replaceAll).

I would rather not bump up the EE requirements for this bundle at this point, given that Equinox p2 depends on it.

Would it be possible to get a Content-Disposition parser that does not use these classes/methods? We do have a utility class, org.eclipse.ecf.core.util.StringUtils, which does a limited version of replaceAll:

http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.ecf/plugins/org.eclipse.ecf.core.identity/src/org/eclipse/ecf/core/util/?root=Technology_Project

If this is not doable in the short term I will attempt to put together a Content-Disposition parser myself after consulting the http spec, but won't be able to do it for a couple of days.
Comment 2 Thomas Hallgren CLA 2008-03-03 18:20:32 EST
Is the use of a StringTokenizer OK?
Comment 3 Scott Lewis CLA 2008-03-03 18:32:40 EST
(In reply to comment #2)
> Is the use of a StringTokenizer OK?
> 

Hi Thomas...yes...StringTokenizer is in CDC 1.0/Foundation 1.0.
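Along the lines agreed in comments 1 to 3, the filename can be pulled out of a Content-Disposition header without java.util.regex. What follows is a minimal sketch, not the code from attachment 91459: the class name ContentDispositionSketch and the method getFilename are hypothetical, and the sketch assumes a ';' never appears inside a quoted filename (StringTokenizer cannot tell quoted separators from unquoted ones). It uses only StringTokenizer, StringBuffer and basic String methods, which should keep it within CDC 1.0/Foundation 1.0.

```java
import java.util.StringTokenizer;

// Hypothetical helper; sketch of a regex-free Content-Disposition filename extractor.
class ContentDispositionSketch {

    /**
     * Returns the filename parameter of a Content-Disposition header value,
     * or null if there is none. Assumption: ';' never occurs inside a quoted
     * filename, and the RFC 2231 "filename*" form is ignored.
     */
    static String getFilename(String contentDisposition) {
        if (contentDisposition == null)
            return null;
        StringTokenizer tokens = new StringTokenizer(contentDisposition, ";");
        while (tokens.hasMoreTokens()) {
            String param = tokens.nextToken().trim();
            int eq = param.indexOf('=');
            if (eq < 0)
                continue; // e.g. the leading "attachment" or "inline" token
            String name = param.substring(0, eq).trim();
            if (!name.equalsIgnoreCase("filename"))
                continue; // some other parameter, such as size or creation-date
            String value = param.substring(eq + 1).trim();
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\""))
                value = unescape(value.substring(1, value.length() - 1));
            return value.length() > 0 ? value : null;
        }
        return null;
    }

    // Removes backslash escapes (\" becomes ", \\ becomes \) from a quoted-string body.
    private static String unescape(String s) {
        StringBuffer sb = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length())
                c = s.charAt(++i); // keep the escaped character, drop the backslash
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(getFilename("attachment; filename=\"report 2008.pdf\""));
        System.out.println(getFilename("attachment; filename=data.zip; size=1024"));
        System.out.println(getFilename("inline"));
    }
}
```

Running main prints report 2008.pdf, data.zip and null. Unlike the Buckminster regular expression, which calls matches() and therefore only succeeds when filename is the last parameter, this version also finds filename when further parameters follow it, and it compares the parameter name case-insensitively.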

Comment 4 Remy Suen CLA 2008-03-03 18:36:46 EST
I think we also have a StringUtils class somewhere for splitting Strings. I know Scott copied it into ECF somewhere.

Thomas, I'd advise that you get a CDC-1.0/Foundation-1.0 JRE installed on your workstation if you haven't already done so.
http://wiki.eclipse.org/J9
Comment 5 Thomas Hallgren CLA 2008-03-03 18:38:07 EST
Created attachment 91459 [details]
Patch introducing the getRemoteFileName() method

This patch adds IIncomingFileTransfer.getRemoteFileName(). The default implementation in AbstractRetrieveFileTransfer simply uses the last segment of the remote URL path. HTTPClientRetrieveFileTransfer, however, first consults the Content-Disposition header (using a StringTokenizer) and then, if that fails, gets the path from the HTTP response. As a last resort it falls back to the AbstractRetrieveFileTransfer.getRemoteFileName() implementation.
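The lookup order described in this comment (Content-Disposition first, then the path from the HTTP response, then the last segment of the request URL) can be sketched in plain Java. This is an illustrative sketch, not the patch itself: RemoteFileNameSketch, remoteFileName and lastPathSegment are hypothetical names, the Content-Disposition filename is assumed to be parsed already, and "the path from the HTTP response" is assumed here to mean the path of the final URL after any redirects.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical sketch of the fallback chain; not the ECF implementation.
class RemoteFileNameSketch {

    /** Returns the last non-empty segment of a URL path, or null if there is none. */
    static String lastPathSegment(String path) {
        if (path == null)
            return null;
        if (path.endsWith("/"))
            path = path.substring(0, path.length() - 1); // ignore one trailing slash
        String segment = path.substring(path.lastIndexOf('/') + 1);
        return segment.length() > 0 ? segment : null;
    }

    /**
     * Fallback chain (assumed order): Content-Disposition filename, then the
     * last segment of the response path, then the last segment of the request path.
     */
    static String remoteFileName(String dispositionFilename, String responsePath, String requestPath) {
        if (dispositionFilename != null)
            return dispositionFilename;
        String name = lastPathSegment(responsePath);
        return name != null ? name : lastPathSegment(requestPath);
    }

    public static void main(String[] args) throws MalformedURLException {
        // URL.getPath() excludes the "?id=42" query string.
        String requestPath = new URL("http://example.com/download.php?id=42").getPath();
        String responsePath = new URL("http://mirror.example.com/files/tool-1.0.zip").getPath();
        System.out.println(remoteFileName("tool-1.0-final.zip", responsePath, requestPath));
        System.out.println(remoteFileName(null, responsePath, requestPath));
        System.out.println(remoteFileName(null, null, requestPath));
    }
}
```

With the sample URLs in main, the three calls print tool-1.0-final.zip, tool-1.0.zip and download.php. The last result illustrates why the plain URL segment is only a last resort for links such as download.php?id=42.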
Comment 6 Thomas Hallgren CLA 2008-03-03 18:41:40 EST
The page at http://www.jtricks.com/bits/content_disposition.html is a good source of both information and test cases. At the end there are a couple of links for downloading small files; these links use the Content-Disposition header.
Comment 7 Scott Lewis CLA 2008-03-03 20:35:38 EST
Applied patch with only minor changes (e.g. created HttpHelper class in org.eclipse.ecf.provider.filetransfer and added support for using content disposition header in URLConnection-based filetransfer as well).

Created new test case GetRemoteFileTransferTest and tested with both urlconnection and httpclient providers.  

Added contribution to IP log. Regenerated javadocs and committed to website.

Thanks Thomas for your contribution.

Comment 8 Scott Lewis CLA 2008-05-18 19:59:13 EDT
closing