Bug 195830 - [performance] RSE performs unnecessary remote list commands
Summary: [performance] RSE performs unnecessary remote list commands
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: 2.0   Edit
Hardware: PC Windows XP
: P2 normal (vote)
Target Milestone: 2.0.1   Edit
Assignee: Javier Montalvo Orús CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: performance
Depends on: 162950
Blocks: 198638
  Show dependency tree
 
Reported: 2007-07-09 10:17 EDT by Kevin Doyle CLA
Modified: 2009-03-18 07:49 EDT (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Doyle CLA 2007-07-09 10:17:43 EDT
This bug is created from the discussion of bug 194204. 

When doing a refresh, opening a file, etc unnecessary remote list commands are performed.  

Console Log from opening a file from /public_html/B2--aa/test55555:

NOOP
200 Zzz...

CWD /public_html/B2--aa/test55555
250 OK. Current directory is /public_html/B2--aa/test55555

PORT 9,26,148,193,6,177
200 PORT command successful

LIST
150 Connecting to port 8226

226-ASCII

226-Options: -a -l 

226 4 matches total

NOOP
200 Zzz...

CWD /public_html/B2--aa/test55555
250 OK. Current directory is /public_html/B2--aa/test55555

PORT 9,26,148,193,6,178
200 PORT command successful

RETR test25.php
150 Connecting to port 8246

226-File successfully transferred

226 0.007 seconds (measured here), 1.47 Kbytes per second

NOOP
200 Zzz...

CWD /public_html/B2--aa/test55555
250 OK. Current directory is /public_html/B2--aa/test55555

PORT 9,26,148,193,6,179
200 PORT command successful

LIST
150 Connecting to port 8275

226-ASCII

226-Options: -a -l 

226 4 matches total

NOOP
200 Zzz...

CWD /public_html/B2--aa
250 OK. Current directory is /public_html/B2--aa

PORT 9,26,148,193,6,180
200 PORT command successful

LIST
150 Connecting to port 8309

226-ASCII

226-Options: -a -l 

226 4 matches total

NOOP
200 Zzz...

CWD /public_html/B2--aa/test55555
250 OK. Current directory is /public_html/B2--aa/test55555

PORT 9,26,148,193,6,181
200 PORT command successful

LIST
150 Connecting to port 8335

226-ASCII

226-Options: -a -l 

226 4 matches total

-----------Enter bugs above this line-----------
TM 2.0 Testing
installation : eclipse-SDK-3.3
RSE install  : RSE 2.0
java.runtime : Sun 1.5.0_11-b03
os.name:     : Windows XP, Service Pack 2
------------------------------------------------
Comment 1 Martin Oberhuber CLA 2007-07-10 08:54:42 EDT
Improving performance is very important especially with respect to EFS. 

It looks like FTP is a good test candidate since all remote operations are visible in the log. I'd like to see this addressed with high priority though I'm not assigning a targetmilestone yet since it's unclear whether the necessary changes can be fully API Contract compatible.
Comment 2 David McKnight CLA 2007-07-11 17:26:46 EDT
My first observations on this are as follows:

The listFiles() method of FTPService is used frequently and not just for doing a query.  It's used whenever getFile() is called - within the FTPService, getFile() is called by createFile(), createFolder(), delete(), download and setReadOnly().  

During the open scenario the following happens:
1) in FTPService.download(), we call FTPService.getFile() which calls list.
2) In SystemEditableRemoteFile.doDownload(), we call getRemoteFileObject(), which calls FTPService.getFile() which calls list.
3) SystemEditableRemoteFile.openEditor() which call getRemoteFileObject() which calls FTPService.getFile().

The operation should only need to query the remote file once before the download to make sure we're dealing with a file that hasn't changed on the host.  IRemoteFiles have been initialized to be marked stale so that we don't used cached info when querying them.  However, this should only apply to folders and archives since files are leaf nodes.  I've changed the code to mark a IRemoteFile as non-stale if it is a file but isn't an archive.  This prevents the 2) query here so for FTP it reduces the number of list calls to two.


Comment 3 Martin Oberhuber CLA 2007-07-12 11:51:46 EDT
I thought that the Service could use a "Stat cache" in order to avoid unnecessary lists. This is a common technique which is also used in the Eclipse Resource system, and we also had it in our earlier products.

Basically, the list() command would store its result in a local cache along with a timestamp. When list() is requested again with the same properties within a very short time frame (typically 1 second or less), use the cached information instead. Any operation that modifies the remote needs to invalidate the stat cache.

I think that the stat cache could probably (at least partially) be implemented in an abstract superclass so all the services could use it, but that would be an API change. If we want to use a stat cache for performance improvements in 2.0.1, each service needs to do it itself (it should be simple).

Keep in mind, though, that a stat cache is kind of a workaround for avoiding unnecessary queries that occur because the higher levels don't re-use information properly. So analyzing the higher levels as to why the extra queries are there is always a good idea.
Comment 4 David McKnight CLA 2007-07-12 12:02:38 EDT
I believe the extra list (via getFile()) from the service's download() code is unique to the FTPService.  Other services don't have this extra query so I'm thinking this particular issue doesn't need to be handled generically. 

Comment 5 Martin Oberhuber CLA 2007-07-13 06:45:18 EDT
Javier can you investigate why there is an extra getFile() call?
Comment 6 Javier Montalvo Orús CLA 2007-07-17 11:23:57 EDT
The getFile() in FTP service dowload() is used to get the file size, so the progress bar and percentage can be shown during the download.
This have to be done as FTP doens't provide the size of the file before the download.
Comment 7 Martin Oberhuber CLA 2007-07-17 11:34:30 EDT
Hm, I see. 

Perhaps we should change the IFileSerivce.download() API in the future to call with an IHostFile instead of a plain String for the remote file. Then, the IHostFile should already hold the remote file size.

Thoughts?
Comment 8 Martin Oberhuber CLA 2007-07-17 11:37:00 EDT
As a termporary workaround, FTP could do a statCache to remember the Mapping of remote-path (String) to IHostFile, at least for a short time. Since RSE must have got the IHostFile before, then the FTPService knows the IHostFile from its internal map and can deduce the file size from there.

But anyways, I think that download is slower than getting the file size so the extra list command is perhaps not much of an issue.
Comment 9 Javier Montalvo Orús CLA 2007-08-01 13:19:01 EDT
Removed unnecessary LIST commands by adding a map between absolute file names and their IHostFile object.
This should be a temporary solution to avoid accessing the remote server when not necessary.
The ideal solution would be passing the IHostFile to the FTP service.
Comment 10 Testo Nakada CLA 2009-03-18 07:31:45 EDT
I also found that during the session of FTP, it performs many NOOP operation. Can we get rid of it? It causes the FTP connection with high latency connection to be very slow especially for file browsing.
Comment 11 Martin Oberhuber CLA 2009-03-18 07:49:50 EDT
(In reply to comment #10)
> I also found that during the session of FTP, it performs many NOOP operation.

I have filed bug 269171 to track this.