Bug 346530 - [ftp] FTP fails to re-connect automatically due to 30 sec timeout cache
Summary: [ftp] FTP fails to re-connect automatically due to 30 sec timeout cache
Status: NEW
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: unspecified   Edit
Hardware: PC Windows XP
: P3 normal (vote)
Target Milestone: ---   Edit
Assignee: dsdp.tm.rse-inbox CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: helpwanted
Depends on:
Blocks:
 
Reported: 2011-05-19 15:21 EDT by Samuel Wu CLA
Modified: 2011-05-20 10:41 EDT (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Samuel Wu CLA 2011-05-19 15:21:28 EDT
Build Identifier: RSE 3.2.2 maintenance

When the FTP subsystem connection drops, it may wait 30 seconds before trying to connect again.

Reproducible: Always

Steps to Reproduce:
In FTPService.getFTPClient(), _ftpClient.sendNoOp() is used to check that whether the connection is still alive. But it only does the check when (curTime - _ftpLastCheck > FTP_CONNECTION_CHECK_TIMEOUT)

We ran into a problem that the connection dropped after the first check. In this case, it won't try to connect until FTP_CONNECTION_CHECK_TIMEOUT which is 30 seconds later.

For instance, internalFetch() was called to resolve a filter. It called getFTPClient() and found the connection was still active. The connection dropped when it called chdir() or listFiles(). Since _ftpLastCheck was set to the current time when getFTPClient() was called, it would not try to connect within 30 seconds. 

I wonder whether _ftpLastCheck can be reset when IO exception is thrown so that it will try to reconnect when the user refreshes the filter.

It looks more like a problem on the ftp server side, but it would help if the client can be more robust.
Comment 1 Martin Oberhuber CLA 2011-05-20 03:25:02 EDT
I've updated the summary, previous value was:
   FTP connection failed to connect for 30 seconds

Resetting the timer after an exception seems like a good idea at first sight ... my concern would be that if the server went offline completely, we might end up in a loop trying to reconnect forever?

The 30 sec timeout was introduced in order to avoid constantly probing with NOOP. By far most of the time, FTP connections are stable ... I'd hate compromising the 99% case just for a 1% failure case, so if we change something here we should be very sure it's the right thing to do.

Perhaps the simplest possible change would be reducing the timeout to 5 sec. Probing by NOOP every 5 sec seems acceptable to me on a healthy connection, and may also be acceptable in the failure case.

I won't have time to investigate this any further, but I can review patches and provide advice.
Comment 2 Samuel Wu CLA 2011-05-20 10:41:28 EDT
From what I noticed, the RSE FTP client was in a kind of passive mode. It only tried to connect to the host when the user did some action such as save the file, refresh the directory etc. It should try to connect to the server any way in this case. The risk of falling in the loop is not high. 
To decrease the threshold to 5 seconds sounds fine.