Bug 256581 - [ssh][performance][api] Improve Sftp performance by re-using open file channels where possible
Summary: [ssh][performance][api] Improve Sftp performance by re-using open file channe...
Status: RESOLVED FIXED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE (show other bugs)
Version: 3.0   Edit
Hardware: All All
: P3 enhancement (vote)
Target Milestone: 3.3 M5   Edit
Assignee: Martin Oberhuber CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: api, helpwanted, performance
Depends on:
Blocks: 209090 248913 230831
  Show dependency tree
 
Reported: 2008-11-26 04:38 EST by Martin Oberhuber CLA
Modified: 2011-02-01 03:24 EST (History)
3 users (show)

See Also:


Attachments
Patch for SftpFileService.java (11.46 KB, text/plain)
2011-01-26 12:47 EST, Martin Tauber CLA
mober.at+eclipse: iplog+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Martin Oberhuber CLA 2008-11-26 04:38:15 EST
As per bug 230831 comment 24, we found through the
   RSEFileStoreTest.test255files()
unittest that up to a 17 times performance improvement can be made in the SftpFileService, if existing open file channels are re-used.

Right now, in SftpFileService we have a single fChannelSftp which we use for directory queries. For uploads and downloads, a new (non-public) sftp channel is opened each time.

We will need to change this into a generic pool of re-usable sftp channels, from which clients can take an available channel when needed and return it to the pool when done. The pool should be configurable to support a maximum number of channels per session. In case the maximum number of channels is exceeded, or JSch throws a "no more channels" exception when opening a channel, the client will either need to wait until a channel is returned, or the pool opens an additional session in order to get access to more channels. 

Opening an additional session at this point would also fix bug 209090 (the session should remain in the pool); this session aspect of the pool will need to live in the ConnectorService, since it is equally applicable to shell channels and sftp channels and only the ConnectorService has lifecycle control over the Session. If this is done, we will need new API in SshConnectorService / ISessionProvider, since clients will need to ask the connectorService for new channels instead of the JSch Session.

When the ConnectorService closes the connection, all open channels and sessions need to be closed as well. In addition to that, the ChannelPool may use a Timer to periodically check for unused channels and close them as they are no longer needed.
Comment 1 Martin Oberhuber CLA 2010-02-26 19:11:29 EST
Bulk update of target milestones to 3.2
Comment 2 Martin Tauber CLA 2011-01-26 12:47:46 EST
Created attachment 187662 [details]
Patch for SftpFileService.java
Comment 3 Martin Tauber CLA 2011-01-26 12:56:57 EST
Martin,

attached you'll find a bugfix for this problem. As discussed, this is not the full blown implementation of the fix, but it should cover most of the cases. For the rest it should work as before.

As far as I remember there are still some hardcoded values in the fix (size of the cache) that would for my taste still need some cleanup. Also I started codeing the cache for exec channels which actually doesn't really make sence which I realized after the discussion we had. This would also need some cleanup. (Which I'm happy to do, but I don't think I'm going to make it until monday.)

Nevertheless the patch as implemented works fine in my environemnt.

Regards
Martin
Comment 4 Martin Oberhuber CLA 2011-02-01 01:08:19 EST
This contribution looks good - thanks!

Unittests still pass. The "RSEFileStoreTest.test255files()" speeds up from 488 sec to 289 sec with the patch. This is not quite the improvement I had hoped for, but it is an improvement.

From code review, it looks like in line 1027, instead of this:
	if (channel.getChannel() != null) {
		channel.getChannel().disconnect();
we'll need this:
	if (channel != null) {
		channel.release();

The contribution addresses the original performance problem, and it should be possible to improve fairly easily for also addressing bug 209090 (by making the number of pooled channels configurable, and changing the policy for what to do when running out of pooled channels - open a new Session in that case).

The code for "exec" channels isn't currently used; the channel pool should support "shell" channels instead. Also, the channel pool should be moved into the SshSessionProvider such that both Sftp and SSH can leverage it.

I'd like to commit the patch. The one thing I need before I can do so is confirmation from the submitter that you wrote the patch yourself, and that you have permission from your employer to contribute the patch under the EPL (see
http://eclipse.org/tm/development/committer_howto.php#external_contrib ).
Comment 5 Martin Tauber CLA 2011-02-01 01:50:41 EST
Good Morning Martin,

Yes the code changes I made can be used under EPL. Since I'm self employed this is not a problem. Concerning the performance improvement: I saw that the Refresh does not speed up with this bugfix. This now is the major part when I start my application. When I then start reading files (my application compiles tons of files when it start up) the performance improvement is about 5-7 times faster.

Regards
Martin
Comment 6 Martin Oberhuber CLA 2011-02-01 03:09:38 EST
Comment on attachment 187662 [details]
Patch for SftpFileService.java

Patch committed with minor modifications:
  - fix Java warnings, add Copyright comment,
  - fix Copyright year,
  - fix the issue in line 1027

I'll keep the bug open for just a little bit more investigations.
Comment 7 Martin Oberhuber CLA 2011-02-01 03:24:40 EST
Here in my office, where the SSH server is on the same LAN, the performance improvement is much more drastic: 7.6 seconds on test255files() instead of 76 seconds without the patch -- a 10x improvement!

I'm marking the bug FIXED, will follow up in other bugs.

In order to improve the "Refresh" Performance, I assume you'll have to implement a cache for the SftpFileService#getFile() -- very similar to what we already do in the FTPService#getFileInternal() with _fCachePreviousFiles.

The drawback of such a cache is that there's a slight chance to miss updates to the remote file when they come from the outside (modifications from inside RSE are handled through clearCache()). But the benefit of such an fstat cache can be drastic since RSE typically performs multiple getFile() calls very quickly in succession; and, file modifications "from the outside" may show some latency anyways.

Martin, if you want to work in this direction, could you create a new bug for this work? 

Thanks again for the contribution!