Community
Participate
Working Groups
Created attachment 99016 [details] Exception traces Hi, I DLTK we need to scan all files in project during build. But for big projects in in some configurations like, connecting to Solaris, we have see connection lost dialog message "RSEG 1058", and appropriate RSE exception. I've discovered that this errors are caused by NPE in jsch. This happen for both 3.0M5 and 3.0M6 versions. Error logs are in attachment. I've tested with Eclipses 3.2.1 and 3.4(m5,m6,m7). On servers we use: Linux - OpenSSH 4.7_p1-r6 Any ideas? Thanks.
Created attachment 99023 [details] JSch-0.1.38 preview Hm... apparently this happens from the EFS provider, which uses the JSch Streams support. The Platform switched to a newer JSch version (0.1.37) in Eclipse 3.4M6, with bug 221378. So we're only going to focus on Eclipse Platform M6 or later, with RSE 3.0M6 or later. Since then, JSch-0.1.38 was released on Apr/22 with the following announcement: http://sourceforge.net/mailarchive/forum.php?thread_name=200804221521.AAA16510%40jcraft.com&forum_name=jsch-users "- bugfix: NPE should not be thrown at unexpeced session drop. FIXED." Attached is an unofficial "eclipsified" version of JSch-0.1.38. Just drop it into your plugins (or dropins) folder and start eclipse -clean, it should get picked up (you can rename JSch-0.1.37 to 0.1.37.orig if you want to be super safe). Please try if this improves the situation for you.
Created attachment 99030 [details] Exception with 0_1_38 No, using of 0.1.38 not helps at all. But exception now are different. I've attached log.
It would be interesting to know what JVM you are using, since java.io.PipedInputStream is part of your backtrace and that class is known to be problematic at times. An you attach a complete configuration log (Eclipse Help > About > Configuration Details > Copy&Paste into a file). A complete Thread Dump from your running Eclipse (while file transfers are ongoing ) could also be interesting, to see how many SSH transfers are going on in parallel.
Created attachment 99039 [details] Configuration details I've using Java(TM) SE Runtime Environment (build 1.6.0_02-b05). As for connections, I've have only one "Connection thread".
Just to be safe, could you try JDK 6 Update 6 and / or Java 5.0 update 15 http://java.sun.com/javase/downloads/index_jdk5.jsp
Hi, I've checked with jdk1.6.0_06 problem still exist. Also in last tests I've could see about 10-15 threads with names like "Connect Thread SOLARIS.LOCAL session". I suppose this is JSch sessions, in which disconnect are happen? Also, I've enabled RSE SSH tracing and attached log then disconnect are happen. Part of log: >>>> 2008-05-08 15:15:38.137 | Worker-4 | SftpFileService.getInputStream tomAmp.tcl 2008-05-08 15:15:38.220 | Worker-4 | SftpFileService.getInputStream tomAmp.tcl ok 2008-05-08 15:15:38.292 | Worker-4 | SftpFileService.getInputStream s_user_type.test 2008-05-08 15:15:38.383 | Worker-4 | SftpFileService.getInputStream s_user_type.test ok 2008-05-08 15:15:38.389 | Worker-4 | SftpFileService.getInputStream atsSR.tcl 2008-05-08 15:15:38.389 | ModalContext | SshConnectorService.disconnect 2008-05-08 15:15:38.391 | ModalContext | SshConnectorService.sessionDisconnect 2008-05-08 15:15:38.394 | ModalContext | SftpFileService.uninitService 2008-05-08 15:15:38.395 | ModalContext | SftpFileService.disconnect 2008-05-08 15:15:38.441 | Worker-4 | SftpFileService.getInputStream /export/home/haiodo/tcl/ats5.0.0/ats_lib/autoeasy/lib/atsSR.tcl failed: com.jcraft.jsch.JSchException: session is down 2008-05-08 15:15:38.460 | Worker-4 | SshConnectorService.connecting... 2008-05-08 15:15:39.023 | Worker-4 | SshConnectorService.connected 2008-05-08 15:15:39.024 | Worker-4 | SftpFileService.initService >>>> I'm wondering what ModalContext is, and why it calls disconnect method of SshConnectorService? Best regards, Andrei.
Created attachment 99236 [details] RSE SSH Log
One more point: My current implementation will call IFile.getContents(true) for a ten times if first time it was failed. Is it possibly on RSE side to wait until connection will be reestablished?
Adding myself on the CC list.
I've found what in most cases disconnects are happen then we try to create more than ten ssh channels. I've updated my code, and now disconnects are almost not happen. Is it possibly to handle such situations in RSE? For example if RSE create more what one SSH connection, or wait, if we have more than ten requests?
Created attachment 100117 [details] Solaris disconnect log I've enabled tracing into JSch implementation and found what disconnects are happen because of following error: SSH_MSG_DISCONNECT: 2 Could not create socket pairs: Too many open files This happen on each 244 channel creation. May be this is ssh server implementation limitation? Or it is bug in JSch, and channels are not properly closed?
JSch definitely does close its channel(s), when the client application closes it. 244 channels is a whole lot, and it looks like you might be hitting a server limitation here. Most (openSSH) servers support 10 concurrent channels, some smaller servers support only one -- which is very problematic for RSE, see bug 209090. In your case, I'd recommend checking why there are so many open channels. Looks like you're using the EFS integration and your client(s) don't close their Streams so this might be a leak in your client code.
BTW, I have requested an upgrade to JSch-0.1.38 with bug 232416.
Hi Martin, 244 is not a count of channels created. In one time I've have only one channel created. Following scenario: 1) Get file A -> open channel -> read -> close channel 2) Get file B -> open channel -> read -> close channel. etc. But on each 244 request I've have this problem.
Atsuhiko, do you have an idea what could be going wrong in JSch?
(In reply to comment #1) > Since then, JSch-0.1.38 was released on Apr/22 with the following announcement: > http://sourceforge.net/mailarchive/forum.php?thread_name=200804221521.AAA16510%40jcraft.com&forum_name=jsch-users > "- bugfix: NPE should not be thrown at unexpeced session drop. FIXED." I think this fix does not related the reported problem. That fix is for NPE at unexpected socket drop during file transfer. The reported problem has occured in establishing sftp channel as you know.
(In reply to comment #14) > 244 is not a count of channels created. > In one time I've have only one channel created. > Following scenario: > 1) Get file A -> open channel -> read -> close channel > 2) Get file B -> open channel -> read -> close channel. > etc. > But on each 244 request I've have this problem. Have you explicitly closed the input-stream from RSE? I mean the stream from SftpFileService#getInputStream. |com.jcraft.jsch.JSchException: java.lang.NullPointerException | at com.jcraft.jsch.ChannelSftp.start(ChannelSftp.java:234) | at com.jcraft.jsch.Channel.connect(Channel.java:199) | at com.jcraft.jsch.Channel.connect(Channel.java:144) | at org.eclipse.rse.internal.services.ssh.files.SftpFileService.getInputStream(SftpFileService.java:1123)
Yes, I've close streams before opening next one.
Where can I get the source code of org.eclipse.dltk.*?
You could obtain code from dev.eclipse.org cvs root: dev.eclipse.org/technology/ projects: org.eclipse.dltk/core/plugins/org.eclipse.dltk.* rse intergation are in: org.eclipse.dltk/rse/plugins/org.eclipse.dltk.rse.*
(In reply to comment #20) > You could obtain code from dev.eclipse.org cvs > root: dev.eclipse.org/technology/ > projects: org.eclipse.dltk/core/plugins/org.eclipse.dltk.* > rse intergation are in: > org.eclipse.dltk/rse/plugins/org.eclipse.dltk.rse.* Thank you for informations. I want to reproduce that problem on my environment. How to do it? Which version I should download from http://download.eclipse.org/technology/dltk/downloads/ ? Then please introduce the steps to re-produce that problem. It seems resources have been leaked on the sshd. I guess that files or channles are not closed on the remote due to some reasons.
I could reproduce the problem without DLTK and RSE and have found the reason at last. The following code will make that error in sshd with "Could not create socket pairs: Too many open files", session.connect(); Channel c=null; int i=0; try{ for(i=0; i<1024; i++){ c=session.openChannel("sftp"); c.connect(); c.disconnect(); } } catch(Exception e){ System.out.println(i); System.out.println(e); } The message "Could not ..." has come from OpenSSH's session.c[1], int inout[2], err[2]; /* Uses socket pairs to communicate with the program. */ if (socketpair(AF_UNIX, SOCK_STREAM, 0, inout) < 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, err) < 0) packet_disconnect("Could not create socket pairs: %.100s", strerror(errno)); I reviewed that session.c and have found a bug, which has missed to free 'err' for subsystem channels. It seems that this bug has been fixed just a week ago[2] Anyway, jsch can not work around this bug unfortunately, and all we can do is to re-use sftp channel as possible as. [1] http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/session.c [2] http://www.openbsd.org/cgi-bin/cvsweb/src/usr.bin/ssh/session.c.diff?r1=1.236&r2=1.237
Whow! Thanks Atsuhiko for this in-depth analysis. We could try and pool the Sftp Channels for re-use, but the drawback here is that we need to keep the open Channel(s) around so they are not available for others even if we idle. I'll investigate this a bit more to see what (Performance) impact it has. Perhaps just keeping one or two Channels available in a Pool would be sufficient since Eclipse hardly ever opens multiple Streams in parallel. I'm inclined to doing this if it also boosts our Performance, although the original problem here is in OpenSSHd so it's NOT_ECLIPSE. Andrei could you help out a bit here, e.g. writing a Unittest that I could use to measure Performance before and after any change? Tentatively targeting RC3 for a fix but could get in earlier.
I've written a Unittests that creates 255 files through SSH EFS connection (IFileStore.openOutputStream()) and have not hit the exception. It took 77,556 seconds on my connection. I then hacked SftpFileService to always re-use a single shared channel; that's possible in my Unittest because I KNOW that I always access the same channel sequentially. With the hacked single channel, the test was done in 4,516 seconds on my connection. This is a cut down to 6% of the original time, or a 17 times performance improvement. Sufficient to go and shoot for an implementation which re-uses channels, although it'll be considerably harder to do when multiple channels can be used simultaneously.
Given that what makes the issue "major" is not an Eclipse problem, but a bug in OpenSSH, only the performance issue remains. We'll need to defer addressing that to RSE 3.0.1 -- is it OK for you to decrease severity to "Normal" since there is a workaround available for the Major issue (i.e. upgrade your buggy OpenSSH version)? I'm keeping Priority P2 such that we don't forget adressing this early in 3.0.1 since lots of people will benefit from the performance improvement, and the fix is likely risky so we'd better address it early in the stream.
Bulk update of target milestone
Closing this as NOT_ECLIPSE since as per comment 22, the NPE problem was due to a bug in OpenSSH. As per comment 10, a workaround to alleviate the situation is avoiding opening too many channels in client code. I have opened bug 256581 in order to track the remaining performance issue. Once the implementation is changed as proposed in that bug, the problem should go away since we'll use way fewer channels and they will be more long-lived.