Connect Linux system type to build.eclipse.org, using ssh protocol only. The Shell Processes subsystem is painfully slow. It's faster when the shell is dstore, but still slow.
Supposedly, the problem is that for multiple commands a new shell instance is started every time. With ssh, this leads to reading the user's profile each time (whereas dstore doesn't do that).
Another possible optimization could be to do some filtering on the remote side already, e.g. for "My Processes" only transfer data for the current user.
-----------Enter bugs above this line-----------
TM 2.0M5 Testing
installation : eclipse-platform-3.3M5 (I20070209-1006), update-site: cdt-4.0M5, emf-2.3M5, jdt-3.3M5
update site RSE-I20070223-0730: rse-runtime-all, terminal, tests discovery, remotecdt, examples
java.runtime : Sun 1.6.0-b105, mixed mode, sharing
os.name : Red Hat Enterprise Linux WS release 4 (Nahant Update 3)
uname : Linux parser.takefive.co.at 2.6.9-34.EL #1 Fri Feb 24 16:44:51 EST 2006 i686 athlon i386 GNU/Linux
------------------------------------------------
systemtype : Linux-local / Windows-dstore (Daemon) / Unix-dstore (Running)
targetos1 : Windows XP SP1, Sun 1.4.2_13
targetos2 : Solaris-sparc 5.9, Sun 1.4.2_05
targetuname : SunOS szg-anar 5.9 Generic_118558-06 sun4u sparc SUNW,Sun-Blade-1500
------------------------------------------------
We should quickly improve the situation since I consider processes.shell.linux an enabling technology that can help many extenders get their work done.
Deferring non-API bugs to M7
Testing against dsdp.eclipse.org over ssh recently gave quite OK performance so reducing to P3 again.
Anna, this could be something that you are interested in. The point is that the processes.shell subsystem opens a new Shell channel very often. Performance could be improved if the same Shell channel is re-used. RemoteShellCommandOperation is a class that could help in sending multiple commands in a row over the same shell channel, while still being able to see each command's output nicely separated (because it inserts begin-end tags).
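The begin-end tag idea can be illustrated with a minimal sketch. This is not the RemoteShellCommandOperation implementation: the class name TaggedShellOutput, the marker strings and both method names are hypothetical, chosen only to show how several commands sent over one shell channel can have their output told apart.

```java
import java.util.ArrayList;
import java.util.List;

final class TaggedShellOutput {
    // Hypothetical marker prefixes; the tags used by RSE itself may differ.
    static final String BEGIN = "BEGIN-";
    static final String END = "END-";

    /** Wraps a command so that its output is bracketed by two marker lines. */
    static String wrap(String command, String id) {
        return "echo " + BEGIN + id + "; " + command + "; echo " + END + id;
    }

    /** Returns the output lines found between the begin and end markers for id. */
    static List<String> extract(String shellOutput, String id) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : shellOutput.split("\r?\n")) {
            if (line.equals(BEGIN + id)) { // exact match skips the echoed command line
                inside = true;
                continue;
            }
            if (line.equals(END + id)) {
                break;
            }
            if (inside) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Matching the marker as a whole line means a terminal echo of the command itself (which contains the marker only as a substring) is not mistaken for the start of the output.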
(In reply to comment #4)
> Anna this could be something that you are interested in. The point is, that the
> processes.shell subsystem opens a new Shell channel very often. Performance
> could be improved if the same Shell channel is re-used.
I'm seeing very poor performance while getting the list of all processes. Seeing it happening with localhost is really annoying. But it looks like it's not because of opening channel too often. It uses only one channel for this operation.
Is it also slow when you do this on a local shell: cat /proc/[0-9]*/status
Now it even hangs for me.
Eclipse Version: 3.4.1
Build id: M20080911-1700
And then throws the exception:
com.jcraft.jsch.JSchException: channel is not opened.
    at com.jcraft.jsch.Channel.connect(Channel.java:187)
    at com.jcraft.jsch.Channel.connect(Channel.java:144)
    at org.eclipse.rse.internal.services.ssh.shell.SshHostShell.<init>(SshHostShell.java:112)
    at org.eclipse.rse.internal.services.ssh.shell.SshShellService.launchShell(SshShellService.java:52)
    at org.eclipse.rse.services.shells.AbstractShellService.launchShell(AbstractShellService.java:46)
    at org.eclipse.rse.internal.subsystems.processes.shell.linux.LinuxShellProcessService.listAllProcesses(LinuxShellProcessService.java:124)
    at org.eclipse.rse.subsystems.processes.servicesubsystem.ProcessServiceSubSystem.listAllProcesses(ProcessServiceSubSystem.java:127)
    at org.eclipse.rse.subsystems.processes.core.subsystem.impl.RemoteProcessSubSystemImpl.internalResolveFilterString(RemoteProcessSubSystemImpl.java:138)
    at org.eclipse.rse.core.subsystems.SubSystem.internalResolveFilterStrings(SubSystem.java:2820)
    at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterStrings(SubSystem.java:2250)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.internalGetChildren(SystemViewFilterReferenceAdapter.java:462)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:281)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:289)
    at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:363)
    at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:141)
    at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:2300)
    at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:234)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
It all happens on localhost, so it's not a network problem.
The same happens with files, but less often. I can see the following in the log:
java.lang.NullPointerException
    at org.eclipse.rse.internal.services.ssh.shell.SshHostShell.writeToShell(SshHostShell.java:181)
    at org.eclipse.rse.internal.subsystems.processes.shell.linux.LinuxShellProcessService.listAllProcesses(LinuxShellProcessService.java:126)
    at org.eclipse.rse.subsystems.processes.servicesubsystem.ProcessServiceSubSystem.listAllProcesses(ProcessServiceSubSystem.java:127)
    at org.eclipse.rse.subsystems.processes.core.subsystem.impl.RemoteProcessSubSystemImpl.internalResolveFilterString(RemoteProcessSubSystemImpl.java:138)
    at org.eclipse.rse.core.subsystems.SubSystem.internalResolveFilterStrings(SubSystem.java:2820)
    at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterStrings(SubSystem.java:2250)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.internalGetChildren(SystemViewFilterReferenceAdapter.java:462)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:281)
    at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:289)
    at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:363)
    at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:141)
    at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:2300)
    at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:234)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Do you have many ssh connections to that host at the same time? There might be a limitation on the SSH server with respect to the number of channels that it supports. Also, I have heard about a bug of sshd on Ubuntu that makes it not always close channels after disconnect -- which might lead to running out of resources eventually. There are other bugs entered on bugzilla dealing with this, I can find the numbers if you are interested. Running ps -ef | grep ssh on the server might also help in understanding what's going on.
(In reply to comment #9)
> Do you have many ssh connections to that host at the same time? There might be
> a limitation on the SSH server with respect to the number of channels that it
> supports.
No, I had about 5 open connections. I'm no longer able to reproduce the issue with a fresh installation of Eclipse 3.4.1 + RSE 3.0.1.
When I call remoteProcessService.getRemoteProcessObject(shellPid); the call takes several seconds to complete. I stepped into it with the debugger and it turns out that this call ends up calling LinuxShellProcessService.listAllProcesses(), which in turn runs "cat /proc/[0-9]*/status" on the remote system and parses the output. I checked, and the status for each process returns 34 lines that need to be parsed by listAllProcesses. That's where I think all the time is wasted, as it gets all (hundreds) of the processes on the remote machine when it's only interested in the one with the PID given in getRemoteProcessObject().
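A minimal sketch of the alternative hinted at here: read the status file of the one PID of interest instead of every process. The class ProcStatus and its methods are hypothetical and not part of RSE; the sketch only assumes the "Key:<tab>value" layout of /proc/<pid>/status.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ProcStatus {
    /** Builds a command that reads the status file of a single process only. */
    static String commandFor(long pid) {
        return "cat /proc/" + pid + "/status";
    }

    /** Parses "Key:<tab>value" lines into a map, keeping the original order. */
    static Map<String, String> parse(String statusText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : statusText.split("\r?\n")) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip blank lines and anything that is not key/value
            }
            fields.put(line.substring(0, colon), line.substring(colon + 1).trim());
        }
        return fields;
    }
}
```

With this shape, a lookup for one PID transfers and parses a few dozen lines rather than a few dozen lines per process on the host.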
Created attachment 128534
Thread dump of hanging Eclipse
We have a report of "Connect" for a Linux systemType taking 5 - 10 minutes (!) and completely blocking Eclipse during that time. Attached thread dump shows that the problem is calling into LinuxProcessHelper.populateUsernames(LinuxProcessHelper.java:117) from LinuxShellProcessService.initService(LinuxShellProcessService.java:194) which is called on the main Thread.
The Javadocs of IService#initService() say: "This method may be long-running, but is not yet expected to open a connection to a particular remote system."
So I see two problems here:
(1) if initService() may be long-running, why is it called on the main Thread? This looks like an inconsistency in RSE Core. Not sure if we can change it though, we might have to update the Javadocs instead ("Specification update").
(2) if initService() is not yet expected to open a connection, why does it initialize user names already? I think that the populateUsernames() must be deferred to when it is really needed.
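Deferring the user name lookup until it is really needed could look roughly like the following. This is only a generic lazy-initialization sketch, not the RSE code: LazyUsernames, nameFor and the loader parameter are hypothetical names, and the loader stands in for whatever expensive remote query populates the table.

```java
import java.util.Map;
import java.util.function.Supplier;

final class LazyUsernames {
    private final Supplier<Map<Integer, String>> loader; // expensive lookup, run at most once
    private Map<Integer, String> cache;                  // stays null until the first lookup

    LazyUsernames(Supplier<Map<Integer, String>> loader) {
        this.loader = loader; // constructing the object does no remote work
    }

    /** Resolves a uid to a user name, loading the table on first use. */
    synchronized String nameFor(int uid) {
        if (cache == null) {
            cache = loader.get();
        }
        String name = cache.get(uid);
        return name != null ? name : Integer.toString(uid); // fall back to the numeric uid
    }
}
```

The cost moves from service initialization to the first process listing, which already runs outside the main thread in the traces shown in this bug.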
Scheduling for our 3.1 "performance" milestone since we cannot accept a subsystem bringing all of Eclipse to a halt for 5 - 10 minutes. That's a major loss of functionality. The workaround -- not using the Shell Processes subsystem -- is not obvious and does mean losing the processes functionality.
I've checked in deferred population of usernames.
Hi Anna, is this complete with your checkin or what's still missing?
(In reply to comment #15)
> Hi Anna, is this complete with your checkin or what's still missing?
Well, performance probably still could be improved, but right now it at least doesn't impact the "connect" process.
It looks like the issue from comment 12 is fixed.
It looks like more improvements may be possible by running some grep / awk / sed on the remote before parsing the output of
cat /proc/[0-9]*/status
I would recommend this:
1. Create a new bug blocking this one, specifically for "[performance] deferred population of usernames" and mark it as fixed with 3.1m7
2. If there is time, experiment how much more improvement would be possible by limiting data transfer by grepping on the remote.
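As a rough illustration of limiting the transfer by grepping on the remote, the sketch below builds such a command. Everything in it is an assumption: the class RemoteStatusFilter is hypothetical, the field list is a guess at what a process view might need rather than what RSE actually parses, and the -h option (suppress file name prefixes) is widely supported but not required by POSIX. The isWanted method applies the same filter locally for hosts where grep cannot be relied on.

```java
import java.util.Arrays;
import java.util.List;

final class RemoteStatusFilter {
    // Assumed subset of /proc/<pid>/status keys; the fields RSE really needs may differ.
    static final List<String> FIELDS = Arrays.asList("Name", "State", "Pid", "PPid", "Uid");

    /** Builds a grep command that keeps only the wanted keys on the remote side. */
    static String remoteCommand() {
        return "grep -h -E '^(" + String.join("|", FIELDS) + "):' /proc/[0-9]*/status";
    }

    /** The same filter applied locally, usable when grep is missing on the remote. */
    static boolean isWanted(String line) {
        int colon = line.indexOf(':');
        return colon > 0 && FIELDS.contains(line.substring(0, colon));
    }
}
```

Because Name is the first line of every status file, it can still serve as the record separator when the filtered output of many processes is parsed as one stream.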
(In reply to comment #17)
> It looks like the issue from comment 12 is fixed.
>
> It looks like more improvements may be possible by running some grep / awk /
> sed on the remote before parsing the output of
> cat /proc/[0-9]*/status
>
> I would recommend this:
> 1. Create a new bug blocking this one, specifically for "[performance]
> deferred population of usernames" and mark it as fixed with 3.1m7
Done.
> 2. If there is time, experiment how much more improvement would be possible
> by limiting data transfer by grepping on the remote.
>
Your idea makes us assume the existence of one of grep / awk / sed on the remote side.
Yes... actually, you are the guys owning this, so I'll leave it to you whether you'll want to mark this fixed, or experiment with additional means of improving performance. Not sure how much additional work makes sense at this point, given that TCF is coming...
(In reply to comment #19)
> Yes... actually, you are the guys owning this, so I'll leave it to you whether
> you'll want to mark this fixed, or experiment with additional means of
> improving performance.
I really do not want additional dependencies on remote host for 3.1, since I can easily imagine systems without them.
> Not sure how much additional work makes sense at this point, given that TCF is
> coming...
Exactly.
So, if you are happy with performance in general, I recommend closing this FIXED -- if you plan any additional work, set target milestone Future -- if no more work is planned, close it as WONTFIX.