Bug 175300 - [performance] processes.shell.linux subsystem is slow over ssh
Status: ASSIGNED
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 2.0
Hardware: PC Linux-GTK
Importance: P2 major with 1 vote
Target Milestone: Future
Assignee: Anna Dushistova CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: performance
Depends on: 275060
Blocks:
Reported: 2007-02-23 11:29 EST by Martin Oberhuber CLA
Modified: 2009-05-05 17:12 EDT
CC List: 3 users

See Also:


Attachments
Thread dump of hanging Eclipse (26.29 KB, text/plain)
2009-03-12 06:25 EDT, Martin Oberhuber CLA

Description Martin Oberhuber CLA 2007-02-23 11:29:41 EST
Connect Linux system type to build.eclipse.org, using ssh protocol only.

The Shell Processes subsystem is painfully slow.
It's faster when the shell is dstore, but still slow.

Supposedly, the problem is that for multiple commands a new shell instance is started every time. With ssh, this leads to reading the user's profile each time (whereas dstore doesn't do that).

Another possible optimization could be to try and do some filtering on the remote side already, e.g. for "My Processes" only transfer data according to the current user.


-----------Enter bugs above this line-----------
TM 2.0M5 Testing
installation : eclipse-platform-3.3M5 (I20070209-1006),
     update-site: cdt-4.0M5, emf-2.3M5, jdt-3.3M5
     update site RSE-I20070223-0730: rse-runtime-all, terminal, tests
                                     discovery, remotecdt, examples 
java.runtime : Sun 1.6.0-b105, mixed mode, sharing
os.name:     : Red Hat Enterprise Linux WS release 4 (Nahant Update 3)
uname        : Linux parser.takefive.co.at 2.6.9-34.EL #1 Fri Feb 24 16:44:51 
EST 2006 i686 athlon i386 GNU/Linux
------------------------------------------------
systemtype   : Linux-local / Windows-dstore (Daemon) / Unix-dstore (Running)
targetos1    : Windows XP SP1, Sun 1.4.2_13
targetos2    : Solaris-sparc 5.9, Sun 1.4.2_05
targetuname  : SunOS szg-anar 5.9 Generic_118558-06 sun4u sparc SUNW,Sun-Blade-1500
------------------------------------------------
Comment 1 Martin Oberhuber CLA 2007-02-23 11:31:06 EST
We should quickly improve the situation since I consider processes.shell.linux an enabling technology that can help many extenders get their work done.
Comment 2 Martin Oberhuber CLA 2007-04-02 06:44:40 EDT
Deferring non-API bugs to M7
Comment 3 Martin Oberhuber CLA 2007-05-10 05:40:01 EDT
Testing against dsdp.eclipse.org over ssh recently gave quite OK performance so reducing to P3 again.
Comment 4 Martin Oberhuber CLA 2008-09-23 11:58:31 EDT
Anna, this could be something that you are interested in. The point is that the processes.shell subsystem opens a new Shell channel very often. Performance could be improved if the same Shell channel were re-used.

RemoteShellCommandOperation is a class that could help in sending multiple commands in a row over the same shell channel, while still keeping the output of each command cleanly separated (because it inserts begin-end-tags).
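
A minimal sketch of the begin/end-tag idea, not the actual RemoteShellCommandOperation API: the class name TaggedShellOutput, the marker strings and both methods are assumptions made up for illustration. Each command is wrapped in two echo statements, so the output of several commands sent over one shell channel can be told apart afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of RSE. */
final class TaggedShellOutput {
    // Marker prefixes are assumptions chosen for this sketch.
    static final String BEGIN = "BEGIN-OF-COMMAND-";
    static final String END = "END-OF-COMMAND-";

    /** Wraps a command so that its output is bracketed by unique marker lines. */
    static String wrap(String command, String id) {
        return "echo " + BEGIN + id + "; " + command + "; echo " + END + id;
    }

    /** Returns only the lines printed between the two markers for the given id. */
    static List<String> extract(List<String> shellOutput, String id) {
        List<String> result = new ArrayList<>();
        boolean inside = false;
        for (String line : shellOutput) {
            String trimmed = line.trim();
            if (trimmed.equals(BEGIN + id)) {
                inside = true;          // start collecting after the begin tag
            } else if (trimmed.equals(END + id)) {
                break;                  // stop at the matching end tag
            } else if (inside) {
                result.add(line);
            }
        }
        return result;
    }
}
```

Because the markers are matched as whole lines, a terminal echo of the wrapped command line itself is not mistaken for a tag.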
Comment 5 Anna Dushistova CLA 2008-09-26 16:05:36 EDT
(In reply to comment #4)
> Anna this could be something that you are interested in. The point is, that the
> processes.shell subsystem opens a new Shell channel very often. Performance
> could be improved if the same Shell channel is re-used.

I'm seeing very poor performance when getting the list of all processes. Seeing it happen even on localhost is really annoying.
But it does not look like this is caused by opening channels too often:
only one channel is used for this operation.
 

Comment 6 Martin Oberhuber CLA 2008-09-26 16:10:16 EDT
Is it also slow when you do this on a local shell:

   cat /proc/[0-9]*/status
Comment 7 Anna Dushistova CLA 2008-10-02 08:32:49 EDT
Now it even hangs for me.
Eclipse Version: 3.4.1
Build id: M20080911-1700

And then throws the exception:
com.jcraft.jsch.JSchException: channel is not opened.
	at com.jcraft.jsch.Channel.connect(Channel.java:187)
	at com.jcraft.jsch.Channel.connect(Channel.java:144)
	at org.eclipse.rse.internal.services.ssh.shell.SshHostShell.<init>(SshHostShell.java:112)
	at org.eclipse.rse.internal.services.ssh.shell.SshShellService.launchShell(SshShellService.java:52)
	at org.eclipse.rse.services.shells.AbstractShellService.launchShell(AbstractShellService.java:46)
	at org.eclipse.rse.internal.subsystems.processes.shell.linux.LinuxShellProcessService.listAllProcesses(LinuxShellProcessService.java:124)
	at org.eclipse.rse.subsystems.processes.servicesubsystem.ProcessServiceSubSystem.listAllProcesses(ProcessServiceSubSystem.java:127)
	at org.eclipse.rse.subsystems.processes.core.subsystem.impl.RemoteProcessSubSystemImpl.internalResolveFilterString(RemoteProcessSubSystemImpl.java:138)
	at org.eclipse.rse.core.subsystems.SubSystem.internalResolveFilterStrings(SubSystem.java:2820)
	at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterStrings(SubSystem.java:2250)
	at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.internalGetChildren(SystemViewFilterReferenceAdapter.java:462)
	at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:281)
	at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:289)
	at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:363)
	at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:141)
	at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:2300)
	at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:234)
	at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)


It all happens on localhost, so it's not a network problem.
Comment 8 Anna Dushistova CLA 2008-10-02 08:52:47 EDT
The same happens with files, but less often.
I can see the following in the log:
java.lang.NullPointerException
at org.eclipse.rse.internal.services.ssh.shell.SshHostShell.writeToShell(SshHostShell.java:181)
at org.eclipse.rse.internal.subsystems.processes.shell.linux.LinuxShellProcessService.listAllProcesses(LinuxShellProcessService.java:126)
at org.eclipse.rse.subsystems.processes.servicesubsystem.ProcessServiceSubSystem.listAllProcesses(ProcessServiceSubSystem.java:127)
at org.eclipse.rse.subsystems.processes.core.subsystem.impl.RemoteProcessSubSystemImpl.internalResolveFilterString(RemoteProcessSubSystemImpl.java:138)
at org.eclipse.rse.core.subsystems.SubSystem.internalResolveFilterStrings(SubSystem.java:2820)
at org.eclipse.rse.core.subsystems.SubSystem.resolveFilterStrings(SubSystem.java:2250)
at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.internalGetChildren(SystemViewFilterReferenceAdapter.java:462)
at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:281)
at org.eclipse.rse.internal.ui.view.SystemViewFilterReferenceAdapter.getChildren(SystemViewFilterReferenceAdapter.java:289)
at org.eclipse.rse.ui.operations.SystemFetchOperation.execute(SystemFetchOperation.java:363)
at org.eclipse.rse.ui.operations.SystemFetchOperation.run(SystemFetchOperation.java:141)
at org.eclipse.rse.ui.view.AbstractSystemViewAdapter.fetchDeferredChildren(AbstractSystemViewAdapter.java:2300)
at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:234)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:55)
Comment 9 Martin Oberhuber CLA 2008-10-02 10:14:38 EDT
Do you have many ssh connections to that host at the same time? There might be a limitation on the SSH server with respect to the number of channels that it supports. Also, I have heard about a bug of sshd on Ubuntu that makes it not always close channels after disconnect -- which might lead to running out of resources eventually.

There are other bugs entered in Bugzilla dealing with this; I can find the numbers if you are interested. Running

  ps -ef | grep ssh

on the server might also help to understand what's going on.
Comment 10 Anna Dushistova CLA 2008-10-08 12:14:04 EDT
(In reply to comment #9)
> Do you have many ssh connections to that host at the same time? There might be
> a limitation on the SSH server with respect to the number of channels that it
> supports.

No, I had about 5 open connections. I can no longer reproduce the issue with a fresh installation of Eclipse 3.4.1 + RSE 3.0.1.


Comment 11 Lothar Werzinger CLA 2009-02-16 12:07:13 EST
When I call
remoteProcessService.getRemoteProcessObject(shellPid);
the call takes several seconds to complete.

I stepped into it with the debugger and it turns out that this call ends up 
calling

LinuxShellProcessService.listAllProcesses()

which in turn runs "cat /proc/[0-9]*/status" on the remote system and parses 
the output. I checked, and the status for each process returns 34 lines that 
need to be parsed by listAllProcesses. That's where I think all the time 
is wasted, as it gets all (hundreds) of the processes on the remote machine 
when it's only interested in getting the one with the PID given in 
getRemoteProcessObject().
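
A minimal sketch of what a single-PID lookup could look like, assuming the status file of just one process is read instead of all of them. The class SingleProcessStatus and its methods are hypothetical and not part of LinuxShellProcessService; only the "key: value" line layout of /proc/<pid>/status is taken as given.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of RSE. */
final class SingleProcessStatus {

    /** Builds a command that reads the status of one process only. */
    static String commandFor(long pid) {
        return "cat /proc/" + pid + "/status";
    }

    /** Parses "Key:<whitespace>value" lines into an ordered map. */
    static Map<String, String> parse(String statusText) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String line : statusText.split("\\r?\\n")) {
            int colon = line.indexOf(':');
            if (colon > 0) {
                // Key is everything before the first colon, value is the trimmed rest.
                fields.put(line.substring(0, colon), line.substring(colon + 1).trim());
            }
        }
        return fields;
    }
}
```

With this shape, only the lines of one status file have to be transferred and parsed for a getRemoteProcessObject(pid)-style call, rather than the lines of every process on the machine.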
Comment 12 Martin Oberhuber CLA 2009-03-12 06:25:47 EDT
Created attachment 128534 [details]
Thread dump of hanging Eclipse

We have a report of "Connect" for a Linux systemType taking 5 - 10 minutes (!) and completely blocking Eclipse during that time.

Attached thread dump shows that the problem is calling into
	LinuxProcessHelper.populateUsernames(LinuxProcessHelper.java:117)
from
	LinuxShellProcessService.initService(LinuxShellProcessService.java:194)
which is called on the main Thread.

The Javadocs of IService#initService() say:
   "This method may be long-running, but is not yet expected to open a
    connection to a particular remote system."

So I see two problems here: 
(1) if initService() may be long-running, why is it called on the main 
    Thread? This looks like an inconsistency in RSE Core. Not sure if we
    can change it though, we might have to update the Javadocs instead
    ("Specification update").
(2) if initService() is not yet expected to open a connection, why does it
    initialize user names already?
I think that populateUsernames() must be deferred until it is really needed.
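
A minimal sketch of such a deferral, assuming the uid-to-name table can be produced by some loader that is only invoked on first use. The class LazyUsernames and the Supplier-based loader are hypothetical and do not reflect how LinuxProcessHelper is actually structured.

```java
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical lazy holder for illustration; not part of RSE. */
final class LazyUsernames {
    private final Supplier<Map<String, String>> loader;
    private Map<String, String> uidToName; // stays null until the first lookup

    LazyUsernames(Supplier<Map<String, String>> loader) {
        this.loader = loader; // nothing expensive happens at construction time
    }

    /** Resolves a uid, loading the table on first use; falls back to the uid itself. */
    synchronized String nameFor(String uid) {
        if (uidToName == null) {
            uidToName = loader.get(); // potentially slow, so run it off the main thread
        }
        String name = uidToName.get(uid);
        return name != null ? name : uid;
    }
}
```

An initService()-style method would then only construct the holder, and the cost of reading the user names would be paid by the first process listing instead of by "Connect".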
Comment 13 Martin Oberhuber CLA 2009-03-12 06:27:34 EDT
Scheduling for our 3.1 "performance" milestone since we cannot accept a subsystem bringing all of Eclipse to a halt for 5 - 10 minutes. That's a major loss of functionality. The workaround -- not using the Shell Processes subsystem -- is not obvious and does mean losing the processes functionality.
Comment 14 Anna Dushistova CLA 2009-04-28 15:39:46 EDT
I've checked in deferred population of usernames.
Comment 15 Martin Oberhuber CLA 2009-04-30 12:38:17 EDT
Hi Anna, is this complete with your checkin or what's still missing?
Comment 16 Anna Dushistova CLA 2009-04-30 12:44:59 EDT
(In reply to comment #15)
> Hi Anna, is this complete with your checkin or what's still missing?

Well, performance probably still could be improved, but right now it at least doesn't impact the "connect" process.

Comment 17 Martin Oberhuber CLA 2009-05-05 16:10:13 EDT
It looks like the issue from comment 12 is fixed.

It looks like more improvements may be possible by running some grep / awk / sed on the remote before parsing the output of
   cat /proc/[0-9]*/status

I would recommend this:
  1. Create a new bug blocking this one, specifically for "[performance] 
     deferred population of usernames" and mark it as fixed with 3.1m7
  2. If there is time, experiment how much more improvement would be possible
     by limiting data transfer by grepping on the remote.
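
A minimal sketch of the remote-side filtering in item 2, assuming a grep that supports -E and -h (suppress file names) exists on the remote host. The class RemoteStatusFilter, its methods and the choice of fields are hypothetical; the sketch also assumes that "Name" is among the requested fields, since it is the first line of each status file and is used here to detect where a new process record starts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of RSE. */
final class RemoteStatusFilter {

    /** Builds a command that transfers only the requested status fields. */
    static String command(List<String> fields) {
        return "grep -h -E '^(" + String.join("|", fields) + "):' /proc/[0-9]*/status";
    }

    /** Groups filtered output lines into one map per process. */
    static List<Map<String, String>> records(List<String> lines) {
        List<Map<String, String>> result = new ArrayList<>();
        Map<String, String> current = null;
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip anything that is not a "Key: value" line
            }
            String key = line.substring(0, colon);
            if (key.equals("Name") || current == null) {
                current = new LinkedHashMap<>(); // "Name" opens a new process record
                result.add(current);
            }
            current.put(key, line.substring(colon + 1).trim());
        }
        return result;
    }
}
```

Requesting, say, five fields instead of the full status file would cut the transferred lines per process accordingly, at the price of depending on grep being installed remotely.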
Comment 18 Anna Dushistova CLA 2009-05-05 16:28:35 EDT
(In reply to comment #17)
> It looks like the issue from comment 12 is fixed.
> 
> It looks like more improvements may be possible by running some grep / awk /
> sed on the remote before parsing the output of
>    cat /proc/[0-9]*/status
> 
> I would recommend this:
>   1. Create a new bug blocking this one, specifically for "[performance] 
>      deferred population of usernames" and mark it as fixed with 3.1m7

Done.

>   2. If there is time, experiment how much more improvement would be possible
>      by limiting data transfer by grepping on the remote.
> 

Your idea makes us assume the existence of one of grep / awk / sed on the remote side.

Comment 19 Martin Oberhuber CLA 2009-05-05 16:37:13 EDT
Yes... actually, you are the guys owning this, so I'll leave it to you whether you'll want to mark this fixed, or experiment with additional means of improving performance.

Not sure how much additional work makes sense at this point, given that TCF is coming...
Comment 20 Anna Dushistova CLA 2009-05-05 16:44:15 EDT
(In reply to comment #19)
> Yes... actually, you are the guys owning this, so I'll leave it to you whether
> you'll want to mark this fixed, or experiment with additional means of
> improving performance.

I really do not want additional dependencies on the remote host for 3.1, since I can easily imagine systems without them.
 
> Not sure how much additional work makes sense at this point, given that TCF is
> coming...

Exactly.

Comment 21 Martin Oberhuber CLA 2009-05-05 17:05:48 EDT
So, if you are happy with performance in general, I recommend closing this FIXED -- if you plan any additional work, set target milestone Future -- if no more work is planned, close it as WONTFIX.