159164 – Execute a single shell command on multiple servers

Status:	NEW

Alias:	None

Product:	Target Management
Classification:	Tools
Component:	RSE
Version:	unspecified
Hardware:	All All

Importance:	P3 enhancement
Target Milestone:	Future
Assignee:	dsdp.tm.rse-inbox
QA Contact:	Martin Oberhuber

URL:	http://www.sematopia.com
Whiteboard:
Keywords:	helpwanted

Depends on:
Blocks:

Reported:	2006-09-28 13:59 EDT by George A. Papayiannis
Modified:	2007-05-10 05:32 EDT
CC List:	2 users

See Also:

Description George A. Papayiannis

2006-09-28 13:59:24 EDT

I'm not sure if this feature request is in line with the vision of the project, but I thought I'd post it anyway.

In cluster management, it's common to need to run the same command on a number of servers. It would be nice if I could open a single shell that executes commands simultaneously on all selected servers. If an error occurred on an individual server, it would be reported back in that single shell with a specific message, including which server caused the problem.
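To make the request concrete, here is a minimal sketch of the fan-out idea in Python. It is not RSE code: `HostResult`, `ssh_runner`, `run_on_all` and `report_errors` are hypothetical names, and the default runner assumes that non-interactive, key-based `ssh` to each host already works.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class HostResult:
    host: str
    exit_code: int
    output: str


def ssh_runner(host: str, command: str) -> HostResult:
    # Assumption: key-based login works, so BatchMode can forbid prompts.
    proc = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", host, command],
        capture_output=True,
        text=True,
    )
    return HostResult(host, proc.returncode, proc.stdout + proc.stderr)


def run_on_all(
    hosts: Iterable[str],
    command: str,
    runner: Callable[[str, str], HostResult] = ssh_runner,
    max_workers: int = 16,
) -> List[HostResult]:
    # Run the same command on every host concurrently; results keep host order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda h: runner(h, command), list(hosts)))


def report_errors(results: Iterable[HostResult]) -> List[str]:
    # One line per failing host, naming the server that caused the problem.
    return [
        f"[{r.host}] exit {r.exit_code}: {r.output.strip()}"
        for r in results
        if r.exit_code != 0
    ]
```

Passing a different `runner` lets the same loop be tried without any real servers, which is also how the error report can be checked by hand.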

Comment 1 David Dykstal

2006-09-28 14:43:09 EDT

I don't think the existing shell service is quite up to this kind of thing. We are also exploring a terminal service which should do what you want (at least for a single target).

I'm not sure how one could handle output from several targets delivered from a single shell. Any suggestions?

Comment 2 Martin Oberhuber

2006-09-29 09:54:26 EDT

This request absolutely makes sense, and I do think this is in the scope of the project - as part of the multi-core / multi-system / connection-group initiatives.
(See also http://wiki.eclipse.org/index.php/DSDP/TM/Connection_Groups)

We could definitely use help in many areas around this multi-system initiative, so could you imagine joining us - as code contributor, tester or simply reviewer of specs and requirements?

I imagine that in order to send a command to multiple systems simultaneously, and interpret the output properly, a special view might be needed. I could imagine something similar to the JUnit Tests View in Eclipse Platform, which would display the commands being sent to the hosts as a tree, and mark those hosts that fail with a red X; when clicking on any node, the output from that node could be displayed.
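As a rough illustration of the data such a view would need, the following Python sketch renders a text-only version of the tree. It is only a stand-in for a real Eclipse view; `HostResult`, `render_tree` and `output_for` are hypothetical names, and an `X` marker plays the role of the red X.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostResult:
    host: str
    ok: bool
    output: str


def render_tree(command: str, results: List[HostResult]) -> str:
    # Root node is the command; each child is a host, marked X on failure.
    failed = sum(1 for r in results if not r.ok)
    lines = [f"{command} ({len(results)} hosts, {failed} failed)"]
    for r in results:
        marker = " " if r.ok else "X"
        lines.append(f"  [{marker}] {r.host}")
    return "\n".join(lines)


def output_for(results: List[HostResult], host: str) -> str:
    # Equivalent of clicking a node: show the output captured for that host.
    for r in results:
        if r.host == host:
            return r.output
    raise KeyError(host)
```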

The only thing I'm wondering (with respect to cluster management) is whether there aren't batch processing or queueing systems that already handle such distribution of jobs. My feeling is that it would be weird if RSE tried to attack such a problem on "bare metal" hardware without using batch/parallel processing middleware.

The Eclipse PTP Project does have an integration with some batch processing systems already, so there might be a chance for collaboration.

Comment 3 George A. Papayiannis

2006-09-29 11:30:59 EDT

Definitely some decent thought needs to be put into making a solution that can scale to a large number of targets in a given cluster.

On a small scale, there could be a separate view, which has a command line input box, and a list of targets. The user would select the targets they want (say x Linux boxes), then begin typing commands. On the first command, RSE would open x new shells, which would all be visible to the user. As the user typed into that command line input box (from within the new view) the input typed would get echoed to each new shell view. When the command is submitted, it would be submitted on all the shells. This way the outputs from each target would get sent back to their respective open shells.

This solution doesn't scale well, but it’s the most reliable. In many cases someone may be managing 20 or more servers per cluster. Having 20 shells open would be difficult to manage.
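The small-scale design can be sketched as follows, in Python rather than as an RSE view. `FakeShell` is a hypothetical in-memory stand-in for a real remote shell, and `Broadcaster` is an assumed name for the input box; the point is only that shells are opened on the first command and every submitted line goes to each of them, while output stays per target.

```python
from typing import Callable, Dict, Iterable, List


class FakeShell:
    """Hypothetical stand-in for one remote shell; it only records input."""

    def __init__(self, target: str) -> None:
        self.target = target
        self.transcript: List[str] = []

    def send(self, line: str) -> None:
        self.transcript.append(f"$ {line}")


class Broadcaster:
    """One input box whose submitted lines are sent to every selected target."""

    def __init__(
        self,
        targets: Iterable[str],
        open_shell: Callable[[str], FakeShell] = FakeShell,
    ) -> None:
        self.targets = list(targets)
        self.open_shell = open_shell
        self.shells: Dict[str, FakeShell] = {}

    def submit(self, line: str) -> None:
        # Shells are opened lazily, on the first submitted command.
        if not self.shells:
            self.shells = {t: self.open_shell(t) for t in self.targets}
        for shell in self.shells.values():
            shell.send(line)
```

With 20 targets this keeps 20 shell objects alive, which mirrors the manageability concern raised above.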

Martin, you had an interesting suggestion, and we could probably build off that. The bigger question is: what defines a failure? The commands will generally execute; it is the output they return that determines whether the action succeeded.

There are definitely batch processing/queuing systems offered by WebSphere and other big names. But many companies run custom-made clusters (Linux boxes, etc.) which don't have any system like this. An option on Linux is ClusterSSH (http://clusterssh.sourceforge.net/index.php/Main_Page), which does a job similar to what I described above.

My time is limited right now at IBM, but I’d love to join in some capacity, perhaps to review designs, make suggestions, test, fix bugs, and, if time permits, do more involved coding.
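One possible answer, sketched in Python under the assumption that the user supplies the rules: a result counts as a failure if the exit code is non-zero, if the output matches any "fail" pattern, or if it lacks a required pattern. The function name `make_failure_check` and the example patterns are hypothetical.

```python
import re
from typing import Callable, Iterable


def make_failure_check(
    fail_patterns: Iterable[str] = (),
    require_patterns: Iterable[str] = (),
) -> Callable[[int, str], bool]:
    fail = [re.compile(p) for p in fail_patterns]
    need = [re.compile(p) for p in require_patterns]

    def is_failure(exit_code: int, output: str) -> bool:
        if exit_code != 0:
            return True  # the command itself did not succeed
        if any(p.search(output) for p in fail):
            return True  # output contains a known error marker
        # Succeeds only if every required pattern appears in the output.
        return not all(p.search(output) for p in need)

    return is_failure
```

For example, a check built with `fail_patterns=[r"(?i)error"]` would flag a host whose command exits with 0 but prints an error message.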

Comment 4 Greg Watson

2006-09-30 07:41:49 EDT

Before doing this, you (TM) should really think about what it is you're trying to achieve. There are *many* clustering systems (Rocks, Oscar, bproc, XCAT, just to name a few) that do this kind of thing, and I would strongly advise against re-implementing this yet again unless you're going to do it a whole lot better.

Maybe a better strategy would be some kind of high level API that interfaces to an arbitrary cluster system (which, incidentally, is exactly what PTP does). If you try to implement this yourself, I guarantee on day one someone will try it on a 1000 node cluster and complain about your lack of scalability...
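A minimal Python sketch of what such a high level API could look like, purely as an illustration: it does not reflect PTP's actual interfaces, and `ClusterBackend`, `LoopBackend` and `failed_nodes` are hypothetical names. The caller talks only to the abstract interface, so a backend for an existing cluster system could replace the naive one without changing the caller.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List


class ClusterBackend(ABC):
    """Assumed interface: run a command on nodes, return an exit code per node."""

    @abstractmethod
    def run(self, nodes: Iterable[str], command: str) -> Dict[str, int]:
        ...


class LoopBackend(ClusterBackend):
    """Naive backend that visits nodes one at a time via a supplied callable."""

    def __init__(self, run_one: Callable[[str, str], int]) -> None:
        self.run_one = run_one

    def run(self, nodes: Iterable[str], command: str) -> Dict[str, int]:
        return {node: self.run_one(node, command) for node in nodes}


def failed_nodes(backend: ClusterBackend, nodes: Iterable[str], command: str) -> List[str]:
    # Caller code depends only on the interface, not on how jobs are distributed.
    codes = backend.run(nodes, command)
    return sorted(node for node, code in codes.items() if code != 0)
```

The sequential `LoopBackend` is exactly the kind of implementation that would not scale to a 1000 node cluster; it is here only so the sketch runs on its own.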

Comment 5 George A. Papayiannis

2006-09-30 20:59:21 EDT

I wasn't suggesting that RSE replace the Rocks Cluster Distribution, etc. Clusters like this require a lot more control than simply executing commands: they need the ability to perform mass installations, rollbacks, etc., all through a unified OS. Furthermore, Rocks is a Linux OS distro; RSE would never be able to compare to something with kernel-level support. But I do agree that if we go ahead with this, a specific scope needs to be defined. Users would also need to understand the limitations, for example that using xxx nodes would have scaling issues.

Comment 6 Martin Oberhuber

2006-10-07 15:54:25 EDT

Consider for 2.0

Comment 7 Martin Oberhuber

2006-11-10 18:06:07 EST

Need to investigate - Implementation not planned yet.

Comment 8 Martin Oberhuber

2007-04-17 12:42:06 EDT

Note that a very simple sample implementation of a Multishell (i.e. executing one command on multiple remote servers) is now available as part of the examples we used in the TM Tutorial at EclipseCon 2007.

All material is available from
http://www.eclipsecon.org/2007/index.php?page=sub/&id=3651
and works with TM 2.0M6.

I'd be interested in getting feedback on whether this is along the lines you had in mind. We don't currently plan to make this part of TM/RSE 2.0 since it is still much too shaky. But it may remain as an example and perhaps go into the next TM release. Having somebody (from the community) actively owning this and improving it would help.