Bug 216763 - [api] Support server to server copy
Summary: [api] Support server to server copy
Status: NEW
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 2.0
Hardware: All
OS: All
Importance: P3 enhancement
Target Milestone: Future
Assignee: David McKnight CLA
QA Contact: Martin Oberhuber CLA
URL:
Whiteboard:
Keywords: api, helpwanted
Depends on:
Blocks:
 
Reported: 2008-01-28 08:58 EST by Christian Hohmann CLA
Modified: 2008-07-04 15:43 EDT
2 users

See Also:


Attachments

Description Christian Hohmann CLA 2008-01-28 08:58:18 EST
Task: copy files from one hostConnection to another hostConnection

Situation now: 
1. download files to local machine from sourceHost 
2. upload files to targetHost from local machine

Some file services support direct Server-to-Server copy (for example FTP) between the same subsystemType and between different subsystemTypes.

Desirable solution:
If sourceConnection and targetConnection support direct copy between each other, request the sourceHost to copy the specified files to the targetHost directly, without stressing the client host.

This would at least double the speed of the transfer, halve the network load, and make RSE more comfortable to use by minimizing the workload on the client machine.
Comment 1 Martin Oberhuber CLA 2008-02-05 12:13:20 EST
DaveM what would be needed to support direct Server-to-server copy between identical IRemoteFileSubSystem instances, in terms of API?
Comment 2 David McKnight CLA 2008-02-05 15:28:16 EST
It should be noted that, in some cases, server-to-server via the current configuration would be just as fast as the proposal.  Consider the following situations:

There are 3 servers (A,B and C) and one client (D)
-a user copies a file x from A to B
step 1) RSE copies x from A to D, creating x' on D
step 2) RSE copies x' from D to B, creating x on B
-this results in 2 file transfers

However, now the user wants to copy file x from A to C
step 1) RSE sees that x' already exists on D so no need to copy x from A
step 2) RSE copies x' from D to C, creating x on C
-this results in 1 file transfer

(In reply to comment #1)
> DaveM what would be needed to support direct Server-to-server copy between
> identical IRemoteFileSubSystem instances, in terms of API?

The ability to support this would depend on the type of service used on the server side.  It gets more complicated when different types of services are used on either end, but for this discussion I'll limit this to the case where the services are the same.

One question is whether to introduce new API into IFileService or to use an adapter kind of mechanism like we did for permissions.  For simplicity, I'll just comment on the case where we add to IFileService.

I guess one starting point would be for each IFileService to have an API indicating whether direct cross-system file transfer is supported.  

public boolean supportsCrossSystemFileTransfer()

For the DStore implementation, I could imagine introducing a command that passes the required information for one server to create a new socket connection with another server for the purpose of transferring files, although it would require a little bit of work to do it elegantly. For FTP (and I imagine SFTP), I guess this ability is already available and it would just be a matter of calling the right APIs from the services. Regardless of the implementation, we would need a new transfer API similar to this:

public boolean transferFile(String srcParent, String srcName, String tgtSystem, String tgtParent, String tgtName)

We would probably need more than just the tgtSystem (for example the port) so we may need to introduce a new construct representing a host since IHost is too much for the service layer (too many dependencies).

Since this is an IFileService API, we may need to provide the same APIs at the service subsystem layer so that the subsystem isn't bypassed in doing these operations.  The actual calls to the API would probably be made from SystemViewRemoteFileAdapter.doDrop() when the user does this via copy/paste and drag and drop.
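A runnable sketch of what that doDrop()-level decision could look like. Only supportsCrossSystemFileTransfer() and transferFile(...) come from the proposal above; the reduced stand-in interface, the helper class, and the fallback bookkeeping are hypothetical simplifications, not actual RSE code:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for the proposed IFileService additions; the real
// service interface has many more methods, only these two matter here.
interface IFileService {
    boolean supportsCrossSystemFileTransfer();
    boolean transferFile(String srcParent, String srcName,
                         String tgtSystem, String tgtParent, String tgtName);
}

// Hypothetical helper standing in for the branch that
// SystemViewRemoteFileAdapter.doDrop() would need.
public class CrossSystemCopySketch {
    final List<String> log = new ArrayList<String>();

    /** Try the direct server-to-server path; fall back to the two-hop copy. */
    boolean copy(IFileService src, String srcParent, String srcName,
                 String tgtSystem, String tgtParent, String tgtName) {
        if (src.supportsCrossSystemFileTransfer()
                && src.transferFile(srcParent, srcName, tgtSystem, tgtParent, tgtName)) {
            log.add("direct");
            return true;
        }
        log.add("via-client"); // caller runs the existing download/upload sequence
        return false;
    }

    public static void main(String[] args) {
        IFileService ftpLike = new IFileService() {
            public boolean supportsCrossSystemFileTransfer() { return true; }
            public boolean transferFile(String sp, String sn, String ts, String tp, String tn) { return true; }
        };
        IFileService plain = new IFileService() {
            public boolean supportsCrossSystemFileTransfer() { return false; }
            public boolean transferFile(String sp, String sn, String ts, String tp, String tn) { return false; }
        };
        CrossSystemCopySketch s = new CrossSystemCopySketch();
        System.out.println(s.copy(ftpLike, "/src", "x", "hostB", "/tgt", "x")); // true
        System.out.println(s.copy(plain, "/src", "x", "hostB", "/tgt", "x"));   // false
        System.out.println(s.log); // [direct, via-client]
    }
}
```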

Comment 3 Martin Oberhuber CLA 2008-02-05 15:52:23 EST
Thanks Dave.

I think my biggest concern is authentication: When server A is supposed to directly transfer data to server B, it needs to log in from A into B with some credentials. 
We'd potentially need to transfer username and password from A to B, and potentially have access to other credentials / login procedures which really reside in the RSE ConnectorService today, and not in the service.

Like with the problem of properly transporting IHost information into the service, we'd have a similar problem with the credentials. I cannot see a single API that's simple enough for the service layer, yet generic enough to support all kinds of services.

On the other hand, assuming that exactly the same set of credentials would work for connecting A to B, like client to A, the credentials could perhaps be managed transparently inside the service... 

Perhaps what we need is a way for the ConnectorService (which has the IHost) to package the address information into a simpler interface that's understandable by the IFileService as well. Then, the IHostInfo could transparently carry address, credentials, and anything else that's needed for connecting in a transparent way. Concrete implementations would extend it as FTPHostInfo, SSHHostInfo, DStoreHostInfo etc. but only if they need to have server-to-server-copy:

/** A simple interface that describes the address of a host. */
interface IHostInfo {
}

interface IConnectorService {
   public IHostInfo getHostInfo(IHost host);
}

interface IFileService {
   public boolean copyTo(IHostInfo otherHost, ...);
}
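A slightly fleshed-out sketch of what one concrete carrier could look like. Only the names IHostInfo and FTPHostInfo appear above; every field and accessor here is an illustrative assumption:

```java
// Hypothetical concrete form of the IHostInfo idea sketched above.
// The interface stays minimal so the service layer keeps no IHost
// dependency; protocol-specific subclasses add whatever their transfer needs.
interface IHostInfo {
    String getHostName();
    int getPort();
}

/** FTP-specific carrier: also transports the credentials needed to
 *  address the target server (assumed fields, for illustration only). */
class FTPHostInfo implements IHostInfo {
    private final String hostName;
    private final int port;
    private final String user;
    private final String password;

    FTPHostInfo(String hostName, int port, String user, String password) {
        this.hostName = hostName;
        this.port = port;
        this.user = user;
        this.password = password;
    }

    public String getHostName() { return hostName; }
    public int getPort()        { return port; }
    public String getUser()     { return user; }
    public String getPassword() { return password; }
}
```

The ConnectorService would build such an object from its IHost and hand it to the IFileService, keeping the service layer free of IHost dependencies.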
Comment 4 David McKnight CLA 2008-02-05 16:02:28 EST
(In reply to comment #3)
> Thanks Dave.
> I think my biggest concern is authentication: When server A is supposed to
> directly transfer data to server B, it needs to log in from A into B with some
> credentials. 
> We'd potentially need to transfer username and password from A to B, and
> potentially have access to other credentials / login procedures which really
> reside in the RSE ConnectorService today, and not in the service.
> Like with the problem of properly transporting IHost information into the
> service, we'd have a similar problem with the credentials. I cannot see a
> single API that's simple enough for the service layer, yet generic enough to
> support all kinds of services.
> On the other hand, assuming that exactly the same set of credentials would work
> for connecting A to B, like client to A, the credentials could perhaps be
> managed transparently inside the service... 
> Perhaps what we need is a way for the ConnectorService (which has the IHost) to
> package the address information into a simpler interface that's understandable
> by the IFileService as well. Then, the IHostInfo could transparently carry
> address, credentials, and anything else that's needed for connecting in a
> transparent way. Concrete implementations would extend it as FTPHostInfo,
> SSHHostInfo, DStoreHostInfo etc. but only if they need to have
> server-to-server-copy:
> /** A simple interface that describes the address of a host. */
> interface IHostInfo {
> }
> interface IConnectorService {
>    public IHostInfo getHostInfo(IHost host);
> }
> interface IFileService {
>    public boolean copyTo(IHostInfo otherHost, ...)
> }

Yeah, the host info along with credentials was what I had in mind.  I like the idea of retrieving the necessary information via the connector service.

 
Comment 5 Christian Hohmann CLA 2008-02-21 04:46:10 EST
(In reply to comment #2)
David,
great that you are thinking about how to implement this feature.

You wrote that the copy from server A to B and later from A to C need not transport the data four times over the network. You are right.
But imagine the following server architecture:
The servers are connected via a high-performance network, while the clients are connected via "normal LAN" or even via the Internet.
Another point is that the amount of data could be very large (let's say several GB), so a normal office PC or notebook has to handle the file transfer with its comparatively poor resources.

I think the main goal is not only saving time and bandwidth (which is not possible in all cases) but freeing the client from the real file-transfer work. RSE allows you to access two servers in city A while you and your RSE are sitting in city B. When you need to copy files between the two servers, it would be better if the data did not have to be copied to your local machine and then back to the target server.

I think you are right, it might be good to limit this discussion to two subsystems of the same subsystemType.


(In reply to comment #3)
Martin,
yes, authentication is a big concern. For my specific subsystemType it might be much easier: I don't use username/password for authentication but certificates. They are managed outside of RSE and used when sending commands to the server. The server-to-server copy uses trust delegation: the source server attaches the client's copy request (which is signed with the user certificate) to the request that it sends to the target server. The target server then accepts the request from the source server because it is authorized by the user certificate.

I think authentication in this case depends heavily on the service implementation. The initiation of the server-to-server copy should be the same in all implementations: ask the source server to copy to the destination server.
Comment 6 Martin Oberhuber CLA 2008-02-21 05:01:50 EST
Good points. You seem to be talking about some high performance computing centers. Have you had a look at the g-eclipse project yet? They provide a frontend to grid middleware for such distributed high performance compute centers, where it is normal to have certificates, trust delegation and server-to-server copy.

Only point is that you need to have the proper Grid middleware installed.
http://www.eclipse.org/geclipse/
Comment 7 Christian Hohmann CLA 2008-02-21 10:43:39 EST
You are right, indeed I integrated the RSE plugin into the client software of a grid middleware.

>...You seem to be talking about some high performance computing
>centers...

Yes, there are high-speed network connections there. But also in "normal" companies (big or small) that I have worked for, the servers are often connected e.g. via fiber, while the normal workstations only have 100 Mbit LAN. In addition, servers are usually much more powerful than simple workstations (and can handle big file transfers much better, without eating the resources of a laptop).

The difference between an HPC machine and a PC is very large, but the difference between a file server and a normal workstation in a regular company is not so small either.
And I guess a lot of administrators use RSE for administration (well, I would do so).


I still think this would be an interesting extension, not only for my specific need but also for the regular use of RSE for administering remote systems.

Comment 8 Martin Oberhuber CLA 2008-02-21 15:36:30 EST
Well, again, the crux of this will be that server A needs to authenticate properly against server B without compromising security -- transmitting or storing passwords in plaintext should be avoided, just as storing certificates or private keys on the remote side should be avoided.

Requiring the same subsystemType for source and target is a good start.

In the end, some of the client functionality needs to be put on the server, thus requiring a new (different) kind of transmission component. For SSH, for instance, rather than calling the Java ChannelSftp.* API methods, the remote-to-remote transfer needs to issue scp or sftp commands in a remote shell. Not sure what the situation would be like for dstore.
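For illustration, assembling the command for that remote shell might look like the sketch below. All names are made up, the quoting is deliberately naive, and choosing scp over sftp is an assumption:

```java
/** Hypothetical helper: builds the scp command that would be issued in a
 *  shell on the SOURCE host, so the source pushes the file straight to the
 *  target instead of routing the data through the client. */
class RemoteToRemoteScp {
    static String buildCommand(String srcPath, String tgtUser,
                               String tgtHost, String tgtPath) {
        // -p preserves modification times and modes across the copy;
        // the single-quoting here is deliberately naive (no escaping).
        return "scp -p '" + srcPath + "' "
             + tgtUser + "@" + tgtHost + ":'" + tgtPath + "'";
    }
}
```

For example, buildCommand("/data/x", "chris", "serverB", "/backup/x") yields scp -p '/data/x' chris@serverB:'/backup/x'. The open question from above remains: how the source host authenticates against the target when this command actually runs.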

Let's see where we can get from here. Suggestions and contributions are welcome. Getting one sample implementation to work on one kind of subsystem might help us get an idea of what API changes are required.
Comment 9 Martin Oberhuber CLA 2008-02-21 15:42:49 EST
Actually, an FTP implementation might be interesting since I'm told that some FTP servers support server-to-server-copy by means of the standard protocol.
Comment 10 Christian Hohmann CLA 2008-02-25 06:43:05 EST
Hi,

I looked into the FTP protocol and its server-to-server copy (FXP), especially how the authentication is done. I found out that server-to-server copy over FTP works as follows:

* Client connects to target-server.
* Client sends PASV to target-server and gets back IP and Port for a Connection.
* Client connects to source-server.
* Client sends PORT (with received IP and Port of target-Server) to source-server.
* Client sends RETR to source-server to specify the desired files to transfer.
* Client sends STOR to target-server to command it to accept the filetransfer.

The benefit is that the FTP servers don't need to authenticate each other; they just react directly to the orders of the already authenticated client. So the client only needs to log in to both servers as usual and send the commands. A prerequisite is that both servers allow server-to-server copy, but that is required anyway to support this feature.
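The key translation step in that sequence is turning the target server's 227 reply to PASV into the PORT argument sent to the source server. A self-contained sketch of just that step (reply format per RFC 959; the class and method names are made up):

```java
// Sketch of the PASV -> PORT translation at the heart of FXP: the client
// takes the host/port tuple the TARGET server returned and replays it to
// the SOURCE server, so the data connection runs server-to-server.
class FxpHandshake {

    /** "227 Entering Passive Mode (192,168,0,5,19,137)" -> "PORT 192,168,0,5,19,137" */
    static String portCommandFromPasvReply(String pasvReply) {
        int open = pasvReply.indexOf('(');
        int close = pasvReply.indexOf(')', open);
        if (!pasvReply.startsWith("227") || open < 0 || close < 0)
            throw new IllegalArgumentException("not a PASV 227 reply: " + pasvReply);
        return "PORT " + pasvReply.substring(open + 1, close);
    }

    /** Decodes the trailing p1,p2 pair of the tuple into the TCP port (p1*256 + p2). */
    static int dataPort(String hostPortTuple) {
        String[] parts = hostPortTuple.split(",");
        return Integer.parseInt(parts[4].trim()) * 256 + Integer.parseInt(parts[5].trim());
    }
}
```

After this exchange the client sends RETR to the source server and STOR to the target server, exactly as in the list above.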
Comment 11 Martin Oberhuber CLA 2008-02-25 06:52:01 EST
Thanks Christian, this is interesting to learn. While FTP doesn't transfer encrypted data, I'd like to note that the scheme you have outlined would support safe encryption if the client sends a one-time passphrase for server1 to encode (with 3DES, for instance) and server2 to decode. So not even public key authentication would be needed.

This scheme wouldn't work for fixed protocols like SSH, but it would be interesting to consider it for dstore, for instance.

As I have mentioned before, my personal recommendation is to go with an FTP implementation first, where the protocol is already specified, such that we see what additional APIs are needed.

For FTP, encryption could be added with a "dstore supertransfer" kind of solution, where a shell connection on the source is used to first pack all the data into an encrypted ZIP, then transfer the data, then use a shell connection on the destination to unpack everything out of the encrypted ZIP.

NB: Requesting such a "pluggable and encryptable supertransfer" for any RSE connection that has both a files and a shell subsystem would be another interesting enhancement request.
Comment 12 Christian Hohmann CLA 2008-02-25 08:36:58 EST
> So not even public key authentication would be needed.

Yes, you are right. The feature should be realizable using the current authentication mechanism.

The additional encryption would be a nice feature for server-to-server communication and also for client-to-server communication (especially for FTP, which is unencrypted).