
Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous

Hi Martin,

Oberhuber, Martin wrote:
Hi Scott,

good points, indeed! thanks for taking the time to write
such an elaborate reply.

When blocking the calling thread (e.g. any synchronous reads/writes) results in system-wide 'problems' (e.g. UI is stopped, other server ...)

Hm... IMHO this is not a use case that requires async because it couldn't be implemented with synchronous calls. This just shows that somebody's using a synchronous API in a way that's inappropriate for slow/unreliable back-ends.

Yes...I guess the point is that any network is a relatively slow/unreliable backend compared to any disk.

This does point out an important truth, though:

synchronous APIs may *encourage* usage of background Jobs
for slow operations, but cannot enforce this. Asynchronous
APIs, on the other hand, *force* the client to take actions
which are appropriate for use with slow/unreliable back-ends.

True...because the default assumption for the network is that it is relatively slow and unreliable.
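
To make the "encourage vs. force" distinction concrete, here is a minimal sketch with entirely hypothetical interfaces (not actual EFS or ECF API): the synchronous shape lets a caller block the UI thread by accident, while the asynchronous shape makes the caller decide at the call site how completion and failure are handled.

import java.io.IOException;
import java.io.InputStream;

// Hypothetical synchronous shape: nothing stops a caller from invoking this
// on the UI thread and blocking it for as long as a slow back-end takes.
interface SyncFileAccess {
    InputStream openInputStream(String path) throws IOException;
}

// Hypothetical asynchronous shape: the caller must supply a callback, so the
// "what happens while this is slow or failing" question is answered up front.
interface AsyncFileAccess {
    void openInputStream(String path, ReadCallback callback);
}

interface ReadCallback {
    void opened(InputStream stream);
    void failed(IOException cause);
}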

From that point of view, it might actually make sense to have the "true E4 resources kernel" only support async file system access, and have the backward compatibility wrappers provide a bridge to synchronous access... that way we could force "true E4" clients to take appropriate measures. Given that ECF filetransfer is in Equinox already, I could imagine getting rid of EFS and replacing it with ECF filetransfer (probably extended) in the "core E4 Resources".

This seems too extreme to me. That is, EFS is an established, very nice synchronous file system API. No reason to 'get rid' of it for technical purity IMHO (i.e. that everything must be asynchronous over the network). Rather, it seems to me that having the ability to go between synchronous and asynchronous is the way to go...while also allowing for mixed strategies (like Hadoop-based EFS impls, which asynchronously replicate files/file blocks).


Futures as return values might be a concept that allows using asynchronous APIs with minimal extra effort when results are available "very fast and reliably".

I agree that futures (we have the class name 'AsynchResult'...the 'h' is embarrassing for me) can be a very useful concept for bridging asynchronous calls with synchronous needs (BTW, we use AsynchResult to get JRE 1.4 compatibility...the 1.5+ concurrent API also has futures, of course). But they are (still) a relatively foreign API concept...that is, not too familiar to many programmers. Still, I think they are useful.
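
As a sketch of the bridging idea with the 1.5+ concurrent API (the openRemoteStream method below is a made-up stand-in for a slow EFS/ECF read, not a real API): the asynchronous call returns a Future immediately, and a caller with synchronous needs simply blocks on get().

import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureBridgeSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();

        // submit() returns immediately; the slow work runs on a background thread.
        Future<InputStream> pending = executor.submit(new Callable<InputStream>() {
            public InputStream call() throws Exception {
                return openRemoteStream("some://remote/file"); // hypothetical slow read
            }
        });

        // A synchronous caller blocks here; an asynchronous caller would instead
        // poll isDone() or carry on with other work until the result arrives.
        InputStream stream = pending.get();
        stream.close();
        executor.shutdown();
    }

    // Placeholder for a slow remote read; stands in for an EFS or ECF call.
    private static InputStream openRemoteStream(String uri) {
        return new ByteArrayInputStream(new byte[0]);
    }
}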


Writing an EFS wrapper to ECF filetransfer for backward
compatibility should be an easy thing to do (and probably
you have done it already). In terms of the resource layer,
EFS is pretty separated from it already (only connected
by URI on the API). Having the Resources layer directly
make asynchronous calls (instead of using the EFS wrapper)
should be a very interesting experiment.

Well, no, we haven't done this already, although we have done the reverse (implemented async ECF filetransfer on top of EFS+Jobs). It might be a useful exercise, but it seems to me that reusing more complete replication approaches (i.e. Hadoop, etc.) for implementing EFS on top of asynchronous access would be quicker and easier.
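
For what it's worth, the EFS+Jobs direction looks roughly like the following sketch. The AsyncReadListener interface here is made up for illustration; ECF's actual filetransfer listener API is event-based and considerably richer.

import java.io.InputStream;

import org.eclipse.core.filesystem.EFS;
import org.eclipse.core.filesystem.IFileStore;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;

// Hypothetical callback; the real ECF listener delivers events instead.
interface AsyncReadListener {
    void done(InputStream stream);
    void failed(CoreException cause);
}

public class AsyncOverEfsSketch {
    public static void openAsync(final IFileStore store, final AsyncReadListener listener) {
        Job job = new Job("Reading " + store.getName()) {
            protected IStatus run(IProgressMonitor monitor) {
                try {
                    // The synchronous (possibly slow) EFS call runs on a background thread.
                    listener.done(store.openInputStream(EFS.NONE, monitor));
                } catch (CoreException e) {
                    listener.failed(e);
                }
                return Status.OK_STATUS;
            }
        };
        job.schedule(); // returns immediately, so the caller is never blocked
    }
}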

Well, if such an adapter is not available then they could do it synchronously rather than asynchronously.

But that's exactly my point: we don't want clients having
to write code for both synchronous and asynchronous variants.
That's code duplication, resulting in bloat. I'd like to shoot for ONE core e4 api for each concept (with additional
compatibility layers for backward compatibility where needed).

Although I share your desire to reduce bloat, I'm not sure that offering either synchronous xor asynchronous access to resources (whether remote or local) is the natural way to keep bloat to a minimum when it comes to filesystem/resource access.

By "adding async to the EFS API" I didn't think about any
technical measure such as blowing up the IFileStore interface.
What I meant was, that clients should be able to expect any
contributed file system to be accessible with all the API that E4 resources FS exposes -- be it synchronous or asynchronous, via 1 or multiple interfaces, obtained via adapter pattern or otherwise.

It seems to me this is more a requirement on the file system implementer...i.e. that they implement the entire resources API (i.e. both sync and async)...right?

Although I think this is a good general principle (implementers should implement the entire relevant API), in practice I'm not sure how to require it given a provider architecture (for EFS and for ECF). That is, I'm sure that there will be incomplete EFS implementations, incomplete ECF file transfer implementations, etc. Encouraging completeness will be easy...requiring it will be hard, I expect.
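
As an illustration of the adapter-pattern variant from the client's side (the IAsyncFileStore and IFileInfoListener interfaces below are hypothetical, not existing EFS API), and of why completeness can only be encouraged rather than guaranteed: the adapter may simply not be there.

import org.eclipse.core.filesystem.EFS;
import org.eclipse.core.filesystem.IFileInfo;
import org.eclipse.core.filesystem.IFileStore;
import org.eclipse.core.runtime.CoreException;
import org.eclipse.core.runtime.IProgressMonitor;

public class AdapterBridgeSketch {

    // Hypothetical async interface that a file system provider *may* contribute.
    public interface IAsyncFileStore {
        void fetchInfoAsync(IFileInfoListener listener);
    }

    // Hypothetical listener for delivering the result off the calling thread.
    public interface IFileInfoListener {
        void infoReceived(IFileInfo info);
        void failed(CoreException cause);
    }

    public static void fetchInfo(IFileStore store, IFileInfoListener listener,
            IProgressMonitor monitor) throws CoreException {
        IAsyncFileStore async = (IAsyncFileStore) store.getAdapter(IAsyncFileStore.class);
        if (async != null) {
            // The provider chose to implement the asynchronous part of the API.
            async.fetchInfoAsync(listener);
        } else {
            // Incomplete provider: fall back to the existing synchronous EFS call.
            listener.infoReceived(store.fetchInfo(EFS.NONE, monitor));
        }
    }
}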


<stuff deleted>
I disagree. I think the problem is with trying to make local and remote access look exactly the same (network transparency).

Hm... on the other hand, a client that is prepared to deal with remote files should easily be able to handle the local
case as well, no? I'd like to investigate technical measures
of how we can make it simple to program the remote case.

Yes, I agree that it should be easy to handle both the local and remote cases...but that's the hard part...since the local and remote cases are different...in performance, reliability, partial failure, etc. and as the Note on Distributed Computing points out...these are differences that are very hard to create a uniform API for...because the differences in network and local behavior frequently 'bubble up' to the API.

But I do think that there is a lot of room for innovation...particularly around replication/caching/synchronization for file systems (e.g. Hadoop).

If the core framework is remote-aware we can add layers for simplified access if we want. We cannot do it the other way round.

True.

Can anybody argue against using the asynchronous ECF filetransfer APIs as the core E4 resources file system
layer?

Yes, I can (surprise :). I think introducing ECF/asynchronous access for the local file system would be a waste of time. Even though it would be easy to do (ECF's file transfer API already has asynch access to the local file system), I don't think it would be worth doing.

Although I'm not sure what the best way to 'bridge' EFS and the ECF file transfer APIs is (i.e. adapters, etc), I don't think it's really necessary or desirable to strictly layer them. An example of this is p2's usage of ECF...it only uses the file retrieval part of the ECF filetransfer API (it has no use for upload, or directory navigation). It's actually simpler and a better fit to just use that part (retrieval) of ECF filetransfer...and not have to deal with other dependencies that would be implied by including, say, all of EFS (with or without ECF underneath).
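
As a sketch of why that narrow dependency is attractive (hypothetical interfaces below, not the actual ECF or p2 API): a retrieve-only client needs something on the order of the following, and nothing more.

import java.io.OutputStream;
import java.net.URI;

// Hypothetical retrieval-only contract: a client like p2 would depend on this
// single capability, not on upload, directory browsing, or all of EFS.
interface IRetrieveOnly {
    void retrieve(URI remote, OutputStream target, RetrieveListener listener);
}

// Hypothetical listener; progress and completion arrive asynchronously.
interface RetrieveListener {
    void progress(long bytesReceived);
    void done(Exception failureOrNull); // null indicates success
}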

I understand (and fully appreciate) the desire to reduce API bloat (i.e. client code duplication, multiple APIs, etc), but I'm not sure of the best way to do that when it comes to synchronous/asynchronous (or local/network rather) access to filesystems.
Scott



