Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous

Hi Martin,

Oberhuber, Martin wrote:
<stuff deleted>

Could you cite a use case where async access is necessary?

When blocking the calling thread (e.g. any synchronous reads/writes) results in system-wide 'problems' (e.g. UI is stopped, other server processing is blocked, etc).

I think that (assuming all synchronous methods have progress monitors for cancellation, which is the case in EFS), the only difference between sync and async access is:
  (1) the number of Threads in "wait" state,
  (2) locking of resources while Threads synchronously wait,
  (3) potential for coalescing multiple requests to the
      same item in the case of asynchronous queries.

I would also say that the sync access is not 'predictable' in a way that can result in problems...particularly when the network is involved. For example, when calling a blocking file.read, if that read is on the local filesystem with modern OSes it's quite likely to eventually complete....and in a timely manner. But when that same read is done over the network it's much more likely to block for a very long time (e.g. due to variable network performance), or simply never return (block forever).
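One common way to keep a possibly-never-returning network read from hanging the caller is to run the blocking call on a worker thread and bound the wait. A minimal sketch (the helper and its time budget are my own illustration, not EFS API):

```java
import java.util.concurrent.*;

public class BoundedRead {
    // A blocking call over the network may never return; running it on a
    // worker thread and bounding the wait turns "block forever" into a
    // failure the caller can handle. The timeout budget is arbitrary.
    static <T> T callWithTimeout(Callable<T> blockingCall, long millis)
            throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<T> pending = pool.submit(blockingCall);
            return pending.get(millis, TimeUnit.MILLISECONDS); // TimeoutException if stuck
        } finally {
            pool.shutdownNow(); // interrupt the call if it is still blocked
        }
    }
}
```

Of course this only converts an unbounded block into an error; the caller still has to decide what a timed-out read means for the application.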

In the asynchronous case, no Threads are waiting and resources
*may* be unlocked until the callback returns, but this unlocking
of resources needs to be carefully considered in each case. Does the system always remain in a consistent state? RESTful systems ensure this by placing all state info right into the request, which is a great idea but likely not always possible. It's not only a matter of the API being complex or not. The fact is that the concept of being asynchronous as such is more flexible,
but also requires adopters to be more careful, or at least think
along different lines.

I also think that we should look at the need for asynchronous access separately for each kind of request:
  (A) Directory retrieval (aka childNames())
  (B) Full file tree retrieval
  (C) Status/Attribute retrieval for an individual file
  (D) File contents retrieval

For (D) we already use Streams in EFS, which can be implemented in an asynchronous manner. What's currently missing in EFS is the ability to perform random access, like the JSR 203 SeekableByteChannel [1]. Interestingly, nio2 has both a synchronous FileChannel [2] and AsynchronousFileChannel [3].
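To make the nio2 comparison concrete, here is a minimal sketch of a random-access asynchronous read via AsynchronousFileChannel (the class name, file name, and buffer size are arbitrary choices of mine):

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

public class AsyncReadSketch {
    // Random-access asynchronous read: the position argument makes this a
    // seekable read, and the returned Future means the calling thread is
    // free to do other work until it actually needs the result.
    static String readAsync(Path path) throws Exception {
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(1024);
            Future<Integer> pending = ch.read(buf, 0); // read from offset 0
            int n = pending.get(); // rendezvous; real code could poll or use a CompletionHandler
            return new String(buf.array(), 0, n, "UTF-8");
        }
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("efs-async", ".txt");
        Files.write(p, "hello, e4".getBytes("UTF-8"));
        System.out.println(readAsync(p)); // prints: hello, e4
        Files.delete(p);
    }
}
```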

For (A), (B), (C) I'm not sure how much we would win from
an asynchronous variant, since I'd assume that not much
work could be done (and not much resources freed) while
asynchronously waiting for their result anyways. But perhaps
I'm wrong?

I do think this assumption is wrong (that not much can be done while waiting for results). That is, A-C all access file 'meta-data' (directories, trees, file attributes, etc.). This meta-data *can* be very large (e.g. for large directory trees, lots of data for each file, etc.). So although the file content is often much larger than the meta-data, that's not always the case, and blocking on these operations (meta-data access...particularly over the net) can be an issue as well...again depending upon the application/system performance requirements.

So I think the best evidence for the need for asynchronous access to file systems is simply the persistent availability of asynchronous approaches...even to local file systems. I agree completely that in general synchronous i/o (local and/or network) is easier to program...and in the environments where performance and reliability are high (e.g. accessing a local physical disk) then for the most part synchronous i/o can be used. But there are app-level and/or system-level requirements that are more amenable to asynchronous i/o approaches, and so those APIs are available...even in the case of local (non-network) i/o.

3) Using (e.g.) adapters it's not necessary to force such an API on anyone (rather it can be available when needed)

Hm... so, let's assume that client X wants to do something asynchronous. So it does
   myFileStore.getAdapter(IAsyncFileStore.class);
some file systems would provide that adapter, others not.
What's the client's fallback strategy in case the async adapter is not available?

Well, if such an adapter is not available then they could do it synchronously rather than asynchronously.
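To make that fallback concrete, here is a sketch of what the client-side strategy could look like. IFileStoreLike, IAsyncFileStore, and childNamesAsync are all hypothetical stand-ins (with no Eclipse dependencies), not existing EFS API; the getAdapter call mirrors the IAdaptable pattern above.

```java
import java.util.concurrent.*;

// Hypothetical interfaces standing in for EFS types, just to make the
// fallback pattern concrete; IAsyncFileStore does not exist in EFS today.
interface IFileStoreLike {
    String[] childNames();                   // synchronous EFS-style call
    Object getAdapter(Class<?> adapterType); // IAdaptable-style lookup
}

interface IAsyncFileStore {
    Future<String[]> childNamesAsync();
}

public class AdapterFallback {
    // Client-side strategy: prefer the async adapter when the file system
    // provides one, otherwise fall back to running the synchronous call
    // on a worker thread so the caller is never blocked either way.
    static Future<String[]> childNames(IFileStoreLike store, ExecutorService pool) {
        Object adapter = store.getAdapter(IAsyncFileStore.class);
        if (adapter instanceof IAsyncFileStore) {
            return ((IAsyncFileStore) adapter).childNamesAsync();
        }
        return pool.submit(store::childNames);
    }
}
```

Note that the last line is itself the generic sync-to-async bridge; the question is only whether that line lives in every client or once in the provider.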

I'm afraid that if we use such adapters, we end up with the
same code in clients again and again, because they need some
fallback strategy. It seems wiser to place the fallback strategy right into the EFS provider, since it is always possible to write a bridge between a synchronous and an
asynchronous API in a single, generic way.

If the client uses synchronous i/o (perhaps in a new thread) as a fallback strategy, then isn't that synchronous support already built into the EFS provider?

Therefore, I'm more in favor of determining what APIs we want
to be asynchronous, and just adding them to EFS.

Huh? It seems to me that for simplicity it would make more sense to provide new interfaces (possibly as adapters, but not necessarily) rather than put a bunch more methods on (e.g.) IFileStore.

The adapter
idea could be used for adding provisional API, but the final
API should not need that.

Seems to me that this would result in a very large and complicated API, that would include both synchronous and asynchronous calls...meaning that

1) clients would have to tease apart which methods are which, making it harder to use the API(s)
2) it would require providers to implement both synchronous and asynchronous operations...thus also making it harder to implement even simple EFS providers.

Why not provide some separation of concerns through whatever means (e.g. different packages...e.g. java.io, java.nio, adapters, others)?

To that extent, let's start assuming that files are quick
and local. And
let's investigate how we could leverage ECF to support remote file
systems. If that doesn't meet our needs, we can always add
async later.

I'm not sure if this is a good strategy. It seems to lead
towards more and more separation of local vs. remote -- which, I think, leads to either duplication of code in the end, or non-uniform workflows for end users.

I disagree. I think the problem is with trying to make local and remote access look exactly the same (network transparency) to all applications. Sometimes/many times applications care that resources are remote, take longer to load, are more likely to go away and not return, etc., etc. I think the workflows that are most problematic are those that assume that remote resource access is exactly the same as local disk access. As much as I would like to have network and local be exactly alike in terms of performance and reliability, they are not.

Let me sketch a scenario of what the world could look like in 10 years: with the Internet getting more and more into our lives, you'd want to use an Eclipse based product to dive into some code base that you just found on the net.
Without downloading everything in advance. Or you browse into
some mp3 music store. Add some remotely hosted Open Source Library to your UML drawing just by drag and drop.

Sounds nice...but I'll guarantee you that in 10 years the Internet will still be much slower and much less reliable than local disk access...and that applications (and users) will notice.

I think that users will more and more want to operate on
remote networked resources just the same as on local resources.

I don't deny that people (particularly programmers) would like to operate on remote networked resources in the same way as local resources...I certainly would. But I don't think that the differences between local system and network systems can be made to entirely disappear. Many have tried...all have failed as far as I'm concerned :). I don't have any problems with a virtual file system, but I think it's a big step to assume that such an abstraction alone can deal with all the issues of networked file systems. There's a good reason why NFS (e.g.) isn't used over the WAN...and probably never will be. Rather, we get applications like the web browser to deal with remote resources.


E4 gives us the chance to try and come up with
models that support such workflows in a uniform way. Let's not throw away that chance prematurely.

It doesn't seem to me that E4 is likely to deal with issues that affect all distributed systems...e.g. differences in performance, reliability, partial failure. If it is trying to do that, it seems to me that it's biting off more than it should chew.
I agree that we need to start on concrete work items
rather than endlessly discussing concepts. But as we
start on these work items, let's keep the concept that
things may be remote in our minds.

There's an interesting discussion of strategies for dealing with network issues in a 'uniform' way, in terms of API, in the Note on Distributed Computing paper:

http://research.sun.com/techrep/1994/abstract-29.html

They sort of ask the question: should API be designed with local systems in mind or remote systems in mind? There are major difficulties with both extremes...because assuming local (the natural first assumption) essentially ignores the differences in the net WRT performance, partial failure, etc and results in API that doesn't work when moved to the net. On the other hand, designing APIs as if everything were remote is also problematic because then the programmer has to deal with lots of failure cases, asynchrony, etc that makes things much harder for even the simple cases (local, fast, reliable access).

Sounds reasonable. Just as an aside: I think there's a lot of potential to use asynchronous file transfer + replication
to do caching of remote resources.

That's a great approach, especially if it works on the file block level (such that random access to huge remote files can be cached). Again, one thing that's missing from EFS
today is random access to files. Does ECF have it?

We do have support for replication in ECF's APIs. Replication is used, for example, for the real-time shared editing that ECF introduced in 3.4. Basically, for a shared editor the IDocument model is replicated to the shared editing participants, and then read accesses are very fast (reading the local copy, much as in Hadoop/GFS). Changes/writes to the replicated IDocument are applied locally and then asynchronously distributed to the replicas. Once a change has been received, a synchronization algorithm (cola) is used to resolve conflicts and prevent divergence of the replicas. We've just introduced a replicated model synchronization API (see bug https://bugs.eclipse.org/bugs/show_bug.cgi?id=234142) that exposes an API for synchronization strategies to be added/created, assuming a replicated model approach.
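ECF's cola algorithm is far more involved than this, but the basic convergence idea behind apply-locally-then-distribute can be shown with a toy sketch (everything here is made up for illustration; conflicts are resolved by a simple timestamp order, not by cola's operational transformation):

```java
import java.util.TreeMap;

// Toy sketch of "apply locally, distribute asynchronously": each replica
// applies its own edits immediately (fast local reads/writes) and merges
// remote edits as they arrive. Because the merge rule is deterministic
// and order-independent (a timestamp-keyed map), all replicas converge
// to the same state regardless of delivery order.
public class ToyReplica {
    private final TreeMap<Long, String> ops = new TreeMap<>(); // timestamp -> op

    public void applyLocal(long ts, String op)  { ops.put(ts, op); }
    public void applyRemote(long ts, String op) { ops.put(ts, op); } // same merge rule

    public String state() { return String.join(",", ops.values()); } // deterministic view
}
```

Real systems need conflict resolution that preserves user intent (which is exactly what cola's strategy is for); the point here is only why deterministic merging keeps replicas from diverging.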

Although we could do something similar to what Hadoop is doing WRT replicating file blocks, we have not as yet as we've been concentrating on other use cases/applications (real-time shared editing).

Note in case it's not clear from my statements above...I'm not arguing for an asynchronous-only API...I think EFS and synchronous i/o approaches are completely appropriate for many use cases. I'm also not arguing that we should discuss this into the ground and not do any implementation. But I do think there is a place for support for asynchronous i/o, asynchronous messaging, replication + synchronization, etc...particularly since it seems that E4 is intended to be a platform for more distributed applications.

Scott



