Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous


Michael Scharf wrote:
> For me the question is who is calling which API of EFS?
> If all the calls are done during synchronization by the
> resource system, and if this already runs in a job, then
> why make EFS async at a fine level of granularity?


Synchronization between resources and EFS only uses one method from EFS (IFileStore#childInfos). Currently this synchronization happens asynchronously as much as possible (RefreshJob), except when clients call refreshLocal. Clients calling refreshLocal often need the refresh to happen synchronously, but I could imagine adding an asynchronous variant for cases where the client doesn't need the refresh to happen immediately.
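
To illustrate what such an asynchronous variant might look like, here is a minimal Java sketch. Everything in it is hypothetical: Container stands in for a refreshable resource and refreshLocalAsync is an invented name; neither is existing resources or EFS API. The real refreshLocal also takes a depth and a progress monitor, which are left out here.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class RefreshSketch {
    /** Hypothetical stand-in for a refreshable resource; not the real IResource. */
    interface Container {
        /** Synchronous refresh: blocks until the child list has been re-read. */
        List<String> refreshLocal();
    }

    /** Hypothetical asynchronous variant: returns at once, completes when the refresh is done. */
    static CompletableFuture<List<String>> refreshLocalAsync(Container c, ExecutorService pool) {
        return CompletableFuture.supplyAsync(c::refreshLocal, pool);
    }

    public static void main(String[] args) {
        Container project = () -> List.of("a.txt", "b.txt"); // pretend file system listing
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A client that needs the result immediately keeps the blocking call.
            List<String> now = project.refreshLocal();
            System.out.println("sync result: " + now);

            // A client that can wait registers a callback instead of blocking.
            CompletableFuture<Void> done = refreshLocalAsync(project, pool)
                    .thenAccept(children -> System.out.println("refreshed: " + children));
            done.join(); // only so the demo does not exit before the callback runs
        } finally {
            pool.shutdown();
        }
    }
}
```

The blocking method stays available for clients that need the refreshed state right away; the asynchronous one is an addition, not a replacement.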

The EFS methods that modify the file system typically have a 1-1 mapping to some IResource API, so those resource APIs would also need an asynchronous variant to get any value out of an asynchronous API at the EFS level. The other use of EFS is accessing attributes that are not cached at the resource level (file permissions, symlink names). For this kind of method I think an asynchronous variant doesn't make sense.

John



From: Michael Scharf <Michael.Scharf@xxxxxxxxxxxxx>
Sent by: eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx
Date: 10/22/2008 11:41 AM
Please respond to: E4 developer list <eclipse-incubator-e4-dev@xxxxxxxxxxx>
To: E4 developer list <eclipse-incubator-e4-dev@xxxxxxxxxxx>
Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF and asynchronous





Oberhuber, Martin wrote:
> Hi Michael,
>
> to me, the difference between sync and async is not so much
> about speed or the number of Threads anymore - it's about
> enforcing proper programming patterns. That's something I
> actually learned during this discussion.

Exactly! My question is: which calls are made by whom on
EFS? If most of the EFS calls are made to update
the IResource cache, then this batch of calls can be
made in a job, and it has to be guaranteed that the
user can cancel it at any time. But then making a
call to get a file attribute async might be overkill.

> Some examples:
>
> * Open a Directory Browse Dialog that happens to be initialized
>   with the URI remote://foohost/bar/baz and foohost happens not
>   to be online. All UI is blocked, you cannot even cancel the
>   request.

==> it makes sense to be async -- or start a job.

> * This can even happen with a LOCAL file system, I've seen this
>   repeatedly: My UNIX homedir is shared via SMB to my Windows
>   machine. In my UNIX home I have some symbolic links that point
>   to other NFS-shared folders from machines that are offline.
>   Just opening a directory browse dialog takes like forever
>   (even on Windows Explorer!)

async or a job would clearly help here

> * Double-click a large file foo.txt, which is stored on a local SMB
>   share, to load it into the editor. While the file is loading,
>   your network cable gets unplugged for some reason. Depending
>   on how the editor loading is implemented, all of Eclipse may
>   hang.
> * How often have you seen an Eclipse Progress Monitor like
>   "Waiting for Refresh Job to complete..." ?
>   Is it really necessary that the Refresh Job locks the workspace
>   for writing? Or could we allow more concurrency here?

Your examples suggest that file loading should not happen in
the UI thread.

> Yes, of course you can defer all synchronous queries into Jobs
> with Progress etc... but do we actually do that? Not always.
> And rightly so, because the hassle of creating a Job to make
> the synchronous API happy is likely more than dealing with an
> async API right away.

> Asynchronous APIs just *force* the client to do something useful
> until the response of the request comes in. Where "something
> useful" could be just as simple as allowing a user to press
> CANCEL.

Both options (putting the call into a job or making it async)
change the programming model in a similar way: you have to
wait until the result is back. Async changes the way all EFS
implementations have to be written, and it changes the clients.
Jobs change the clients as well, but the EFS implementations can
stay synchronous.
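
A minimal Java sketch of the job option, under stated assumptions: SyncStore and runAsJob are made-up names (not EFS or Jobs API), and a plain ExecutorService stands in for the Eclipse job framework. The store implementation stays blocking; only the client changes, and it gains a handle it can cancel.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class JobSketch {
    /** Hypothetical synchronous store; the implementation is allowed to block. */
    interface SyncStore {
        String[] childNames() throws InterruptedException;
    }

    /** "Job" style: the client moves the blocking call to a worker thread. */
    static Future<String[]> runAsJob(SyncStore store, ExecutorService pool) {
        Callable<String[]> task = store::childNames;
        return pool.submit(task);
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Simulates a host that never answers: the latch is never released.
            SyncStore offlineHost = () -> {
                new CountDownLatch(1).await();
                return new String[0];
            };
            Future<String[]> job = runAsJob(offlineHost, pool);
            try {
                job.get(200, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                job.cancel(true); // what a CANCEL button would do: interrupt the worker
            }
            System.out.println("cancelled: " + job.isCancelled());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Cancellation here relies on the blocking call reacting to a thread interrupt. A store that ignores interrupts would keep its worker thread stuck even though the client has moved on.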

For me the question is who is calling which API of EFS?
If all the calls are done during synchronization by the
resource system, and if this already runs in a job, then
why make EFS async at a fine level of granularity?

And opening a file should be made async or put into a job
anyway.

Suppose I want to build some headless application on top of
EFS that runs a kind of batch job. In that case I don't need
async calls. I want to process the files sequentially (with maybe
some timeout). But fine-grained async calls would make the
API more complicated (in Java).
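
As a rough Java sketch of such a batch client (all names are invented for illustration, and each file read is modeled as a plain Callable rather than any EFS call): the files are processed strictly one after another, and a stalled read is abandoned after a timeout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BatchSketch {
    /** Runs the blocking reads one at a time, giving each at most timeoutMillis. */
    static List<String> processAll(List<Callable<String>> reads, long timeoutMillis)
            throws InterruptedException {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<String> results = new ArrayList<>();
        try {
            for (Callable<String> read : reads) {
                Future<String> pending = worker.submit(read);
                try {
                    results.add(pending.get(timeoutMillis, TimeUnit.MILLISECONDS));
                } catch (TimeoutException e) {
                    pending.cancel(true); // interrupt the stalled read and move on
                    results.add("<timed out>");
                } catch (ExecutionException e) {
                    results.add("<failed: " + e.getCause() + ">");
                }
            }
        } finally {
            worker.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        List<Callable<String>> reads = List.of(
                () -> "contents of a.txt",
                () -> { Thread.sleep(5_000); return "never seen"; }, // simulates a stalled remote read
                () -> "contents of c.txt");
        System.out.println(processAll(reads, 500));
    }
}
```

The client code stays sequential, but with a purely blocking API the timeout still needs a helper thread behind the scenes.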

But as I said before, I probably have too simplistic a view here...

Michael


> As an end user, I'm OK with waiting if I know I must wait. But
> I'd like to cancel operations that I believe won't return anyways,
> and I'd like to do other stuff in parallel until my request
> completes.
>
> Cheers,
> --
> Martin Oberhuber, Senior Member of Technical Staff, Wind River
> Target Management Project Lead, DSDP PMC Member
> http://www.eclipse.org/dsdp/tm
>  
>  
>
>> -----Original Message-----
>> From: eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx
>> [mailto:eclipse-incubator-e4-dev-bounces@xxxxxxxxxxx] On
>> Behalf Of Michael Scharf
>> Sent: Wednesday, October 22, 2008 9:14 AM
>> To: E4 developer list
>> Subject: Re: [eclipse-incubator-e4-dev] [resources] EFS, ECF
>> and asynchronous
>>
>> When it comes to sync versus async at the EFS level, there
>> is something I don't understand (probably because I don't
>> know all the details of the APIs): I thought that IResource
>> is a kind of snapshot of the underlying EFS structure. If I
>> don't synchronize my workspace then IResource might show
>> me a structure that is not consistent with the file system.
>> Eclipse can deal with that. It happens often to me that
>> I open a file that does not exist anymore because I
>> forget to synchronize a directory that I have changed
>> externally.
>>
>> The synchronization is already a process that can take long
>> (and it does with some huge workspaces I have). So, where/when
>> is fast (synchronous) access to EFS needed/used/expected?
>>
>> I think a user that deals with a remote workspace is able to
>> understand that things cannot go as fast as on a local file
>> system. She might understand that caching is involved. And that
>> an update (of the cache) takes time. I would not hide this.
>> So, what are the cases/workflows where asynchronous access to
>> EFS is important if a local cache is involved?
>>
>> Michael
>>
>>> Hi Scott,
>>>
>>>> 2) Asynchronous access to files/resources is desirable and in
>>>> some cases necessary (for some use cases)
>>> Could you cite a use case where async access is necessary?
>>>
>>> I think that (assuming all synchronous methods have progress
>>> monitors for cancellation, which is the case in EFS), the
>>> only difference between sync and async access is
>>>   (1) the number of Threads in "wait" state,
>>>   (2) locking of resources while Threads synchronously wait,
>>>   (3) potential for coalescing multiple requests to the
>>>       same item in the case of asynchronous queries.
>>>
>>> In the asynchronous case, no Threads are waiting and resources
>>> *may* be unlocked until the callback returns, but this unlocking
>>> of resources needs to be carefully considered in each case.
>>> Does the system always remain in a consistent state? RESTful
>>> systems ensure this by placing all state info right into the
>>> request, which is a great idea but likely not always possible.
>>> It's not only a matter of the API being complex or not. The fact
>>> is that the concept of being asynchronous as such is more flexible,
>>> but also requires adopters to be more careful, or at least think
>>> along different lines.
>>>
>>> I also think that we should look into the need for being
>>> asynchronous or not separately for the kinds of requests:
>>>   (A) Directory retrieval (aka childNames())
>>>   (B) Full file tree retrieval
>>>   (C) Status/Attribute retrieval for an individual file
>>>   (D) File contents retrieval
>>>
>>> For (D) we already use Streams in EFS, which can be
>>> implemented in an asynchronous manner. What's currently
>>> missing in EFS is the ability to perform random access,
>>> like the JSR 203 SeekableByteChannel [1]. Interestingly, nio2
>>> has both a synchronous FileChannel [2] and
>>> AsynchronousFileChannel [3].
>>>
>>> For (A), (B), (C) I'm not sure how much we would win from
>>> an asynchronous variant, since I'd assume that not much
>>> work could be done (and not many resources freed) while
>>> asynchronously waiting for their result anyways. But perhaps
>>> I'm wrong?
>>>
>>>> 3) Using (e.g.) adapters it's not necessary to force such an API on
>>>> anyone (rather it can be available when needed)
>>> Hm... so, let's assume that client X wants to do something
>>> asynchronous. So it does
>>>    myFileStore.getAdapter(IAsyncFileStore.class);
>>> some file systems would provide that adapter, others not.
>>> What's the client's fallback strategy in case the async
>>> adapter is not available?
>>>
>>> I'm afraid that if we use such adapters, we end up with the
>>> same code in clients again and again, because they need some
>>> fallback strategy. It seems wiser to place the fallback
>>> strategy right into the EFS provider, since it is always
>>> possible to write a bridge between a synchronous and an
>>> asynchronous API in a single, generic way.
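
The generic bridge and the provider-side fallback could look roughly like the Java sketch below. The types are hypothetical: FileStore and AsyncFileStore are simplified stand-ins (IAsyncFileStore is only named in this thread, never defined), and childNamesAsync, bridge and asAsync are invented names.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ForkJoinPool;

class AsyncBridgeSketch {
    /** Hypothetical synchronous store. */
    interface FileStore {
        String[] childNames();
    }

    /** Hypothetical asynchronous counterpart. */
    interface AsyncFileStore {
        CompletableFuture<String[]> childNamesAsync();
    }

    /** Generic bridge: any blocking store becomes async by running on an executor. */
    static AsyncFileStore bridge(FileStore store, Executor executor) {
        return () -> CompletableFuture.supplyAsync(store::childNames, executor);
    }

    /** Fallback in one place: use a native async store if there is one, else the bridge. */
    static AsyncFileStore asAsync(FileStore store, Executor executor) {
        if (store instanceof AsyncFileStore) {
            return (AsyncFileStore) store;
        }
        return bridge(store, executor);
    }

    public static void main(String[] args) {
        FileStore local = () -> new String[] {"bar", "baz"};
        AsyncFileStore async = asAsync(local, ForkJoinPool.commonPool());
        System.out.println(Arrays.toString(async.childNamesAsync().join()));
    }
}
```

With the fallback kept in one helper like asAsync, clients would not have to repeat the "adapter missing" branch themselves.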
>>>
>>> Therefore, I'm more in favor of determining what APIs we want
>>> to be asynchronous, and just adding them to EFS. The adapter
>>> idea could be used for adding provisional API, but the final
>>> API should not need that.
>>>
>>>>> To that extent, let's start assuming that files are quick and local.
>>>>> And let's investigate how we could leverage ECF to support remote file
>>>>> systems. If that doesn't meet our needs, we can always add async later.
>>> I'm not sure if this is a good strategy. It seems to lead
>>> towards more and more separation of local vs. remote --
>>> which, I think, leads to either duplication of code in the
>>> end, or non-uniform workflows for end users.
>>>
>>> Let me draw a scenario of what the world could look like
>>> in 10 years: with the Internet getting more and more into
>>> our lives, you'd want to use an Eclipse based product to
>>> dive into some code base that you just found on the net.
>>> Without downloading everything in advance. Or you browse into
>>> some mp3 music store. Add some remotely hosted Open Source
>>> Library to your UML drawing just by drag and drop.
>>>
>>> I think that users will more and more want to operate on
>>> remote networked resources just the same as on local
>>> resources. E4 gives us the chance to try and come up with
>>> models that support such workflows in a uniform way. Let's
>>> not throw away that chance prematurely.
>>>
>>> I agree that we need to start on concrete work items
>>> rather than endlessly discussing concepts. But as we
>>> start on these work items, let's keep the concept that
>>> things may be remote in our minds.
>>>
>>>> Sounds reasonable.  Just as an aside: I think there's a lot
>>>> of potential to use asynchronous file transfer + replication
>>>> to do caching of remote resources.
>>> That's a great approach, especially if it works on the
>>> file block level (such that random access to huge remote
>>> files can be cached). Again, one thing that's missing from EFS
>>> today is random access to files. Does ECF have it?
>>>
>>> [1] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/SeekableByteChannel.html
>>> [2] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/FileChannel.html
>>> [3] http://openjdk.java.net/projects/nio/javadoc/java/nio/channels/AsynchronousFileChannel.html
>>>
>>> Cheers,
>>> --
>>> Martin Oberhuber, Senior Member of Technical Staff, Wind River
>>> Target Management Project Lead, DSDP PMC Member
>>> http://www.eclipse.org/dsdp/tm
>>> _______________________________________________
>>> eclipse-incubator-e4-dev mailing list
>>> eclipse-incubator-e4-dev@xxxxxxxxxxx
>>> https://dev.eclipse.org/mailman/listinfo/eclipse-incubator-e4-dev