Re: [equinox-dev] [prov] processing steps, restartable downloads and ECF

Hi Jeff,

Jeff McAffer wrote:

Thanks to Stefan we have been introducing the notion of ProcessingSteps to munge the content as it is downloaded from an artifact repository.  This allows for things like inline MD5 digest checking, unpack200 processing, delta merging, signature checking, ...  All great stuff.  Pascal just raised a very interesting question: how do we handle restarting?  Some background.  In the current prototype (in my workspace, not yet in CVS) there is a chain of ProcessingStep objects.  Each step is actually an OutputStream that knows about the next step (output stream) in the chain.  When a byte is written to step/stream N, it is processed (counted, transformed, ...) and the result is then passed on to step N+1.  This repeats until the content reaches the last stream in the chain, which is usually a FileOutputStream of some sort, and the content is written to disk.  All is well.
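A minimal sketch of the chain described above, assuming each step simply extends `java.io.FilterOutputStream` and forwards to the next step through its `out` field. The `ProcessingStep`, `CountingStep` and `MD5Step` classes are hypothetical illustrations, not the prototype's actual classes, and a `ByteArrayOutputStream` stands in for the `FileOutputStream` at the end of the chain.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StepChain {
    /** Hypothetical base type: each step is an OutputStream wrapping the next step. */
    abstract static class ProcessingStep extends FilterOutputStream {
        ProcessingStep(OutputStream next) { super(next); }
    }

    /** Counts the bytes that pass through and forwards them unchanged. */
    static final class CountingStep extends ProcessingStep {
        long count;
        CountingStep(OutputStream next) { super(next); }
        @Override public void write(int b) throws IOException { count++; out.write(b); }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            count += len;
            out.write(b, off, len);
        }
    }

    /** Updates an MD5 digest inline and forwards the bytes unchanged. */
    static final class MD5Step extends ProcessingStep {
        final MessageDigest md;
        MD5Step(OutputStream next) {
            super(next);
            try { md = MessageDigest.getInstance("MD5"); }
            catch (NoSuchAlgorithmException e) { throw new IllegalStateException(e); }
        }
        @Override public void write(int b) throws IOException { md.update((byte) b); out.write(b); }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            md.update(b, off, len);
            out.write(b, off, len);
        }
    }

    static final class Result {
        final String content; final long count; final String md5;
        Result(String content, long count, String md5) {
            this.content = content; this.count = count; this.md5 = md5;
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b & 0xff));
        return sb.toString();
    }

    /** Builds MD5Step -> CountingStep -> sink, writes the data, and reports what each step saw. */
    static Result run(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for a FileOutputStream
        CountingStep counter = new CountingStep(sink);
        MD5Step head = new MD5Step(counter);                      // head of the chain
        try (OutputStream chain = head) {
            chain.write(data); // step N processes the bytes, then passes them to step N+1
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(new String(sink.toByteArray(), StandardCharsets.UTF_8),
                counter.count, hex(head.md.digest()));
    }

    public static void main(String[] args) {
        Result r = run("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(r.content + " " + r.count + " " + r.md5);
    }
}
```

Overriding `write(byte[], int, int)` as well as `write(int)`, and forwarding straight to `out`, avoids `FilterOutputStream`'s default byte-at-a-time path for array writes.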

Now, what happens if we crash or the user somehow pauses the download?  The content is partially processed/transformed, but it would likely be too costly for each step to persist its intermediate results.  It seems more likely that the raw content coming into the head of the chain of steps would somehow be cached; then, when the download is restarted after a crash/exit, the chain is recreated and the download is effectively replayed through the chain from the cache.  Once that is done, further content from the source is pushed through the chain.
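One way to sketch the replay idea, assuming the raw bytes are teed into a cache file at the head of the chain. `CachingHead`, `UpperCaseStep` (a toy transforming step) and the `start`/`resume` helpers are hypothetical names. The length of the cache file doubles as the offset from which the source would be resumed (for example via an HTTP range request, which is not shown).

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReplayDemo {
    /** Toy transforming step (hypothetical): upper-cases ASCII letters. */
    static final class UpperCaseStep extends FilterOutputStream {
        UpperCaseStep(OutputStream next) { super(next); }
        @Override public void write(int b) throws IOException {
            out.write(Character.toUpperCase((char) (b & 0xff)));
        }
    }

    /** Head of the chain: tees every raw byte into the cache, then into the steps. */
    static final class CachingHead extends FilterOutputStream {
        private final OutputStream cache;
        CachingHead(OutputStream chain, OutputStream cache) { super(chain); this.cache = cache; }
        @Override public void write(int b) throws IOException { cache.write(b); out.write(b); }
        @Override public void write(byte[] b, int off, int len) throws IOException {
            cache.write(b, off, len);
            out.write(b, off, len);
        }
        @Override public void flush() throws IOException { cache.flush(); out.flush(); }
        @Override public void close() throws IOException {
            try { cache.close(); } finally { out.close(); }
        }
    }

    /** Starts a fresh download; any previous cache content is truncated. */
    static OutputStream start(Path cacheFile, OutputStream chain) throws IOException {
        return new CachingHead(chain, Files.newOutputStream(cacheFile));
    }

    /** Rebuilds step state by replaying the cache through a new chain, then keeps caching. */
    static OutputStream resume(Path cacheFile, OutputStream freshChain) throws IOException {
        try (InputStream in = Files.newInputStream(cacheFile)) {
            in.transferTo(freshChain); // replayed bytes bypass the cache (Java 9+)
        }
        return new CachingHead(freshChain,
                Files.newOutputStream(cacheFile, StandardOpenOption.APPEND));
    }

    /** Simulates: receive part of the source, stop, rebuild the chain, replay, finish. */
    static String demo(byte[] source, int stopAfter) {
        try {
            Path cache = Files.createTempFile("download", ".part");
            try {
                // First attempt: this chain's processed output is discarded when we stop.
                OutputStream head = start(cache, new UpperCaseStep(new ByteArrayOutputStream()));
                head.write(source, 0, stopAfter);
                head.close(); // pause/exit: only the raw cache file is kept

                // Restart: the cache length is the offset to resume the source from.
                int offset = (int) Files.size(cache);
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (OutputStream resumed = resume(cache, new UpperCaseStep(sink))) {
                    resumed.write(source, offset, source.length - offset);
                }
                return new String(sink.toByteArray(), StandardCharsets.US_ASCII);
            } finally {
                Files.deleteIfExists(cache);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello world".getBytes(StandardCharsets.US_ASCII), 5));
    }
}
```

Because replay regenerates each step's state, no step has to persist anything; the cost is re-running the processing over the cached bytes. With a real chain the final step writes to a file, so that output file would need to be truncated or recreated before replaying.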

So, two questions.  Does this make sense?  And if so, how should we implement it?  I wonder if ECF has some technology/support/designs in this area, since it seems they support restartable downloads.  Scott?

Unfortunately not as much as we would like.  We do have API support for pausing/resuming downloads (IFileTransferPausable), and the existing impls do naively support this interface, but we need/want to add further/better implementation support (e.g. direct protocol support for protocols that have pause/resume, partial file caching, etc.).

Actually, I'm a little surprised that you have so far passed the chain of ProcessingSteps directly to ECF as the OutputStream that receives the file contents.  I was expecting that you would have a temporary file receive the file contents and *then*, once the file reception is done, apply the ProcessingSteps.
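For comparison, a sketch of the alternative Scott describes: receive the raw content into a temporary file first, then run the processing chain once reception is complete. The JDK's `java.util.zip.CheckedOutputStream` with `CRC32` stands in for a ProcessingStep here, and `receiveThenProcess` is a hypothetical helper, not an ECF API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.CRC32;
import java.util.zip.CheckedOutputStream;

public class ReceiveThenProcess {
    /**
     * Phase 1 receives the raw bytes into a temporary file (the only state a pause/resume
     * or crash recovery would need). Phase 2 runs the complete file through the chain once.
     * Returns the checksum computed by the stand-in processing step.
     */
    static long receiveThenProcess(InputStream source, OutputStream sink) throws IOException {
        Path temp = Files.createTempFile("artifact", ".part");
        try {
            // Phase 1: REPLACE_EXISTING is needed because createTempFile already made the file.
            Files.copy(source, temp, StandardCopyOption.REPLACE_EXISTING);

            // Phase 2: a JDK checksum stream plays the role of a ProcessingStep.
            CheckedOutputStream chain = new CheckedOutputStream(sink, new CRC32());
            try (InputStream in = Files.newInputStream(temp)) {
                in.transferTo(chain); // Java 9+
            }
            chain.flush();
            return chain.getChecksum().getValue();
        } finally {
            Files.deleteIfExists(temp);
        }
    }

    static long demo(byte[] data, ByteArrayOutputStream sink) {
        try {
            return receiveThenProcess(new ByteArrayInputStream(data), sink);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long crc = demo(data, sink);
        System.out.println(sink.size() + " bytes, crc32=" + Long.toHexString(crc));
    }
}
```

In this arrangement a pause or crash leaves only a partial temporary file to resume, and no processing state needs replaying. The trade-off is an extra full write and read of the artifact, and processing can no longer overlap with the network transfer.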

But in any event, we can add impl support for pause/resume/caching etc. to the ECF receive implementations without changing the API to support the required use cases.  I would appreciate a little better understanding of the existing ProcessingSteps and their function...so could someone point me at the relevant packages/classes and I'll take more of a look?

Seems like this would also be a good topic for the upcoming Equinox Summit:  what enhancements are needed for file transfer both at API and impl:  e.g. pause/resume enhancements, file caching, monitoring/transfer statistics collection?, support more/other providers, etc.

Scott


Jeff

_______________________________________________
equinox-dev mailing list
equinox-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/equinox-dev