Re: [ecf-dev] E-intro [Was Efficient downloads]

Hi Filip,

Filip Hrbek wrote:
Hi Scott, comments inside.

- resume from a different location (e.g. different mirror)

Hmm. Don't know how you are going to accomplish that without something quite different from normal http, but sounds interesting.

I'm not sure which protocols we can implement this for. To do it, we must be able to start downloading at a particular offset and, at the end, check the file's consistency, e.g. using a digest file if one is available. We also need a list of mirrors containing the same artifact (let's assume we've obtained it from somewhere). This should be possible with http.
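A minimal sketch of the two pieces described above, assuming plain http: the Range request header that asks a mirror to start at a given offset, and a digest check over the assembled file. The class and method names are hypothetical and are not part of ECF; the mirror must honor Range requests (answering 206 Partial Content) for the resume to work.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for resuming from another mirror; not an ECF class. */
class ResumeSketch {

    /** Value of the HTTP Range request header asking for everything from offset onward. */
    static String rangeHeader(long offset) {
        if (offset < 0) {
            throw new IllegalArgumentException("offset must be >= 0");
        }
        return "bytes=" + offset + "-";
    }

    /** Joins the part already on disk with the part fetched from the second mirror. */
    static byte[] join(byte[] alreadyHave, byte[] resumed) {
        byte[] all = new byte[alreadyHave.length + resumed.length];
        System.arraycopy(alreadyHave, 0, all, 0, alreadyHave.length);
        System.arraycopy(resumed, 0, all, alreadyHave.length, resumed.length);
        return all;
    }

    /** Lower-case hex digest of the content, e.g. with algorithm "SHA-256" or "MD5". */
    static String hexDigest(byte[] content, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(content)) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("digest algorithm not available: " + algorithm, e);
        }
    }

    /** True if the assembled content matches the digest published next to the artifact. */
    static boolean matches(byte[] content, String expectedHex, String algorithm) {
        return hexDigest(content, algorithm).equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) {
        byte[] part1 = "a".getBytes(StandardCharsets.US_ASCII);  // bytes from the first mirror
        byte[] part2 = "bc".getBytes(StandardCharsets.US_ASCII); // bytes from the second mirror
        System.out.println("Range: " + rangeHeader(part1.length));
        System.out.println(hexDigest(join(part1, part2), "SHA-256"));
    }
}
```

A real downloader would stream to disk and update the digest incrementally rather than hold the whole file in memory; the in-memory arrays just keep the sketch short.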

There could be an API supporting this feature.

This is what I would like to understand: if additional API is *required*, I would like to get that API (probably implemented as an adapter) into the ECF filetransfer API prior to the implementation.

Protocols which wouldn't support this would either make a workaround, or throw an exception.

The approach we've generally been using to allow runtime access to optional/additional features is IAdaptable:

ISomeInterface adapter = (ISomeInterface) someAdaptable.getAdapter(ISomeInterface.class);
if (adapter == null) {
    // optional feature not supported
} else {
    // optional feature is supported...use it!
}

This makes it possible to introduce new API (ISomeInterface) in a plugin separate from the filetransfer API, or in the same plugin. It's also quite handy when combined with the IAdapterManager OSGi service/extension point, which lets new plugins set themselves up declaratively as implementers of a given interface. In any event, we don't have to use this mechanism to introduce new API, but we can if necessary/desired, and it will have minimal impact on existing API.
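To make the pattern concrete, here is a small self-contained sketch. The `Adaptable` interface is a local stand-in for org.eclipse.core.runtime.IAdaptable so the example compiles without Eclipse on the classpath, and `IResumableTransfer` plus both provider classes are hypothetical names invented for illustration, not existing ECF API.

```java
/** Local stand-in for org.eclipse.core.runtime.IAdaptable so the sketch compiles on its own. */
interface Adaptable {
    Object getAdapter(Class<?> adapterType);
}

/** Hypothetical optional feature; not an existing ECF interface. */
interface IResumableTransfer {
    String describeResume(long offset);
}

/** Hypothetical provider that supports resuming. */
class HttpLikeTransfer implements Adaptable, IResumableTransfer {
    public Object getAdapter(Class<?> adapterType) {
        // Hand back this object for any interface it actually implements.
        return adapterType.isInstance(this) ? this : null;
    }

    public String describeResume(long offset) {
        return "resuming at byte " + offset;
    }
}

/** Hypothetical provider that does not support resuming. */
class PlainTransfer implements Adaptable {
    public Object getAdapter(Class<?> adapterType) {
        return null; // no optional features
    }
}

class AdapterSketch {
    /** Uses the optional feature when present, otherwise falls back to a full restart. */
    static String start(Adaptable transfer, long offset) {
        IResumableTransfer resumable =
                (IResumableTransfer) transfer.getAdapter(IResumableTransfer.class);
        if (resumable == null) {
            return "restarting from byte 0";
        }
        return resumable.describeResume(offset);
    }

    public static void main(String[] args) {
        System.out.println(start(new HttpLikeTransfer(), 4096));
        System.out.println(start(new PlainTransfer(), 4096));
    }
}
```

The caller never needs to know which provider it is talking to, which is why a protocol without resume support can simply return null instead of throwing.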

- retrieving information from special headers (like Content-Disposition)
- detecting URL redirections to final mirrors

I'm not sure what you are going to use to implement this, but would be curious to find out.

If you download a file from a URL, you have to discover the filename if the user doesn't specify it explicitly. The most precise solution is parsing the Content-Disposition header if it's available (browsers use it to determine the name of the file to save). Unlike other http headers, Content-Disposition has a very complex syntax. We should be able to parse it properly.
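As a rough illustration of the parsing problem, the sketch below pulls the `filename` parameter out of a Content-Disposition value. It is deliberately incomplete: it handles only the quoted and unquoted `filename=` forms, ignores the extended `filename*=` encoding, and can be fooled by a `filename=` that appears inside another quoted parameter. The class name is hypothetical; the header itself is defined in RFC 2183, with its use in HTTP covered by RFC 6266.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, simplified Content-Disposition filename extractor; not an ECF or httpclient class. */
class ContentDispositionSketch {

    // Matches filename="quoted value" or filename=token (case-insensitive).
    // Does not match the extended filename*= form.
    private static final Pattern FILENAME = Pattern.compile(
            "(?i)(?:^|;)\\s*filename\\s*=\\s*(?:\"((?:[^\"\\\\]|\\\\.)*)\"|([^;\\s\"]+))");

    /** Returns the suggested file name, or null if the header carries none. */
    static String filename(String headerValue) {
        if (headerValue == null) {
            return null;
        }
        Matcher m = FILENAME.matcher(headerValue);
        if (!m.find()) {
            return null;
        }
        String name = m.group(1) != null
                ? m.group(1).replaceAll("\\\\(.)", "$1") // undo backslash escapes in quoted form
                : m.group(2);
        // Keep only the last path segment so a server cannot suggest a directory.
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        name = name.substring(cut + 1);
        return name.isEmpty() ? null : name;
    }

    public static void main(String[] args) {
        System.out.println(filename("attachment; filename=\"ecf-sdk.zip\""));
    }
}
```

When the method returns null, a downloader would typically fall back to the last path segment of the (final, post-redirect) URL.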

OK. Do all http x.y servers support Content-Disposition? Could you also point to the spec for it (w3c?) just for my information? And do you know if Apache httpclient 3.0.1 implements the parsing of Content-Disposition? If so, then perhaps the existing org.eclipse.ecf.provider.filetransfer.httpclient could simply be modified.

Detecting URL redirections would help us with statistics collection. It would be wrong to attribute statistics that belong to different mirrors to the one URL that covers all of them. This is why we should detect when reading from the covering URL leads to different mirrors on different retrieval attempts. Eventually we could automatically avoid black-listed mirrors to prevent speed or timeout problems.
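One way to see which mirror actually serves a request is to look at the redirect response before following it. The sketch below shows only the bookkeeping step, assuming the caller already has the status code and the Location header (with java.net.HttpURLConnection, for example, after calling setInstanceFollowRedirects(false)). The class name and the example hosts are hypothetical.

```java
import java.net.URI;

/** Hypothetical helper that works out where a "covering" URL really points. */
class RedirectSketch {

    /** True for the HTTP status codes that carry a Location to follow. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /**
     * Resolves the Location header (absolute or relative) against the requested URL.
     * Returns the requested URL itself when the response is not a redirect.
     */
    static URI target(String requestedUrl, int status, String location) {
        URI requested = URI.create(requestedUrl);
        if (!isRedirect(status) || location == null) {
            return requested;
        }
        return requested.resolve(location);
    }

    public static void main(String[] args) {
        URI mirror = target("http://download.example.org/get?file=a.zip", 302,
                "http://mirror1.example.net/pub/a.zip");
        // Statistics should be keyed by this host, not by download.example.org.
        System.out.println(mirror.getHost());
    }
}
```

A redirect chain can have several hops, so a real implementation would repeat this step until a non-redirect status arrives and would cap the number of hops.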

OK, this does sound like new API/interfaces for collecting these statistics.

I think you would need to describe what statistics are desired here. We can easily add adapter interfaces to ecf or to individual providers for collecting statistics associated with a given file retrieval (or with all retrievals), but we would need to know what stats are of interest.

The most interesting statistics:
- average download speed (related to concrete mirrors, geographical provider/consumer location, time of day etc.)
- number of bytes downloaded from a particular location / during a particular time period
- frequency of timeouts including timeout values
- etc.
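
The statistics listed above could be accumulated per mirror roughly as follows. This is only a sketch under assumptions: the class, its methods, and the choice to key by mirror host are all hypothetical, and location or time-of-day breakdowns are left out.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-mirror statistics collector; none of these names exist in ECF. */
class MirrorStatsSketch {

    /** Running totals for one mirror. */
    static final class Totals {
        long bytes;
        long millis;
        int attempts;
        int timeouts;
    }

    private final Map<String, Totals> byMirror = new HashMap<>();

    /** Records one retrieval attempt against the mirror that actually served it. */
    void record(String mirror, long bytes, long millis, boolean timedOut) {
        Totals t = byMirror.computeIfAbsent(mirror, k -> new Totals());
        t.bytes += bytes;
        t.millis += millis;
        t.attempts++;
        if (timedOut) {
            t.timeouts++;
        }
    }

    /** Total bytes downloaded from the mirror so far. */
    long totalBytes(String mirror) {
        Totals t = byMirror.get(mirror);
        return t == null ? 0 : t.bytes;
    }

    /** Average speed over all recorded attempts, in bytes per second. */
    double averageBytesPerSecond(String mirror) {
        Totals t = byMirror.get(mirror);
        return (t == null || t.millis == 0) ? 0.0 : t.bytes * 1000.0 / t.millis;
    }

    /** Fraction of attempts that ended in a timeout. */
    double timeoutRate(String mirror) {
        Totals t = byMirror.get(mirror);
        return (t == null || t.attempts == 0) ? 0.0 : (double) t.timeouts / t.attempts;
    }

    public static void main(String[] args) {
        MirrorStatsSketch stats = new MirrorStatsSketch();
        stats.record("mirror1.example.net", 1000, 500, false);
        stats.record("mirror1.example.net", 3000, 1500, false);
        System.out.println(stats.averageBytesPerSecond("mirror1.example.net"));
    }
}
```

Time spent waiting on a timed-out attempt is counted in the average here, so a mirror that often stalls shows a lower speed; whether that is the desired definition is an open design choice.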

We could share the statistics among the users of an application by storing them on a server (the downloader would send the statistics to the server automatically). This would keep users from trying to access corrupted/slow repositories.

OK. Remy may want to comment on the overlap of these statistics with bittorrent (have you looked at bt as a possible approach? as it's pretty ubiquitous) and whether or not a common stats api could/should be created for both. Remy is the committer that did the bittorrent impl. We won't be able to do that immediately, given Europa finishing work, as I'm sure you understand.