Re: [jgit-dev] push to S3 uses SSL?

On Sun, Jun 12, 2011 at 21:32, pablo platt <pablo.platt@xxxxxxxxx> wrote:
>> No. It uses plain text HTTP. The code predates Amazon supporting SSL
>> for S3 and hasn't really been updated since.
>
> Is it possible to add support for SSL?
> Sounds like it won't be too hard.
> It could be set in the config file with "ssl: true"

Yes, that would probably work.

The AmazonS3 class is responsible for loading the properties file, and
its open() method is responsible for building up the URL. It should be
a simple change to use https:// instead of http:// if there is an
"ssl: true" property in the configuration file.

> Is there a way to check and clean the repository on S3?

No. Right now the only way to clean the repository is to delete all of
the objects from S3 and push the entire thing again. The current
client is a bit of a hack, as the repository cannot really be used as
stored on S3, you can only use it as a transfer point to exchange data
between other Git repositories.
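
For reference, that transfer-point usage is configured as an
ordinary remote; a typical setup looks something like this
("my-bucket" and the path are placeholders, and ".jgit" names the
properties file holding the AWS credentials):

  [remote "s3"]
      url = amazon-s3://.jgit@my-bucket/projects/repo.git
      fetch = +refs/heads/*:refs/remotes/s3/*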

> Does the S3 repo reflect everything you do locally, like packing

Repacking locally doesn't affect the remote S3 storage.

> and adding
> tags?

If you add a tag locally, it doesn't get pushed to any remotes by
default. You can explicitly push tags with `jgit push --tags` or `jgit
push refs/tags/name-of-tag`.
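
For example, assuming the remote pointing at S3 is named "s3" (the
remote name and tag name here are placeholders):

  jgit push s3 --tags
  jgit push s3 refs/tags/v1.0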

> What are the limitations of using jgit with S3?

Primarily that the protocol is the "dumb" Git-over-HTTP. The remote
side has no knowledge of Git, and thus cannot do things like GC the
repository to clean up unused garbage files. It also cannot
consolidate packs together. Over time, fetches/clones from S3 will
slow down, because the client has to download a large number of pack
files, and these are fetched sequentially. Each push uploads one pack
file plus its index, so pushing 100 times over 100 days means a
client will need to fetch 200 files in order to clone the repository
from S3.
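
Concretely, after a couple of pushes the repository prefix on S3
looks something like this (the names are only illustrative); a clone
has to fetch every one of these files:

  projects/repo.git/objects/info/packs
  projects/repo.git/objects/pack/pack-0a1b....idx
  projects/repo.git/objects/pack/pack-0a1b....pack
  projects/repo.git/objects/pack/pack-9f8e....idx
  projects/repo.git/objects/pack/pack-9f8e....pack
  projects/repo.git/refs/heads/master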

Pushes to S3 also aren't concurrent-writer-safe. If two different
writers push to the "master" branch at the same time, one will win,
the other will have its data silently lost, and yet both clients will
report success. This is because S3 offers no way to "lock" the object
that stores the current head of a branch, or to replace it only if it
still holds the value the writer expects (a compare-and-swap).
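
The failure mode is the classic lost update; schematically, with C0,
C1 and C2 standing in for commits:

  writer A: GET refs/heads/master   -> reads C0
  writer B: GET refs/heads/master   -> reads C0
  writer A: PUT refs/heads/master = C1    (succeeds)
  writer B: PUT refs/heads/master = C2    (succeeds, overwrites C1)

Both PUTs return success, so both pushes look fine, but A's update to
the branch head is gone.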

> It seems so powerful that I don't understand why I could find only a
> few blog posts about it.

A better implementation would use S3 more like a DHT, and use the new
JGit DHT support. This would make the remote S3 repository a proper
repository that you can execute operations against directly, e.g. from
an EC2 node, or just remotely from your own server or workstation.
Unfortunately I haven't had time to try and work on that, as fun as it
sounds.

-- 
Shawn.

