Re: Local file system abstraction still a work in progress ? (was: [jgit-dev] Using JGit in Google AppEngine)

On Sat, Aug 14, 2010 at 5:21 AM, Thomas Sauzedde <yaourt@xxxxxxxxxxxxxx> wrote:
> Another dumb question...
> In my git internals understanding quest, I realize that a single pack could
> be quite huge (within a smart protocol transaction) ...

Yes.  It can be the size of the entire project, and is during an
initial clone.  For example the current history of the Linux kernel is
upwards of 396 MiB the last time I checked its size in Git.  So the
initial clone of that project is a single 396 MiB pack being sent to
the client.

> AFAIU, a pack is required to be autonomous, in order to be able to
> "reconstruct" loose objects on the other side of the channel.

Yes.

> I didn't check the numbers but with a large git repo (let's say something
> like the linux kernel src), if I'm trying to clone such a repo and store it
> on GAE, I suppose that I will reach GAE limitations.

Yes.  I don't want to discourage you, but I know a lot about GAE, even
details that aren't generally public, and I think you are going to run
into trouble with even smaller Git projects.  Git is just very
demanding, and GAE has fairly small request limits because it's built
for fast transactional web pages, not for bulk data processing.  GAE
is a very interesting platform, but it's not designed for general
computation.

> Let's take this example, I'm cloning such a large repo locally, and then add
> a remote (empty) repo stored in GAE (my target).
> Then I'm pushing my local repo to this GAE remote ...
> AFAIU, during this last operation, there will be a single pack per ref

It's one pack for the entire transaction.  If you push 3 refs in a
single command line, it's a single pack containing the data for all 3
refs.  If you push the entire Linux kernel repository to an empty
destination, it sends 396 MiB in a single HTTP POST request.
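
As a rough illustration of "one pack per transaction", here is a toy
model in Python.  It is only a sketch: the object ids are made-up
strings, and objects_for_push is a hypothetical helper, not how JGit
or C Git actually enumerate objects.  It shows that the objects for
all pushed refs are merged into a single set before anything is sent.

```python
# Toy model only: object ids are invented strings, not real SHA-1s,
# and objects_for_push is a hypothetical helper for illustration.
def objects_for_push(refs_to_push, remote_has):
    """Return the single set of objects sent for the whole push."""
    needed = set()
    for reachable in refs_to_push.values():
        needed |= reachable      # union across every ref being pushed
    return needed - remote_has   # skip what the remote already has


refs = {
    "refs/heads/master": {"c1", "c2", "t1", "b1"},
    "refs/heads/topic": {"c1", "c3", "t2", "b1"},
    "refs/tags/v1.0": {"c1", "t1", "b1"},
}

# Pushing all 3 refs to an empty destination yields one combined set,
# which would travel as one pack in one request.
pack = objects_for_push(refs, remote_has=set())
```

Shared objects (c1 and b1 here) appear once in the combined set, so
the pack is not simply the sum of three per-ref packs.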

> pushed and so I suppose I will reach a GAE limitation like the 10MB per HTTP
> request (and / or the 30sec per request but this is another issue) ?!?

Probably.  Like I said above, GAE is built for fast transaction web
pages where the response time target for a request is under 1 second,
and the payload is small form data or small web content.  It isn't
suited to large data transfers.

> I can afford such a thing, but I'm wondering if I understood how smart /
> pack protocol is working ...

Well, even with purchased quota on GAE I don't think they will let you
exceed some hard limits on per-request CPU time, or
per-request/response payload size.  Smart HTTP transactions may still
be capped at 10 MiB per transfer even when you purchase capacity,
which means you can't push a large project like the Linux kernel.
Even a smaller project like git.git is ~26 MiB for its entire history.
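
To put the numbers above side by side, a minimal sketch in Python.
The 10 MiB cap is the per-request figure discussed in this thread,
taken here as an assumption, and the sizes are the approximate ones
quoted above; fits_in_one_request is just an illustrative helper.

```python
MIB = 1024 * 1024
REQUEST_CAP = 10 * MIB  # assumed per-request cap from this thread


def fits_in_one_request(pack_bytes, cap=REQUEST_CAP):
    """True if a pack of this size fits in a single request."""
    return pack_bytes <= cap


# Approximate full-history pack sizes mentioned in this thread.
approx_pack_sizes = {
    "linux kernel": 396 * MIB,
    "git.git": 26 * MIB,
    "toy project": 1 * MIB,
}

verdicts = {
    name: fits_in_one_request(size)
    for name, size in approx_pack_sizes.items()
}
```

Only the toy project fits; both git.git and the kernel exceed the cap
for an initial push, because the whole history goes in one request.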

It would be interesting to see what you can come up with, but I have
studied this problem (hosting Git on a cloud platform) and it's not as
simple as it sounds if you want to handle any of the common
repositories out there (git itself, Linux kernel, etc.).  For tiny
toy projects it's probably quite trivial (their repositories are often
below 1 MiB in size).

-- 
Shawn.

