Re: [jgit-dev] JGit DFS backend - has anyone tried to implement Cassandra?

On Thu, Jan 7, 2016 at 11:45 AM, Alex Blewitt <alex.blewitt@xxxxxxxxx> wrote:
>
> On 7 Jan 2016, at 15:37, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
>
> _If_ you wanted to put everything into Cassandra, I would chunk pack
> files into, say, 1 MiB chunks and store the chunks in individual rows.
> This means configuring the DfsBlockCache using
> DfsBlockCacheConfig.setBlockSize(1 * MB). When creating a new pack,
> generate a random unique name for the DfsPackDescription and use that
> name and the block offset as the row key.
>
>
> My recollection is that the DfsGarbageCollector coalesced everything that
> wasn't garbage into a single pack (the garbage got its own pack and gets
> deleted later)

It makes two or three packs:

1) refs/heads/*
2) everything else (e.g. refs/meta/config)
3) remaining garbage
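
To make the quoted block-size advice concrete, a rough sketch of the setup
might look like the following; the Cassandra-backed DfsRepository wiring
itself is assumed and not shown here:

import java.io.IOException;

import org.eclipse.jgit.internal.storage.dfs.DfsBlockCache;
import org.eclipse.jgit.internal.storage.dfs.DfsBlockCacheConfig;
import org.eclipse.jgit.internal.storage.dfs.DfsGarbageCollector;
import org.eclipse.jgit.internal.storage.dfs.DfsRepository;
import org.eclipse.jgit.lib.NullProgressMonitor;

class CassandraDfsSetup {
    private static final int MB = 1024 * 1024;

    /** Size DfsBlockCache blocks to match the 1 MiB rows stored in Cassandra. */
    static void configureBlockCache() {
        DfsBlockCacheConfig cfg = new DfsBlockCacheConfig();
        cfg.setBlockSize(1 * MB);
        DfsBlockCache.reconfigure(cfg);
    }

    /** Repack into the heads pack, the "everything else" pack, and a garbage pack. */
    static void gc(DfsRepository repo) throws IOException {
        DfsGarbageCollector gc = new DfsGarbageCollector(repo);
        gc.pack(NullProgressMonitor.INSTANCE);
    }
}

Matching the cache's block size to the row size means each cache miss maps
onto a single row fetch instead of straddling two rows.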

> https://github.com/eclipse/jgit/blob/master/org.eclipse.jgit/src/org/eclipse/jgit/internal/storage/dfs/DfsGarbageCollector.java#L354
>
> Of course I may have got that bit wrong too :)
>
> That would mean that triggering a repo.gc() would potentially overflow the
> Cassandra-based storage, wouldn't it?

Why would it overflow? DfsOutputStream chunks the pack into 1 MiB blocks
and stores each block in its own row. Cassandra scales horizontally, its
hash ring distributing rows across the server pool, so you could store
terabytes if you have enough disk and enough Cassandra nodes.
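
To illustrate that chunking, here is a sketch of a DfsOutputStream along
those lines. ChunkStore is a made-up stand-in for whatever Cassandra driver
calls you end up using; the row key is just the pack's random name plus the
block's byte offset, as described above:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Arrays;

import org.eclipse.jgit.internal.storage.dfs.DfsOutputStream;

/** Hypothetical stand-in for the Cassandra driver calls. */
interface ChunkStore {
    void put(String rowKey, byte[] data);  // e.g. INSERT INTO chunk (key, data) VALUES (?, ?)
    byte[] get(String rowKey);             // e.g. SELECT data FROM chunk WHERE key = ?
}

/** Buffers pack data into 1 MiB blocks and stores each block in its own row. */
class ChunkedPackOutputStream extends DfsOutputStream {
    static final int BLOCK_SIZE = 1024 * 1024; // must match DfsBlockCacheConfig.setBlockSize()

    private final ChunkStore store;
    private final String packName;   // the random unique name from the DfsPackDescription
    private final byte[] block = new byte[BLOCK_SIZE];
    private int blockFill;           // bytes used in the in-progress block
    private long blockStart;         // byte offset of the in-progress block within the pack

    ChunkedPackOutputStream(ChunkStore store, String packName) {
        this.store = store;
        this.packName = packName;
    }

    @Override
    public int blockSize() {
        return BLOCK_SIZE;
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            int n = Math.min(len, BLOCK_SIZE - blockFill);
            System.arraycopy(buf, off, block, blockFill, n);
            blockFill += n;
            off += n;
            len -= n;
            if (blockFill == BLOCK_SIZE)
                flushBlock();
        }
    }

    @Override
    public int read(long position, ByteBuffer buf) throws IOException {
        // Serve the in-progress block from memory, everything else from its row.
        long start = (position / BLOCK_SIZE) * BLOCK_SIZE;
        byte[] data = start == blockStart
                ? Arrays.copyOf(block, blockFill)
                : store.get(rowKey(start));
        int ptr = (int) (position - start);
        if (data == null || ptr >= data.length)
            return -1;
        int n = Math.min(buf.remaining(), data.length - ptr);
        buf.put(data, ptr, n);
        return n;
    }

    @Override
    public void close() throws IOException {
        if (blockFill > 0)
            flushBlock(); // final, possibly short, block
    }

    private void flushBlock() {
        store.put(rowKey(blockStart), Arrays.copyOf(block, blockFill));
        blockStart += blockFill;
        blockFill = 0;
    }

    private String rowKey(long offset) {
        return packName + ':' + offset;
    }
}

The matching read path in your DfsObjDatabase.openFile() implementation would
return a ReadableChannel that fetches rows by the same key and reports the
same 1 MiB block size, so DfsBlockCache reads line up with whole rows.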

