Re: [jgit-dev] JGit DFS backend - has anyone tried to implement Cassandra?

Hi Alex,
I see your point, but wouldn't it make more sense to have smaller pack files when you have a system like Cassandra?

In practice you should use packfiles of around 250MB (which, at a 1:5 compression ratio, means a maximum file size of roughly 1GB); that is still reasonable for keeping a Git server "healthy and well scalable".
For large files I would opt for Git LFS rather than storing them natively in Git.
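
As a back-of-the-envelope check of the numbers in this thread, here is a minimal Python sketch. The function names are hypothetical and only for illustration; the 1:5 ratio is the assumption stated above, the 2GB figure is the per-entry maximum Alex quotes below, and 1.6GB is our largest packfile on GerritHub.io.

```python
MB = 1024 * 1024
GB = 1024 * MB


def max_uncompressed_bytes(pack_bytes: int, compression_ratio: float) -> int:
    """Estimate how much uncompressed content a pack of this size can hold.

    compression_ratio is uncompressed/compressed, e.g. 5 for a 1:5 ratio.
    This is a rough estimate only; real ratios vary per repository.
    """
    return int(pack_bytes * compression_ratio)


def fits_in_entry(pack_bytes: int, entry_limit_bytes: int) -> bool:
    """Return True if a pack of this size stays within a single-entry limit."""
    return pack_bytes <= entry_limit_bytes


# Assumed target pack size and ratio from the discussion above.
target_pack = 250 * MB
estimate = max_uncompressed_bytes(target_pack, 5)
print(f"~{estimate / MB:.0f}MB of uncompressed content per 250MB pack")

# Largest pack mentioned in this thread versus the quoted hard maximum.
print(fits_in_entry(int(1.6 * GB), 2 * GB))
```

With these assumptions a 250MB pack holds about 1250MB of uncompressed content, which is where the "roughly 1GB" figure comes from.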

Do you foresee other problems with managing a Git repo with smaller pack-files?

P.S. On GerritHub.io the largest packfile we have is 1.6GB: big, but still within Cassandra's limits. The majority of packfiles are < 100MB anyway.

Luca.

> On 7 Jan 2016, at 14:40, Alex Blewitt <alex.blewitt@xxxxxxxxx> wrote:
> 
> 
> 
>> On 7 Jan 2016, at 13:56, Luca Milanesio <luca.milanesio@xxxxxxxxx> wrote:
>> 
>> Hi Alex,
>> thank you for your quick reply: as "you know someone" who did a real JGit DFS implementation ... and you believe it is possible ... we have more confidence in starting this work.
> 
> Cassandra will be OK for supporting references, but I don't believe that it will work for storing the pack files (objects). So the question becomes: where do you store the data? You'd really need to have a plan for that first. Cassandra has limited sizes for individual entries (2GB max, in practice orders of magnitude less according to the wiki), which would make it unsuitable for storing pack files in general.
> 
> You might consider storing pack files in S3 and the references in Cassandra for example to provide a scalable solution. But you shouldn't rely on Cassandra for the pack files.
> 
> So where did you intend to store the large data blobs?
> 
> Alex


