Re: [jgit-dev] Ketch: multi-master replicated Git


> On 13 Jan 2016, at 22:45, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> 
>> On Wed, Jan 13, 2016 at 8:30 AM, Saša Živkov <zivkov@xxxxxxxxx> wrote:
>> On Wed, Jan 13, 2016 at 4:07 PM, Luca Milanesio <luca.milanesio@xxxxxxxxx>
>> wrote:
>>> 
>>> Hi Shawn,
>>> worth sharing on the repo-discuss mailing list as well :-)
>>> 
>>> Could then I use Git Ketch to manage the agreement process on pushes (as
>>> Cassandra gives me some headache with that) and still use another DFS
>>> implementation based on Git objects on Cassandra?
>> 
>> 
>> If I understood the announcement [1] correctly you can.
> 
> Yes, this should be supported.
> 
> It gets a bit confusing because I think you are talking about having
> like 1 object store in a Cassandra cluster, and then the reference
> data is managed by Ketch? Ketch stores the references in the object
> store using RefTree, but still needs to use an odd number of copies of
> refs/txn/accepted on durable storage to form the voting system.

Ketch and Cassandra nodes could be co-located: Ketch could use the local FS for its refs/txn/accepted, while Cassandra storage could be used for everything else.
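
To make the "odd number of copies" point concrete, here is a minimal sketch of plain majority-quorum arithmetic. It is not Ketch's actual code; the function names `majority` and `tolerated_failures` are made up for illustration.

```python
def majority(voters: int) -> int:
    """Smallest number of voters that forms a strict majority."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still form."""
    return voters - majority(voters)


if __name__ == "__main__":
    # Compare an odd-sized voter set with the next even size up.
    for n in (3, 4, 5):
        print(f"voters={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this arithmetic a fourth voter tolerates no more failures than three do, which is why an odd count is the natural choice for the refs/txn/accepted copies.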

Typically a Cassandra cluster is at least a dozen machines, and often around one hundred. It would be a configuration for large setups anyway ... We have great ambitions of growth for GerritHub :-)

> 
> To be honest I didn't consider a system layout such as this before. Up
> until this email I was thinking a minimum Ketch 3.0 (3 voters, 0
> followers) system would be 3 separate installations, e.g. 3 Linux
> servers running Git on local disk. With Cassandra it's more like 3
> isolated Cassandra clusters providing 3 copies of the repositories,

Cassandra's replication factor would place some copies of the data across the cluster, but it isn't exactly a copy of everything everywhere :-) It's more about partitioning / sharding.

> and if each Cassandra cluster itself is probably 3 machines at
> minimum, this is like a 9 machine system.

Or still 3 if each node runs both Cassandra and Ketch. Again, 3 machines is a very small cluster anyway :-)

> 
> If those 9 machines are in the same data center then you may be better
> off with something like HDFS providing disk storage for JGit DFS

I thought about HDFS as well in the past, but the problem is file explosion: the name node will blow up from the number of files created by JGit for hundreds of thousands of repos :-(

> and
> using 3 local disks or 3 small installations of reliable databases for
> the RefTree bootstrap layer (where Ketch stores its
> refs/txn/accepted).

