Re: [jgit-dev] JGit DFS backend - has anyone tried to implement Cassandra?

Hi Shawn,
thanks for your feedback; my comments are inline below.

> On 7 Jan 2016, at 15:37, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> 
> On Thu, Jan 7, 2016 at 5:56 AM, Luca Milanesio <luca.milanesio@xxxxxxxxx> wrote:
>> Hi Alex,
>> thank you for your quick reply: since "you know someone" who did a real JGit
>> DFS implementation and you believe it is possible, we are more confident
>> about starting this work.
>> Should you have spare time to support us with answers or code, it would be
>> really appreciated :-)
> 
> FWIW, Alex wrote his DFS implementation with almost no help from me. :)

@Alex: I know your DFS implementation isn't open source, but have you presented the overall approach at any conference without mentioning the client's name?

> 
>> Cassandra is gaining momentum for its scalability, its very fast reads,
>> and its ability to be distributed across one or more geographical zones,
> 
> How does current Cassandra do on the Jepson tests?
> https://aphyr.com/posts/294-jepsen-cassandra

Actually, Cassandra chooses to let the last write win: if client A and client B in two different zones write the same key at exactly the same time, whichever write comes last (in Cassandra's timing) is the one persisted to disk.
I was thinking of Cassandra for extending and scaling storage within the *same* geographical zone (local consistency), a scenario Cassandra serves quickly and well.

If we wanted to push Cassandra to global consistency across remote geo-locations without a powerful dedicated CDN, it wouldn't work in practice: the latency introduced by the consistency checks would be too high.

Another approach is to "endorse inconsistency" and manage it.

Let's say that client A wants to push commit CA to the master branch while, in the same millisecond and from a different zone, client B pushes commit CB to the same master branch of the same project.
Assuming CA and CB are different commits with different content, they will have different SHA-1s for their blobs, trees, and commit objects: from a Cassandra perspective there is no conflict at all.
The trouble comes when we want to update the ref: will the new master point to CA or CB?

A solution could be: append both and let the read operation resolve the conflict.
If both CA and CB are appended, client A and client B will find after the push that two values were appended instead of one => they can then treat this case as a conflict and retry the operation.

As both CA and CB already have all their objects pushed (only the ref wasn't updated), their second push attempt will be very quick. The retry therefore wouldn't cause much trouble for either client.
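To make the idea concrete, here is how I picture the "append both, resolve on read" behaviour, sketched in plain Java with an in-memory map standing in for the Cassandra ref row. All names are hypothetical, and a real implementation would need Cassandra-side coordination (e.g. lightweight transactions) for the retry itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Models a ref row where concurrent writers append candidate SHA-1s
// instead of overwriting; readers detect the conflict afterwards.
public class AppendOnlyRefRow {
    private final Map<String, List<String>> rows = new ConcurrentHashMap<>();

    // Append a candidate value for a ref (like a Cassandra row insert).
    public void append(String ref, String sha1) {
        rows.computeIfAbsent(ref, k -> new ArrayList<>()).add(sha1);
    }

    // After pushing, each client re-reads the row: a single value means
    // the update won cleanly; multiple values mean a conflict to retry.
    public boolean isConflicted(String ref) {
        return rows.getOrDefault(ref, List.of()).size() > 1;
    }

    public static void main(String[] args) {
        AppendOnlyRefRow db = new AppendOnlyRefRow();
        db.append("refs/heads/master", "ca11ab1e"); // client A pushes CA
        db.append("refs/heads/master", "cb22fee1"); // client B pushes CB
        // Both clients now see two values and treat it as a conflict.
        System.out.println(db.isConflicted("refs/heads/master"));
    }
}
```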

Do you (or Alex) foresee problems with this approach?

> 
>> which would
>> make it a perfect candidate for Gerrit.
>> We may have a Cassandra expert helping us with this work ... and maybe
>> someone from DataStax could help as well.
>> 
>> Waiting for Shawn to wake up if he has some updates on his 5 years old post
>> on this topic.
> 
> The 5 year ago Cassandra work I did was based on JGit DHT, which is a
> different design. No database could keep up with JGit DHT, so I
> abandoned that approach and deleted the code from JGit. JGit DFS was
> the outcome of all of that.
> 
> _If_ you wanted to put everything into Cassandra, I would chunk pack
> files into say 1 MiB chunks and store the chunks in individual rows.
> This means configuring the DfsBlockCache using
> DfsBlockCacheConfig.setBlockSize(1 * MB). When creating a new pack
> generate a random unique name for the DfsPackDescription and use that
> name and the block offset as the row key.
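Sketching that key scheme in plain Java (no JGit dependency; the "<name>:<offset>" key format is just my assumption):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Chunk a pack into 1 MiB rows keyed by pack name plus block offset,
// as suggested for a Cassandra-backed DFS. The key format and the
// newPackName() stand-in for DfsPackDescription are assumptions.
public class PackRowKeys {
    static final int BLOCK_SIZE = 1 << 20; // 1 MiB, matching setBlockSize(1 * MB)

    // Random unique name for a new pack.
    static String newPackName() {
        return "pack-" + UUID.randomUUID();
    }

    // Row keys for every chunk of a pack of the given length.
    static List<String> rowKeys(String packName, long packLength) {
        List<String> keys = new ArrayList<>();
        for (long off = 0; off < packLength; off += BLOCK_SIZE) {
            keys.add(packName + ":" + off); // "<name>:0", "<name>:1048576", ...
        }
        return keys;
    }

    public static void main(String[] args) {
        // A 2.5 MiB pack spans three rows, at offsets 0, 1048576, 2097152.
        System.out.println(rowKeys("pack-demo", 2_621_440L));
    }
}
```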
> 
> DfsOutputStream buffers 1 MiB of data in RAM and then passes that
> buffer off as a row insert into Cassandra.
> 
> The DfsObjDatabase.openFile() method supplies a ReadableChannel that
> is accessed in aligned blockSize units, so 1 MB alignments. If your
> row keys are the pack name and the offset of the first byte of the
> block (so 0, 1048576, 2097152, ...) read method calls nicely line up
> to row reads from Cassandra. The DfsBlockCache will smooth out
> frequent calls for rows.
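The read side would then be the mirror image; a rough sketch, again with an in-memory map standing in for the Cassandra table and all names hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Because reads arrive in blockSize-aligned units, each read maps to
// exactly one row: key = pack name + offset of the block's first byte.
// (In the real design, DfsBlockCache sits in front of these fetches.)
public class AlignedReader {
    static final int BLOCK_SIZE = 1 << 20; // 1 MiB, matching the write side

    final Map<String, byte[]> table = new HashMap<>(); // rowKey -> chunk

    // Fetch the chunk holding the given position.
    byte[] readBlock(String packName, long position) {
        long blockStart = (position / BLOCK_SIZE) * BLOCK_SIZE;
        return table.get(packName + ":" + blockStart);
    }

    public static void main(String[] args) {
        AlignedReader r = new AlignedReader();
        r.table.put("pack-demo:0", new byte[BLOCK_SIZE]);
        r.table.put("pack-demo:1048576", new byte[BLOCK_SIZE]);
        // A read at offset 1 MiB lines up exactly with the second row.
        System.out.println(r.readBlock("pack-demo", 1048576).length);
    }
}
```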
> 
> Use another row in Cassandra to store the list of packs. The
> listPacks() method then just loads that row. commitPacks() updates
> that row by inserting some values and removing other values. What you
> really want to store here is the pack name and the length so that you
> can generate the row keys.
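Modelled in plain Java (the method names listPacks/commitPacks are borrowed from your description; everything else is my assumption):

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Sketch of the single "pack list" row: listPacks() loads it, and
// commitPacks() inserts new packs and removes replaced ones in one
// update. Each pack name maps to its length, which is enough to
// derive the chunk row keys for that pack.
public class PackListRow {
    private final Map<String, Long> packs = new TreeMap<>(); // name -> length

    Map<String, Long> listPacks() {
        return Map.copyOf(packs);
    }

    void commitPacks(Map<String, Long> add, Set<String> remove) {
        packs.putAll(add);
        packs.keySet().removeAll(remove);
    }

    public static void main(String[] args) {
        PackListRow row = new PackListRow();
        row.commitPacks(Map.of("pack-1", 3_000_000L), Set.of());
        // A repack replaces pack-1 with pack-2 in a single commit:
        row.commitPacks(Map.of("pack-2", 2_500_000L), Set.of("pack-1"));
        System.out.println(row.listPacks().keySet());
    }
}
```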
> 
> Reference API in DfsRefDatabase is simple. But I just committed a
> change to JGit to allow other uses of RefDatabases. Because...
> 
> 
> The new RefTree type[1] is part of a larger change set to allow
> storing references inside of Git tree objects. (Git, in Git! Ahh the
> recursion!) This may simplify things a little bit as we only really
> need to store the pack and object data. Reference data is derived from
> pack data.
> 
> [1] https://git.eclipse.org/r/62967
> 
> RefTree on its own is incomplete. I should get another few commits
> uploaded today that provide a full RefDatabase around the RefTree
> type. I have it coded and working, just working on the unit tests to
> verify its working.
> 
> 
> The longer term trend here is I'm doing some Git multi-master work
> inside JGit now. RefTree is an important building block, but is far
> from complete. $DAY_JOB is evolving our Git multi-master system for
> $REASONS, and in the process trying to put support into JGit.

Thanks for the suggestion, that would definitely work.

If we choose 50 MiB for a Cassandra row (which is still very acceptable), the number of rows each pack spans will be quite limited anyway.
It shouldn't then introduce a significant performance penalty.
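The arithmetic behind that claim, with illustrative figures:

```java
// With 50 MiB rows, even large packs span only a handful of rows.
// The pack sizes below are examples, not measurements.
public class RowCount {
    static long rowsFor(long packBytes, long rowBytes) {
        return (packBytes + rowBytes - 1) / rowBytes; // ceiling division
    }

    public static void main(String[] args) {
        long mib = 1 << 20;
        // A 1 GiB pack needs only ceil(1024 / 50) = 21 rows of 50 MiB.
        System.out.println(rowsFor(1024 * mib, 50 * mib));
    }
}
```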

One of the reasons for choosing Cassandra is to make the Git storage virtually "unlimited" with zero downtime.
Imagine I currently have 10 Cassandra nodes with 10 TB of capacity ... if the volume grows a lot, it would be very easy to add more nodes and gain extra storage without a significant read or write performance penalty.

Honestly, I was thinking about both Cassandra AND GlusterFS to scale Git storage, but Cassandra looked more promising as it requires less work at the infrastructure level.
Additionally, I could use Cassandra for the DB side as well, so the same store would hold both the Git data and the DB, even now in Gerrit 2.13.

All the feedback received so far on the feasibility of the Cassandra approach looks promising :-)

Luca.


