Re: [jgit-dev] Distributed Git Server using JGit

On Wed, May 16, 2018 at 10:12 PM, Mincong Huang <mincong.h@xxxxxxxxx> wrote:
Hi,

I'm creating a Git server, and I'd like to use JGit as the implementation. JGit
contains a module called `org.eclipse.jgit.http.server`, which makes this easy
via GitServlet[1]. However, I need the Git server to be clustered so that it
scales. I have two possible solutions, and I'd like to have your opinions
on them.

Solution 1: N GitServlets + 1 NFS
Use N Git servlets that share the same network filesystem: each server
points to the same filesystem on the network. This solution is used by GitLab.
Personally, I'm afraid of concurrent file access to a Git repository, which
could lead to data corruption. According to this post[2], Git has mechanisms to
protect itself, e.g. the index lock. But a bare Git repository does not have an
index, right? I'm confused.
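
On the lock question: a bare repository has no index, but Git also guards ref
updates with a lock file (`<ref>.lock`), which is created exclusively and then
renamed over the ref. The sketch below only illustrates that pattern with
plain `java.nio`; it is not JGit's code. The class and method names
(`RefLockSketch`, `updateRef`) are hypothetical. Whether exclusive create and
rename stay atomic on a given NFS setup depends on the NFS version and client,
so treat that as something to verify rather than a guarantee.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration of a lock-file protocol for updating one ref file.
final class RefLockSketch {

    /**
     * Tries to update a ref file in three steps:
     * 1. atomically create "<ref>.lock" (fails if another writer holds it),
     * 2. write the new value into the lock file,
     * 3. atomically rename the lock file over the ref.
     * Returns false if the lock is already held by someone else.
     */
    static boolean updateRef(Path ref, String newValue) throws IOException {
        Path lock = ref.resolveSibling(ref.getFileName() + ".lock");
        try {
            // Existence check and creation happen as one atomic operation.
            Files.createFile(lock);
        } catch (FileAlreadyExistsException held) {
            return false; // another writer owns the lock; leave it alone
        }
        try {
            Files.write(lock, (newValue + "\n").getBytes(StandardCharsets.UTF_8));
            // On POSIX filesystems this maps to rename(), which replaces the
            // target; elsewhere the behavior for an existing target is
            // implementation specific.
            Files.move(lock, ref, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } finally {
            // No-op after a successful rename; cleans up if the write failed.
            Files.deleteIfExists(lock);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("reflock");
        Path ref = dir.resolve("master");
        System.out.println("first update applied: " + updateRef(ref, "commit-a"));

        // Simulate a concurrent writer that currently holds the lock.
        Files.createFile(dir.resolve("master.lock"));
        System.out.println("update while locked applied: " + updateRef(ref, "commit-b"));
    }
}
```

A writer that finds the lock already taken backs off instead of overwriting,
so two servers never write the same ref at the same time as long as the
filesystem honors the atomic create.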

Solution 2: N GitServlets + N DfsRepository + KeyValue DB
JGit provides an abstract class `DfsRepository`[3] to create a DFS repository.
This solution is used by Palantir[4] and Google[5], where data is stored in a
distributed database. I think this solution is for big companies and requires
a complex setup. I'm not confident that I could implement DfsRepository
correctly and maintain an extra DB.

My implementation will host thousands of repositories, but only a few of
them are actively used. Therefore, concurrent access should be very limited.

I'd like to have your comments on this subject.

Thanks,
Mincong


For option 1, I'd recommend giving Gerrit with its high-availability plugin a try.
If you run into issues, collaborate with the Gerrit community to improve that
solution instead of starting your own implementation, which is certainly more work.

For option 2, you may consider joining an initiative started by Luca Milanesio, one
of the Gerrit maintainers, to build an open source implementation of the
JGit DFS API on Cassandra. The current PoC patch series is maintained here

I don't get why you need a scalable server if only a few of your thousands of
repositories are actively used. There are many Gerrit installations serving
thousands of repositories from a single server.

-Matthias
