[egit-dev] Re: [JGit-io-RFC-PATCH v2 2/4] Add JGit IO SPI and default implementation

imyousuf@xxxxxxxxx wrote:
> The SPI mainly focuses on providing an API that lets JGit perform
> operations similar to those of java.io.File. All direct I/O is based on the
> java.io.Input/OutputStream classes.
> 
> Each JGit IO SPI provider is designed to be URI-scheme based, and thus
> the default implementation is that of the "file" scheme. SPI providers will
> be integrated by their respective users in a manner similar to that of JDBC
> driver registration. There is a SystemStorageManager that has similar
> registration capabilities, and the system storage providers should be
> registered with the manager in one of the provided ways.

I think this may be a bit in the wrong direction for what we are
trying to accomplish.

A number of people really want to map Git onto what is essentially
Google's BigTable schema.  Aside from Google's own BigTable product
(which I want to use Git on at work, because it would vastly simplify
my system administration duties at $DAYJOB) there is Cassandra and
Hadoop HBase which implement the same schema semantics.

None of those systems implement file streams; they implement cell
storage in a non-transactional system with a semi-dynamic schema.

Some people have built transactional semantics on top of these
storage layers, e.g. Google AppEngine provides multiple row
transactions through some magic sauce layered on top of BigTable.
I'm sure people will build similar tools on top of Cassandra
and HBase.

Where I'm trying to go with this is that things that are stored
in files on the filesystem in traditional Git wouldn't normally be
mapped into "byte streams" in a BigTable-ish system, or even the
JDBC-ish system you were describing.

For .git/config we might want to map config variable names into
keys in the table, with values stored in cells.  This makes it
easier to query or edit the data.
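
To make the key/cell idea concrete, here is a minimal sketch.  It is
only an illustration: ConfigCells and its methods are hypothetical
names, and an in-memory TreeMap stands in for the table.  Each
variable name becomes a row key such as "remote.origin.url", and its
values are stored in the cell.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: config variables stored as cells keyed by variable name. */
class ConfigCells {
    // Stand-in for a table: row key -> cell values (a config key may repeat).
    private final Map<String, List<String>> cells = new TreeMap<>();

    /** Builds a key such as "remote.origin.url"; subsection may be null. */
    static String key(String section, String subsection, String name) {
        return subsection == null
                ? section + "." + name
                : section + "." + subsection + "." + name;
    }

    /** Appends a value to the cell for the given variable. */
    void add(String section, String subsection, String name, String value) {
        cells.computeIfAbsent(key(section, subsection, name), k -> new ArrayList<>())
             .add(value);
    }

    /** Returns the most recently added value, or null if the variable is unset. */
    String get(String section, String subsection, String name) {
        List<String> values = cells.get(key(section, subsection, name));
        return values == null ? null : values.get(values.size() - 1);
    }
}
```

With this layout, reading or editing one variable touches one key,
rather than parsing and rewriting a whole config file.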

Fortunately, "Config" is abstract enough that we could subclass
it with a CassandraConfig and simply use that instance when on a
Cassandra-based storage system.  No file streams required.  Ditto
for a JdbcConfig.
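
The subclassing approach might look roughly like the sketch below.
SketchConfig is a hypothetical stand-in, not JGit's actual Config
class, and TableConfig uses a HashMap where a real implementation
would talk to Cassandra or a JDBC connection.  The point is only
that callers depend on the abstract type, so no file stream is
needed behind it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for an abstract config type; not JGit's actual Config. */
abstract class SketchConfig {
    abstract String getString(String key);

    abstract void setString(String key, String value);

    /**
     * Typed accessor built only on the abstract primitives.
     * Simplified: only "true" (any case) counts as true here.
     */
    boolean getBoolean(String key, boolean defaultValue) {
        String v = getString(key);
        return v == null ? defaultValue : Boolean.parseBoolean(v);
    }
}

/** Table-backed variant: each variable is one cell, no file stream involved. */
class TableConfig extends SketchConfig {
    // Assumption: this map stands in for a remote table or a JDBC-backed store.
    private final Map<String, String> table = new HashMap<>();

    @Override
    String getString(String key) {
        return table.get(key);
    }

    @Override
    void setString(String key, String value) {
        table.put(key, value);
    }
}
```

Code that only holds a SketchConfig reference cannot tell whether
the values came from a file or from a table.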

For RefDatabase, we'd want to do the same and avoid the concept of
packed-refs altogether.  Each Ref should go into its own row in a
Cassandra storage system, and essentially act as a loose ref.
Ditto with JDBC.
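
A rough sketch of the one-row-per-ref layout follows.  RefRows is a
hypothetical name, object ids are kept as plain hex strings, and a
sorted in-memory map stands in for the table.  There is no
packed-refs equivalent: every ref is looked up, updated, or deleted
by its own key.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch: one row per ref, keyed by the full ref name. */
class RefRows {
    // Stand-in for a table sorted by row key: ref name -> object id (hex).
    private final TreeMap<String, String> rows = new TreeMap<>();

    /** Creates or updates the row for a single ref. */
    void put(String refName, String objectId) {
        rows.put(refName, objectId);
    }

    /** Returns the object id for the ref, or null if no such row exists. */
    String get(String refName) {
        return rows.get(refName);
    }

    /** Deletes the row; returns true if the ref existed. */
    boolean delete(String refName) {
        return rows.remove(refName) != null;
    }

    /** Range scan over all refs under a prefix such as "refs/heads/". */
    SortedMap<String, String> scan(String prefix) {
        return rows.subMap(prefix, prefix + Character.MAX_VALUE);
    }
}
```

Listing branches is then a key-range scan over "refs/heads/" rather
than a directory walk merged with a packed-refs file.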

We'd probably never need to read-or-write the info/refs or
objects/info/packs listings.

And I think that's everything that a bare repository needs, aside
from ObjectDatabase, which is already mostly abstract anyway.

-- 
Shawn.
