Re: [jgit-dev] JDBC based storage layer for JGit

Hi Shawn,

You are too fast for me to keep pace ;-) I was actually working on an enhanced version of my first prototype and had already implemented most of your feedback. It seems I am working too slowly...

Nevertheless, I learned a lot from implementing this and especially from your feedback. I will now have a look at your Gerrit change and see whether I can help.

Cheers,
Philipp


-----Original Message-----
From: Shawn Pearce [mailto:spearce@xxxxxxxxxxx] 
Sent: Tuesday, 22 February 2011 07:25
To: Thun, Philipp
Cc: (jgit-dev@xxxxxxxxxxx)
Subject: Re: [jgit-dev] JDBC based storage layer for JGit

On Tue, Feb 15, 2011 at 16:17, Shawn Pearce <spearce@xxxxxxxxxxx> wrote:
> On Tue, Feb 15, 2011 at 11:40, Thun, Philipp <philipp.thun@xxxxxxx> wrote:
>> I've tried to review Shawn's DHT change (http://egit.eclipse.org/r/2295) and
>> then thought that it might be an interesting task for me to write an
>> implementation for JDBC. It's easier to understand something when you try it
>> out...
>>
>> So today I have uploaded a very first version of this JDBC storage provider
>> to github: https://github.com/philippthun/jgit_jdbc

Today I decided to rewrite a JDBC SPI as part of JGit itself, since
JDBC is part of the J2SE platform and is available anywhere JGit
itself runs. Orbit already has H2 available, so we can rely on H2 for
unit testing, but otherwise try to support any database by allowing
the user to supply their JDBC driver of choice.

The change is a ground-up rewrite of what Philipp proposed about a
week ago. It's missing the program bindings needed to run this from
the command line, and thus far only supports the H2 database (see
git_schema.sql), but it does more accurately implement the DHT SPI.

http://egit.eclipse.org/r/2562

As far as porting to other databases goes, there are two major issues:

- Type of the binary columns (VARBINARY and BLOB)
- Sequence generator

Unfortunately binary columns are pretty database vendor specific, so
we might need to make the git_schema.sql file use macros and expand
them based on the database the user wants to load the schema into.
Most support the type "BLOB", but on some systems BLOB is overkill for
values < 255 bytes, such as those in the object_index table.
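
A minimal sketch of the macro idea, in Java. Nothing here is taken
from the change under review: the macro names (@SMALL_BINARY@,
@LARGE_BINARY@), the class name, and the per-vendor type choices are
all assumptions for illustration, and the vendor is guessed from the
second component of the JDBC URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: expands binary-type macros in a schema template. */
public class SchemaMacros {
    /** Guess a vendor key from a JDBC URL such as "jdbc:h2:mem:test". */
    public static String vendorOf(String jdbcUrl) {
        String[] parts = jdbcUrl.split(":", 3);
        if (parts.length < 3 || !"jdbc".equals(parts[0])) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        return parts[1].toLowerCase();
    }

    /** Assumed type mapping per vendor; adjust to the databases you target. */
    public static Map<String, String> macrosFor(String vendor) {
        Map<String, String> m = new LinkedHashMap<>();
        switch (vendor) {
        case "h2":
            m.put("@SMALL_BINARY@", "VARBINARY(255)");
            m.put("@LARGE_BINARY@", "BLOB");
            break;
        case "mysql":
            m.put("@SMALL_BINARY@", "VARBINARY(255)");
            m.put("@LARGE_BINARY@", "LONGBLOB");
            break;
        case "postgresql":
            // PostgreSQL uses BYTEA for both short and long binary values.
            m.put("@SMALL_BINARY@", "BYTEA");
            m.put("@LARGE_BINARY@", "BYTEA");
            break;
        case "oracle":
            m.put("@SMALL_BINARY@", "RAW(255)");
            m.put("@LARGE_BINARY@", "BLOB");
            break;
        default:
            throw new IllegalArgumentException("No macro table for " + vendor);
        }
        return m;
    }

    /** Replace every macro in the template with the vendor's type name. */
    public static String expand(String template, String jdbcUrl) {
        String sql = template;
        for (Map.Entry<String, String> e : macrosFor(vendorOf(jdbcUrl)).entrySet()) {
            sql = sql.replace(e.getKey(), e.getValue());
        }
        return sql;
    }
}
```

The schema file would then carry the macros in place of concrete type
names and be expanded once, just before it is loaded into the target
database.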

The sequence generator is supported on most databases, but some
require an auto_increment style column instead (MySQL). Even when
"CREATE SEQUENCE" is supported, the syntax used by the SELECT
statement to increment the sequence and obtain the next value varies
from vendor to vendor. So this stuff has to use the driver name or URL
to determine the correct code at runtime.
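
As a rough sketch of that runtime selection (again in Java, and again
not code from the change itself): the class and method names below are
hypothetical, and the vendor is taken from the JDBC URL. An empty
result stands for "no sequence support, use an auto_increment column
instead", which is how MySQL is treated here.

```java
import java.util.Optional;

/** Hypothetical helper: picks the "next sequence value" query per vendor. */
public class SequenceSql {
    /**
     * Returns the SELECT that increments the sequence and yields the next
     * value, or Optional.empty() when the vendor is assumed to need an
     * auto_increment style column instead.
     */
    public static Optional<String> nextValueSql(String jdbcUrl, String sequence) {
        // The name is concatenated into SQL, so accept plain identifiers only.
        if (!sequence.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            throw new IllegalArgumentException("Bad sequence name: " + sequence);
        }
        String[] parts = jdbcUrl.split(":", 3);
        if (parts.length < 3 || !"jdbc".equals(parts[0])) {
            throw new IllegalArgumentException("Not a JDBC URL: " + jdbcUrl);
        }
        switch (parts[1].toLowerCase()) {
        case "h2":
            return Optional.of("SELECT NEXT VALUE FOR " + sequence);
        case "postgresql":
            return Optional.of("SELECT nextval('" + sequence + "')");
        case "oracle":
            return Optional.of("SELECT " + sequence + ".NEXTVAL FROM DUAL");
        case "mysql":
            // Treated as having no CREATE SEQUENCE; caller must fall back
            // to an auto_increment column and read the generated key.
            return Optional.empty();
        default:
            throw new IllegalArgumentException("Unknown vendor in " + jdbcUrl);
        }
    }
}
```

For the auto_increment case the caller would insert a row and read the
key back through Statement.getGeneratedKeys() rather than running a
separate SELECT.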


I'd like to break out more of the RefData fields into the ref table as
proper SQL columns, just to make it easier to examine current
reference state. But this might suggest it's OK to edit those fields,
which might disagree with the ref_data member, so we probably just
have to convert *all* of the fields in RefData into columns of the
table. That's painful.

The chunk_info column might be able to use a smaller storage type
(VARBINARY instead of BLOB); I can't remember what the distribution of
values is for it.

-- 
Shawn.

