Re: [recommenders-dev] Snipmatch GSOC was: How to start with Recommender's source code

Git repositories are an awesome idea - sounds good to me!

Yes, I'd be happy to be the primary mentor. I've contacted GSoC and
will follow up.

Cheng,

Yes, I think we should have common and personal snippets in separate
directories or repositories - either way works for me, but
repositories might make the most sense, since this would be nice to
support anyway (so that people can pull in code from multiple
repositories eventually).
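
To make the "multiple repositories" idea concrete, here is a minimal
sketch of a slim store interface (the kind Marcel suggests further down
in the quoted thread) with one directory-backed store per repository and
a composite that presents them as one. All type names here
(SnippetStore, DirectorySnippetStore, CompositeSnippetStore) are
hypothetical and are not existing SnipMatch or Recommenders classes; the
sketch also assumes one .xml file per snippet in a flat directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical slim interface; not an existing SnipMatch type. */
interface SnippetStore {
    /** Returns the paths of all snippet files this store can see. */
    List<Path> listSnippets();
}

/** One flat directory (e.g. a checked-out Git working tree), one .xml file per snippet. */
class DirectorySnippetStore implements SnippetStore {
    private final Path root;

    DirectorySnippetStore(Path root) {
        this.root = root;
    }

    @Override
    public List<Path> listSnippets() {
        List<Path> result = new ArrayList<>();
        // The glob keeps only *.xml entries directly under the root directory.
        try (DirectoryStream<Path> files = Files.newDirectoryStream(root, "*.xml")) {
            for (Path file : files) {
                result.add(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Collections.sort(result); // directory order is unspecified, so sort for stable output
        return result;
    }
}

/** Presents several stores (common, personal, ...) as a single store. */
class CompositeSnippetStore implements SnippetStore {
    private final List<SnippetStore> stores;

    CompositeSnippetStore(List<SnippetStore> stores) {
        this.stores = new ArrayList<>(stores);
    }

    @Override
    public List<Path> listSnippets() {
        List<Path> all = new ArrayList<>();
        for (SnippetStore store : stores) {
            all.addAll(store.listSnippets());
        }
        return all;
    }
}
```

With this shape, the search index only needs a SnippetStore and does not
care whether the files come from the common repository, a personal one,
or both.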

In terms of format, XML works for me - that's what we've used until
now, but JSON is also fine. I'm not particular. However, the client
accepts XML, so XML might make more sense. I've attached a spreadsheet
of snippets marked up in XML. Tell me if you have trouble opening this
format - I didn't want to send as SQL (thought you might not be able
to easily open) or CSV (since there may be commas in the data). Since
it's the search patterns (see <pattern> within the XML) that we're
searching and ranking, we'll probably want to index on them.
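
As a rough illustration of indexing on the search patterns, the sketch
below pulls every <pattern> element out of a snippet's XML and builds a
simple word-to-snippet-id map using only the JDK. Only the <pattern>
element name comes from our format; the surrounding <snippet> element,
the snippet ids, the example pattern text, and the class name
PatternIndexSketch are assumptions made up for this example. A real
implementation would more likely hand the pattern text to Lucene than
keep a map like this.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical helper; not part of the SnipMatch client. */
class PatternIndexSketch {

    /** Returns the trimmed text of every <pattern> element in one snippet document. */
    static List<String> extractPatterns(String snippetXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(snippetXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("pattern");
            List<String> patterns = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                patterns.add(nodes.item(i).getTextContent().trim());
            }
            return patterns;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse snippet XML", e);
        }
    }

    /** Maps each lower-cased pattern word to the ids of the snippets whose patterns contain it. */
    static Map<String, List<String>> buildIndex(Map<String, String> snippetXmlById) {
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, String> entry : snippetXmlById.entrySet()) {
            for (String pattern : extractPatterns(entry.getValue())) {
                for (String word : pattern.toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (word.isEmpty()) {
                        continue;
                    }
                    List<String> ids = index.computeIfAbsent(word, k -> new ArrayList<>());
                    if (!ids.contains(entry.getKey())) {
                        ids.add(entry.getKey()); // record each snippet once per word
                    }
                }
            }
        }
        return index;
    }
}
```

Looking up the words of a query in the returned map gives candidate
snippet ids; ranking those candidates is a separate step that this
sketch does not attempt.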

This may be a lot of information to digest - please feel welcome to
send lots of questions!

Doug




On Mon, Feb 20, 2012 at 3:43 AM, chen cheng <chengchendoc@xxxxxxxxx> wrote:
> I agree with the file system storage solution. I do not know what Doug's
> opinion is.
>
> But before coding, we must define some details:
>
> 1. One snippet per file, in XML format. Is that OK? Should the file name
> be meaningful, or is it enough to keep it unique?
> 2. One directory per type of snippet: common repositories have their own
> directory, and personal repositories have their own directory. Is that OK?
>
>
>
> 2012/2/20 Marcel Bruch <bruch@xxxxxxxxxxxxxxxxxx>
>>
>>
>> On 20.02.2012, at 08:56, chen cheng wrote:
>>
>> EGit is included in Eclipse 4, so using Git for snippet repositories seems
>> to be a good idea. I agree with this solution.
>>
>> Users can even sync client-side snippet repositories with the server
>> side automatically or manually. In automatic mode, we use a thread to
>> sync for users; in manual mode, users sync and update/commit snippets
>> themselves. But we should note that there are two kinds of snippet
>> repositories on the server side: common repositories and personal
>> repositories.
>>
>>
>> exactly. people have their own repos. "group" repositories might be
>> possible on top of this.
>>
>> 1. Common repositories are used by all SnipMatch users. They can
>> update from the remote server as they like, but they cannot submit freely.
>> How can users contribute common snippets? Maybe we should formulate a rule
>> or something.
>>
>>
>> There the webservice comes into play. Or we allow registered users to
>> create and edit new files. In the end, it's version controlled, so we can
>> roll back these changes easily. But there is some kind of trust involved.
>> Same as with Eclipse Wiki pages (and if I got Wayne's tweets right), there
>> are spammers too. This will become an issue. But not yet, I think :)
>>
>>
>> 2. Personal repositories are distinguished by user id; a user can submit
>> to or update from the remote server as he likes.
>>
>>
>> exactly.
>>
>> If Marcel can build a search engine based on Lucene and the Git file
>> system, that would be perfect.
>>
>>
>> Nothing simpler than that :)
>>
>> About the snippet storage format: one snippet per file, named after its
>> description?
>>
>>
>> Simple formats can be: JSON or any arbitrary text format/markup. If we
>> manage to create a lightweight grammar, we can even provide simple and fast
>> editors with Xtext (although I wouldn't spend too much time on this detail
>> yet).
>>
>> Organized by category directory? It seems there may be many small files,
>> and loading them all would be very slow: open, close, open, close ...
>>
>>
>> I think, organization by category only makes sense if 1 category is
>> permitted. Actually, I think a snippet has many tags, comments etc. Thus,
>> I'd stick with one directory with all files. Regarding performance: The
>> search index will filter what's needed. There is no need to load hundreds of
>> files. Also, your OS does a lot of caching. I think, this will not become a
>> bottleneck. But others may prove me wrong here.
>>
>> But let's hide the store behind some slim interface, make the
>> implementation interchangeable, and go for a local file system approach
>> first?
>>
>>
>>
>> 2012/2/20 Marcel Bruch <bruch@xxxxxxxxxxxxxxxxxx>
>>>
>>> Hi Cheng, Hi Doug,
>>>
>>> regarding server-side backend:
>>> We use Apache CouchDB as database and JAX-RS as RESTful server-interface
>>> for client communication. However, I just wonder whether using GIT
>>> repositories as backend would be sufficient or even better than CouchDB in
>>> our case. It can be synced easily between clients and server, support for
>>> many potential sources is straight-forward, and with JGit and EGit we have
>>> quite usable front-ends and APIs to work with. I'd just add the Lucene
>>> search index on top of the file system resources. Best of it (for the
>>> moment): we can start immediately w/o waiting too long for the server-side.
>>>
>>> What do you think? Since it's merely a file-based approach with slim
>>> syncing capabilities, we won't have spent too much time on it if it proves not
>>> usable. But at least Github has proven that using GIT as snippet
>>> repositories works (at least they say GISTs are single git repositories).
>>>
>>>
>>> Chen,
>>> yes, please go for the proposal with the points you mentioned. Project
>>> mentor should be Doug, I'll be second mentor.
>>>
>>> Doug,
>>> you have to sign in as Mentor on the GSOC page and send Wayne an email
>>> so that he confirms you are an Eclipse Committer and eligible to be a
>>> Mentor for Eclipse GSOCs.
>>>
>>> Regarding search-engine:
>>> I'll be glad to write the search interface. We just need to agree on a
>>> snippet storage format.
>>> (my favorite for the moment is plain text with some mark-up)
>>>
>>> Marcel
>>>
>>> On 20.02.2012, at 04:03, chen cheng wrote:
>>>
>>> > Hi Doug,
>>> >
>>> > Yeah, in my initial idea, the server-side search engine and the client-side
>>> > engine should be implemented the same way, or at least give similar search results.
>>> >
>>> > Also, I am happy to work on the server side if I have enough time, but I have
>>> > one question. I am not very sure about your solution: do you mean we develop a
>>> > brand new Java-based server? Or do we still use the current PHP server, but
>>> > implement the search algorithm in Java (maybe even use Lucene etc. in the future),
>>> > with the PHP code invoking the Java search and using its results?
>>> >
>>> > I guess you mean solution two, right ?
>>> >
>>> > Here is my plan about improving SnipMatch client side:
>>> >
>>> > 1. Implement all the designs from my last post: create new Eclipse
>>> > preferences, local storage, an improved GUI etc. for SnipMatch. Leave a data
>>> > interface for the search engine (use a simple string comparison algorithm at
>>> > the beginning, then improve the search engine in the future).
>>> >
>>> > 2. Wait for Doug's backend work, then implement the search engine for both
>>> > client side and server side. After I have finished the client-side work, I can work
>>> > together with Doug on the Java-based search engine for both client side and
>>> > server side. In fact, I am thinking there may be other search engines in
>>> > other Recommenders modules, such as the Code Completion feature. Is it possible for
>>> > us to use some existing search engine? Marcel, I need your answer here :-)
>>> >
>>> > Doug & Marcel, is this plan OK? If everything is prepared, I
>>> > will write a detailed project proposal for merging and improving
>>> > SnipMatch, and start coding soon. And as this is a GSoC project, I need a
>>> > project mentor, so I am just waiting for your favor here :-D
>>> >
>>> >
>>>
>>>
>>> _______________________________________________
>>> recommenders-dev mailing list
>>> recommenders-dev@xxxxxxxxxxx
>>> http://dev.eclipse.org/mailman/listinfo/recommenders-dev
>>
>>
>>
>>
>> --
>> Best Regards From Cheng Chen [chengchendoc@xxxxxxxxx]
>>
>>
>> Thanks,
>> Marcel
>>
>> --
>> Eclipse Code Recommenders:
>>  w www.eclipse.org/recommenders
>>  tw www.twitter.com/marcelbruch
>>  g+ www.gplus.to/marcelbruch
>>
>>
>>
>
>
>
> --
> Best Regards From Cheng Chen [chengchendoc@xxxxxxxxx]
>
>

Attachment: snippets.xlsx
Description: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet