Re: [smila-dev] SMILA/Specifications/CrawlerAPIDiscussion09

Hi Allan,

Thank you for your response to the crawler API discussion (http://wiki.eclipse.org/SMILA/Specifications/CrawlerAPIDiscussion09). This very important question had been stuck in a frozen state.

In my opinion, a crawler developer should not need to know anything about SMILA's internal objects and transports (MObject, Record, Delta Indexing, SCA, etc.).
He should only have to implement a simple and understandable data-source iterator.

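Approximate interface:

interface Crawler {
    // Prepare the crawler for one run over the configured data source.
    void start(IndexOrderConfiguration config);
    // Return the next item of the data source.
    DataSourceReference next();
    // Called once when the run is over.
    void finish();
}

interface DataSourceReference {
    Object getAttribute(String name);
    byte[] getAttachment(String name);
}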
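
To make the iterator idea more concrete, here is a minimal sketch of how such a crawler and the controller-side loop might look. Everything beyond the two interfaces above is an assumption for illustration only: the stand-in IndexOrderConfiguration (reduced to a list of strings), the InMemoryCrawler and CrawlerDemo classes, the attribute names "id" and "length", the attachment name "content", and the convention that next() returns null when the data source is exhausted.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Stand-in for the configuration type; its real shape is not shown in this mail.
class IndexOrderConfiguration {
    final List<String> documents;
    IndexOrderConfiguration(List<String> documents) { this.documents = documents; }
}

interface DataSourceReference {
    Object getAttribute(String name);
    byte[] getAttachment(String name);
}

interface Crawler {
    void start(IndexOrderConfiguration config);
    DataSourceReference next(); // assumption: returns null when nothing is left
    void finish();
}

// Hypothetical crawler over an in-memory list; no MObject, Record or SCA involved.
class InMemoryCrawler implements Crawler {
    private Iterator<String> it;
    private int counter;

    public void start(IndexOrderConfiguration config) {
        it = config.documents.iterator();
        counter = 0;
    }

    public DataSourceReference next() {
        if (it == null || !it.hasNext()) {
            return null;
        }
        final String text = it.next();
        final Map<String, Object> attributes =
                Map.of("id", "doc-" + (counter++), "length", text.length());
        return new DataSourceReference() {
            public Object getAttribute(String name) { return attributes.get(name); }
            public byte[] getAttachment(String name) {
                return "content".equals(name) ? text.getBytes(StandardCharsets.UTF_8) : null;
            }
        };
    }

    public void finish() { it = null; }
}

class CrawlerDemo {
    // Controller-side loop: only the controller would deal with framework objects.
    static List<String> crawlIds(Crawler crawler, IndexOrderConfiguration config) {
        List<String> ids = new ArrayList<>();
        crawler.start(config);
        try {
            DataSourceReference ref;
            while ((ref = crawler.next()) != null) {
                ids.add((String) ref.getAttribute("id"));
            }
        } finally {
            crawler.finish(); // always give the crawler a chance to clean up
        }
        return ids;
    }

    public static void main(String[] args) {
        IndexOrderConfiguration config = new IndexOrderConfiguration(List.of("alpha", "beta"));
        System.out.println(crawlIds(new InMemoryCrawler(), config)); // prints [doc-0, doc-1]
    }
}
```

In this sketch the crawler author writes only InMemoryCrawler; converting each DataSourceReference into whatever the framework needs would be left to the controller.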


I would be glad to hear and discuss other ideas and opinions.

--

Ivan



Allan Kaufmann wrote:

Hi everyone,

I have read this interesting discussion about the crawler api (http://wiki.eclipse.org/SMILA/Specifications/CrawlerAPIDiscussion09).

In my opinion the crawler API is currently not easy to understand, and I believe making it understandable should be a goal if you want to attract users and developers who enjoy working with this project. I looked at the filesystem-crawler sample in your current SMILA trunk and needed a lot of time to understand it.

So what about keeping the crawler API simple, as discussed on that page?

I think a nice way to make it easier would be to reduce the MObject and Record creation, maybe by delivering all information together to the CrawlerController in an ArrayList. OK, I realize you probably need communication between the CrawlerController and the crawler to make generation indexing possible. So what about the second alternative, in which getNextDeltaIndexing returns a record? In that case the CrawlerController receives the id and hash information. Then, if the information has changed, the getRecord method delivers the other attributes, also as a record, and the CrawlerController could merge the two. I think that would be easier to understand, but the other alternatives discussed on that page are also worth discussing and deciding on.
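
A rough sketch of this second alternative could look as follows. Only the method names getNextDeltaIndexing and getRecord come from the discussion; everything else is assumed for illustration: records are modelled as plain maps, the keys "id", "hash" and "content" are made up, and DeltaCrawler, FixedDeltaCrawler and DeltaController are hypothetical names, not SMILA classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical two-step crawler: first id and hash only, full attributes on demand.
interface DeltaCrawler {
    Map<String, Object> getNextDeltaIndexing(); // assumption: null when exhausted
    Map<String, Object> getRecord(String id);   // the remaining attributes
}

// Toy crawler over a fixed id -> content map, used only to exercise the controller.
class FixedDeltaCrawler implements DeltaCrawler {
    private final Map<String, String> contentById;
    private final Iterator<String> ids;

    FixedDeltaCrawler(Map<String, String> contentById) {
        this.contentById = contentById;
        this.ids = contentById.keySet().iterator();
    }

    public Map<String, Object> getNextDeltaIndexing() {
        if (!ids.hasNext()) {
            return null;
        }
        String id = ids.next();
        // A real crawler would likely use a proper digest; hashCode keeps the sketch short.
        String hash = Integer.toHexString(contentById.get(id).hashCode());
        return Map.of("id", id, "hash", hash);
    }

    public Map<String, Object> getRecord(String id) {
        return Map.of("content", contentById.get(id));
    }
}

// Controller side: compares hashes and merges both records only for changed items.
class DeltaController {
    private final Map<String, String> knownHashes = new HashMap<>();

    List<Map<String, Object>> run(DeltaCrawler crawler) {
        List<Map<String, Object>> changed = new ArrayList<>();
        Map<String, Object> delta;
        while ((delta = crawler.getNextDeltaIndexing()) != null) {
            String id = (String) delta.get("id");
            String hash = (String) delta.get("hash");
            if (hash.equals(knownHashes.get(id))) {
                continue; // unchanged since the last run, getRecord is never called
            }
            Map<String, Object> merged = new LinkedHashMap<>(delta);
            merged.putAll(crawler.getRecord(id)); // merge id/hash with the other attributes
            knownHashes.put(id, hash);
            changed.add(merged);
        }
        return changed;
    }
}
```

With this split, the crawler would pay for reading the full content only when the controller has seen a new or different hash for that id.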

Greetings

Allan

Allan Kaufmann

brox IT-Solutions GmbH
An der Breiten Wiese 9
30625 HANNOVER (Germany)
Tel: +49 (5 11) 33 65 28 – 67
eFax: +49 (5 11) 33 65 28 – 98 78
Fax: +49 (5 11) 33 65 28 – 29
Mail: akaufmann@xxxxxxx <mailto:tmenzel@xxxxxxx>
Web: www.brox.de <http://www.brox.de/>

==================================
According to Section 80 of the German Corporation Act brox IT-Solutions GmbH must indicate the following information.
Address: An der Breiten Wiese 9, 30625 Hannover Germany
General Manager: Hans-Chr. Brockmann
Registered Office: Hannover, Commercial Register Hannover HRB 59240
========== Legal Disclaimer ==========

------------------------------------------------------------------------
