
RE: AW: [smila-dev] voting against HASH calculating specification

What is really important (from my point of view) is that the Crawler and the Crawler Developer should not have to think about
creating IDs and hashes, and the Crawler Developer should not have to implement a hash method (or an ID method).
Furthermore, the developer should not have to work with internal data objects: they are too complex (for the Crawler Developer),
and there is no need for him to understand the object model.

Maybe the generation of ID and hash can be solved within the crawler process; that is, we could put it into
bundles that are needed in the crawler "process" anyway (maybe for the remote communication technology). But
the Crawler Bundle itself should only return data; processing that data, including the generation
of IDs and hashes, should be handled by our framework.
That would reduce remote communication, and the crawler developer would not have to implement anything for it.
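
To illustrate, a minimal sketch of that division of work (all type and method names here are made up, nothing of this exists in SMILA under these names): the crawler bundle hands back plain data, and the framework derives ID and hash from it.

    import java.security.MessageDigest;

    // Hypothetical sketch -- these types do not exist under these names;
    // they only illustrate "crawler returns data, framework hashes".
    public final class FrameworkSideHashing {

        // What a crawler bundle would return: raw data, no ID, no hash.
        public static final class CrawledData {
            final String source;   // e.g. "file://archive/report.txt"
            final byte[] content;
            public CrawledData(String source, byte[] content) {
                this.source = source;
                this.content = content;
            }
        }

        // The framework, not the crawler developer, derives ID and hash.
        public static String idOf(CrawledData data) {
            return data.source;
        }

        public static byte[] hashOf(CrawledData data) throws Exception {
            return MessageDigest.getInstance("SHA-1").digest(data.content);
        }
    }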

Sebastian


> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Ivan Churkin
> Sent: Wednesday, October 22, 2008 11:00 AM
> To: Smila project developer mailing list
> Subject: Re: AW: [smila-dev] voting against HASH calculating specification
> 
> Hi Daniel,
> 
> Yes it was :)
> 
> Automatic HASH calculation may easily be included in the Communication
> class suggested in
> http://wiki.eclipse.org/SMILA/Specifications/CrawlerAPIDiscussion09
> That would solve the automatic HASH calculation issues for Java-based services.
> 
> Also, in my opinion, HASH should be included in the "datamodel" as a
> separate property of Record and DIData, because it is specific and required.
> 
> record.getHash();
> diData.getHash();
> 
> (currently it is an ordinary Attribute with the hardcoded name "HASH")
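> 
> To make the difference concrete, a rough sketch (hypothetical code,
> not the actual SMILA datamodel API):
> 
>     // today: the hash is an ordinary Attribute behind a magic name
>     Object hash = record.getAttribute("HASH");
> 
>     // proposed: the hash as a first-class property of the datamodel
>     public interface Record {
>         String getId();
>         byte[] getHash();   // specific and required, so part of the API
>     }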
> 
> --
> Regards, Ivan
> 
> 
> 
> Daniel.Stucky@xxxxxxxxxxx wrote:
> > +1 from me.
> > I always voted for creating it inside the crawler :-)
> >
> > Bye,
> > Daniel
> >
> >
> >> -----Original Message-----
> >> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-
> >> bounces@xxxxxxxxxxx] On Behalf Of Ivan Churkin
> >> Sent: Wednesday, October 22, 2008 10:36
> >> To: Smila project developer mailing list
> >> Subject: [smila-dev] voting against HASH calculating specification
> >>
> >> Hi,
> >>
> >> I want to discuss an old problem again: HASH calculation.
> >> This problem relates to the
> >> http://wiki.eclipse.org/SMILA/Specifications/CrawlerAPIDiscussion09
> >> discussion.
> >>
> >> It was specified that the HASH should be calculated automatically on
> >> the Crawler Controller side, driven by configuration.
> >> In my opinion this is absolutely unacceptable for distributed systems.
> >> I'll argue the point with the following example.
> >>
> >> Consider a distributed system with two nodes, where CrawlerController
> >> and FileCrawler communicate remotely.
> >> FileCrawler is configured to calculate the HASH from the file content.
> >> Let's imagine that FileCrawler is monitoring a video archive and the
> >> crawling procedure is started automatically every hour.
> >>
> >> Can you imagine what happens in this situation? The complete video
> >> archive will be sent over the network every hour )).
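> >>
> >> To spell out the alternative: if FileCrawler computes the hash
> >> locally, only the digest crosses the wire. A minimal sketch
> >> (hypothetical helper class, streaming so even huge video files are
> >> never held in memory):
> >>
> >>     import java.io.File;
> >>     import java.io.FileInputStream;
> >>     import java.io.InputStream;
> >>     import java.security.DigestInputStream;
> >>     import java.security.MessageDigest;
> >>
> >>     public final class LocalHash {
> >>         public static byte[] fileHash(File file) throws Exception {
> >>             MessageDigest md = MessageDigest.getInstance("SHA-1");
> >>             try (InputStream in = new DigestInputStream(
> >>                     new FileInputStream(file), md)) {
> >>                 byte[] buf = new byte[8192];
> >>                 while (in.read(buf) != -1) {
> >>                     // digest is updated as the file streams through
> >>                 }
> >>             }
> >>             return md.digest(); // ~20 bytes go remote, not gigabytes
> >>         }
> >>     }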
> >>
> >> --
> >> Ivan
> >>
> >> _______________________________________________
> >> smila-dev mailing list
> >> smila-dev@xxxxxxxxxxx
> >> https://dev.eclipse.org/mailman/listinfo/smila-dev
> >>
> 
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev
