Re: [smila-user] Handling Streaming Ressources vs JMS

Hi Hannes,

Thanks for your interest in SMILA.

At the moment, the data exchange between Crawler and Connectivity does not support streaming. All object data is copied into the record, either as attachments (using a byte[]) or as attributes (using Strings). So you certainly cannot pass data of the size you are planning through it.

However, perhaps you can still use SMILA to do the job :-)

Assuming that all machines running SMILA can access the data to be processed (e.g. via a public URL, a filesystem share, or a database), your Crawler could provide only the information needed to access the data (e.g. a URL, a path, or a database ID) rather than the data itself. In the BPEL pipeline you would then need to implement your own Pipelet that reads the data as a stream and creates multiple records from the streamed data.

You may want to take a look at the org.eclipse.smila.processing.pipelets.xmlprocessing.XmlSplitterPipelet as an example of how to generate new records from an existing record.
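
To illustrate the idea of splitting a streamed resource into many small records, here is a minimal sketch in plain Java. It deliberately does not use SMILA's Pipelet or record API: the class name StreamingSplitterSketch, the method splitIntoRecords, and the "id"/"text" keys are all hypothetical, and a record is modelled as a simple Map. It only shows the streaming part, i.e. reading line by line so that the whole resource is never held in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical sketch; NOT SMILA's Pipelet interface or record model. */
public class StreamingSplitterSketch {

    /**
     * Reads the source line by line and passes every non-blank line to the
     * sink as its own small "record". Only one line is held in memory at a time.
     *
     * @param sourceId reference delivered by the crawler (URL, path, database ID)
     * @param source   stream opened from that reference
     * @param sink     receiver of the generated records (assumed, e.g. a queue or list)
     * @return number of records created
     */
    public static int splitIntoRecords(String sourceId, Reader source,
                                       Consumer<Map<String, String>> sink) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                Map<String, String> record = new LinkedHashMap<>();
                record.put("id", sourceId + "#" + count); // derived ID, an assumption
                record.put("text", line);
                sink.accept(record);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        List<Map<String, String>> records = new ArrayList<>();
        // A StringReader stands in for a real file, URL, or database stream.
        Reader source = new StringReader("first message\n\nsecond message\n");
        int n = splitIntoRecords("file:///data/messages.txt", source, records::add);
        System.out.println(n + " records, first id = " + records.get(0).get("id"));
    }
}
```

In a real setup the Reader would be opened from the reference the Crawler put into the record (for example with java.nio.file.Files.newBufferedReader for a path), and the sink would hand the new records on to the rest of the pipeline.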

 

Bye,

Daniel

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Hannes Carl Meyer
Sent: Tuesday, 11 May 2010 10:24
To: smila-user@xxxxxxxxxxx
Subject: [smila-user] Handling Streaming Ressources vs JMS

Hi,

I'm thinking about giving SMILA a try for an indexing and text analysis project that analyzes lots of real-time information such as Twitter data.
Of course I started by looking into SMILA's architecture (http://wiki.eclipse.org/SMILA/Architecture_Overview) to see whether it would be possible to handle streaming resources.

Regarding the Architecture Overview, is it really necessary to use JMS between the crawling and analysis?
I'm going to start with a dataset of 500GB of raw text messages and could imagine going up to 4-5TB. IMHO, passing that through JMS would create considerable overhead.

Looking forward to hearing about your experiences!

Regards,

Hannes

--

https://www.xing.com/profile/HannesCarl_Meyer
http://de.linkedin.com/in/hannescarlmeyer
http://twitter.com/hannescarlmeyer

