From: daniel.stucky@xxxxxxxxxxxxx [mailto:daniel.stucky@xxxxxxxxxxxxx]
Sent: Tuesday, May 4, 2010 14:25
To: bogdan@xxxxxxx; smila-dev@xxxxxxxxxxx
Subject: RE: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi Bogdan,

thanks for your interest in SMILA.

Currently it is not possible to use any Crawler or Agent from within a BPEL pipeline.

The solution to your question depends on whether your scenario is interactive (steps 1-5 all synchronous) or not (steps 1-4 and step 5 running asynchronously, as independent processes).
A) Synchronous:

In this case you would execute a BPEL pipeline just like a search pipeline. In that pipeline you would have to use pipelets to connect to Google, parse the result page, and fill the Lucene index. These pipelets would have to be implemented by you; they are currently not part of SMILA (except for the Lucene Index Pipelet). After that you would call the SearchPipelet on the Lucene index and return the results, all in the same pipeline.
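The synchronous flow can be sketched roughly as follows. This is a conceptual sketch only: real SMILA pipelets are Java classes orchestrated by BPEL, and all function and class names here (fetch, parse, ListIndex, search_pipeline) are invented for illustration, not SMILA APIs.

```python
# Conceptual sketch only: real SMILA pipelets are Java classes orchestrated
# by BPEL. The names below are invented to show the synchronous flow.

class ListIndex:
    """Toy in-memory stand-in for the Lucene index."""
    def __init__(self):
        self.records = []

    def add(self, record):
        self.records.append(record)

    def search(self, term):
        return [r for r in self.records if term in r["title"]]

def search_pipeline(query, fetch, parse, index):
    """One synchronous pipeline call: fetch the Google result page,
    parse it into records, add them to the index, then search it."""
    page = fetch(query)             # custom Google pipelet (to be written)
    for record in parse(page):      # custom parser pipelet (to be written)
        index.add(record)           # Lucene Index Pipelet (part of SMILA)
    return index.search(query)      # search step, same pipeline
```

Here fetch and parse stand for the pipelets you would have to implement yourself; only the index-filling step has a SMILA counterpart.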
B) Asynchronous:

If requesting Google and storing the results in Lucene on the one hand, and the actual search in Lucene on the other, are independent processes (one triggered on a regular basis, the other by users), then you could set up this regular indexing process using the WebCrawler to access Google and a BPEL pipeline to parse the results and add the records to the Lucene index. For searching you could use the standard search pipeline shipped with SMILA.
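The decoupling in option B can be sketched as two independent processes that share only the index. Again a conceptual sketch with invented names: in SMILA the indexing side would be the WebCrawler plus a BPEL pipeline, and the search side the standard search pipeline.

```python
# Conceptual sketch only: in SMILA, indexing_job corresponds to the
# WebCrawler plus an indexing BPEL pipeline (run on a schedule), and
# user_search to the standard search pipeline. Names are invented.

shared_index = []  # stands in for the shared Lucene index

def indexing_job(fetch, parse, query):
    """Runs on a schedule, independent of any user: crawl, parse, index."""
    page = fetch(query)
    shared_index.extend(parse(page))

def user_search(term):
    """Triggered by users at any time; only reads the shared index."""
    return [r for r in shared_index if term in r["title"]]
```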
Something general:

Records are the objects that transport data in SMILA. Speaking of option B), the WebCrawler would create one record containing the Google search result page. In the BPEL pipeline one would use a pipelet to parse the result page and create N records from it (N being the number of results per page). The pipelet interface allows an array of record IDs as in and out parameters. Usually the parameters are equal (n:n), but they can be 1:n (as in your case), n:1, or n:m, depending on the data and the pipelet's functionality.
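The 1:n case can be illustrated with a toy parsing pipelet: one record ID in (the crawled result page), N record IDs out (one per hit). This is a sketch only; the record store, page format, and ID scheme are made up, and real SMILA pipelets are Java classes working against the blackboard service.

```python
# Toy illustration of a 1:n pipelet: one record ID in, N record IDs out.
# The dict-based record store and the "parent/hitX" ID scheme are made up;
# real SMILA pipelets are Java and use the blackboard service.

def parse_result_page_pipelet(records, in_ids):
    """For each crawled result-page record, create one new record per hit
    found on the page and return the list of new record IDs (1:n)."""
    out_ids = []
    for rid in in_ids:
        page = records[rid]["content"]
        # assume a trivially parseable page: one "title|url" line per hit
        for i, line in enumerate(page.strip().splitlines()):
            title, url = line.split("|", 1)
            new_id = f"{rid}/hit{i}"
            records[new_id] = {"title": title, "url": url}
            out_ids.append(new_id)
    return out_ids
```

A single input ID thus fans out into as many output IDs as there are hits on the page, which is exactly the 1:n shape described above.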
I hope this helps!
Bye,
Daniel
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On behalf of Bogdan Eugen Sacaleanu
Sent: Tuesday, May 4, 2010 13:52
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi,

I would like to use SMILA for the following purpose:

1. Send a request to Google's search engine
2. Grab the result page
3. Parse the result page to extract information about each individual hit
4. Save the resulting records in a Lucene index
5. Search the Lucene index for some information
What would be the best setup of SMILA components for this goal!? Could I use the WebCrawler for (1. + 2.) embedded in a BPEL pipeline!? Should I create the records (3.) within the WebCrawler, or should I do that within the Connectivity component using the Router!?
Kind regards,
Bogdan.
#######################################################################
Bogdan Eugen Sacaleanu, Researcher & Software Engineer
LT-Lab DFKI
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
bogdan@xxxxxxx Phone: +49 681 302 5261
http://www.dfki.de/~bogdan Fax: +49 681 302 5338
#######################################################################
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Management: Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Amtsgericht Kaiserslautern, HRB 2313
#######################################################################