#######################################################################
Bogdan Eugen Sacaleanu, Researcher & Software Engineer
LT-Lab DFKI
Stuhlsatzenhausweg 3
66123 Saarbruecken, Germany
bogdan@xxxxxxx               Phone: +49 681 302 5261
http://www.dfki.de/~bogdan   Fax:   +49 681 302 5338
#######################################################################
Deutsches Forschungszentrum fuer Kuenstliche Intelligenz GmbH
Trippstadter Strasse 122, D-67663 Kaiserslautern, Germany
Management (Geschaeftsfuehrung): Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Chairman), Dr. Walter Olthoff
Chairman of the Supervisory Board: Prof. Dr. h.c. Hans A. Aukes
Register Court: Amtsgericht Kaiserslautern, HRB 2313
#######################################################################
From: daniel.stucky@xxxxxxxxxxxxx [mailto:daniel.stucky@xxxxxxxxxxxxx]
Sent: Tuesday, May 4, 2010 16:44
To: bogdan@xxxxxxx; smila-dev@xxxxxxxxxxx
Subject: RE: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi Bogdan,
it is possible to access the JMX interface of the CrawlerController programmatically, so you can start/stop crawls. See the bundle org.eclipse.smila.management.jmx.client for examples. Another way would be to get a reference to the CrawlerController OSGi service and work on it directly. The latter can only be done if you are in the same process.
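A remote JMX call could look roughly like the sketch below. Note that the port, the MBean object name, and the operation name here are assumptions for illustration; check your SMILA instance's actual MBeans (e.g. with jconsole) for the real names.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class CrawlJmxClient {

    /** Builds the RMI-based JMX service URL for a given host and port. */
    static JMXServiceURL serviceUrl(String host, int port) throws Exception {
        return new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
    }

    public static void main(String[] args) throws Exception {
        // Port 9004, the object name, and "startCrawl" are assumptions;
        // inspect the running SMILA instance for the real values.
        JMXServiceURL url = serviceUrl("localhost", 9004);
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            ObjectName crawlerController =
                new ObjectName("SMILA:type=CrawlerController");
            // Start a crawl for the data source configured as "web".
            mbs.invoke(crawlerController, "startCrawl",
                new Object[] { "web" },
                new String[] { String.class.getName() });
        }
    }
}
```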
As SMILA is a framework, you are free to implement a pipelet that uses the WebCrawler to start a crawl. Of course, a crawler run would lead to an asynchronous execution of another pipeline, and you would have to wait in your calling pipeline for the other to finish (so that the record is added to Lucene) before you could continue and search the Lucene index. That means lots of dependencies and potential errors.
However, SMILA was not designed to be used in this way. I would recommend implementing a separate pipelet that connects to Google via HTTP, executes a search, parses the result page, and creates the record objects (you can also split the functionality into multiple pipelets for better reuse). This would all run in a SearchPipeline.
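At its core such a pipelet wraps an HTTP fetch. A minimal standalone sketch, assuming a plain `search?q=` URL pattern (real result-page access may require more, e.g. a user-agent header or an official API):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class SearchFetcher {

    /** Builds the search URL; the URL pattern is an assumption. */
    static String searchUrl(String query) {
        return "http://www.google.com/search?q="
            + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    /** Fetches the result page body for a query. */
    public static String fetch(String query) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create(searchUrl(query)))
            .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```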
A Crawler is always coupled with a BPEL pipeline. All records a Crawler produces are consumed by the configured BPEL pipeline; this is configured in QueueWorkerListenerConfig.xml. Currently there are two kinds of Pipelets, SimplePipelets and SearchPipelets. SearchPipelets can only be used in search pipelines; SimplePipelets can be used in any pipeline. You cannot use a search pipeline as a consumer for Crawler output.
Check out the wiki for more information on Pipelets and Pipelines.
Bye,
Daniel
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On behalf of Bogdan Eugen Sacaleanu
Sent: Tuesday, May 4, 2010 15:06
To: smila-dev@xxxxxxxxxxx
Subject: FW: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi Daniel,
thank you for your prompt reply. My scenario requires synchronous calls of the mentioned components, so I suppose I should go for the proposed option A. That means I would not make use of any of the connectivity components from SMILA, would I!?
Regarding option B, how is the indexing process to be started programmatically and coupled with a parsing BPEL pipeline!? Wouldn't it be possible to add the search pipeline shipped with SMILA to the parsing pipeline!?
Kind regards,
Bogdan.
From: daniel.stucky@xxxxxxxxxxxxx [mailto:daniel.stucky@xxxxxxxxxxxxx]
Sent: Tuesday, May 4, 2010 14:25
To: bogdan@xxxxxxx; smila-dev@xxxxxxxxxxx
Subject: RE: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi Bogdan,
thanks for your interest in SMILA.
Currently it is not possible to use any Crawler or Agent from within a BPEL pipeline.
The solution to your question depends on whether your scenario is interactive (1-5 synchronous) or not (1-4 and 5 asynchronous).
A) Synchronous:
In this case you would execute a BPEL pipeline just like a search pipeline. In that pipeline you would have to use Pipelets to connect to Google, parse the result page, and fill the Lucene index. These pipelets have to be implemented by you; they are currently not part of SMILA (except for the Lucene Index Pipelet). After that you would call the SearchPipelet on the Lucene index and return the results, all in the same pipeline.
B) Asynchronous:
If requesting Google and storing the results in Lucene and the actual search in Lucene are independent processes (one triggered on a regular basis, the other by users), then you could set up this regular indexing process using the WebCrawler for accessing Google and a BPEL pipeline for parsing the results and adding the records to the Lucene index. For searching you could use the standard search pipeline shipped with SMILA.
Something general:
Records are objects used to transport data in SMILA. Speaking of option B), the WebCrawler would create one record containing the Google search result page. In the BPEL pipeline one would use a pipelet to parse the result page and create N records from it (N being the number of results per page). The pipelet interface allows an array of record IDs as in and out parameters. Usually the parameters are equal (n:n), but they can be 1:n as in your case, or n:1 or n:m, depending on the data and the pipelet's functionality.
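Outside of SMILA, the 1:n parsing step can be sketched as a plain function: one result-page string in, N result records out. The `Result` holder class and the link regex below are illustrative assumptions, not SMILA classes; a real pipelet would use a proper HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultPageSplitter {

    /** Minimal stand-in for a SMILA record: just a URL and a title. */
    public static class Result {
        public final String url;
        public final String title;
        Result(String url, String title) { this.url = url; this.title = title; }
    }

    // Naive link pattern, for illustration only.
    private static final Pattern RESULT_LINK =
        Pattern.compile("<a href=\"(http[^\"]+)\"[^>]*>(.*?)</a>");

    /** Splits one result page (1 record in) into N result records (n out). */
    public static List<Result> split(String resultPageHtml) {
        List<Result> results = new ArrayList<>();
        Matcher m = RESULT_LINK.matcher(resultPageHtml);
        while (m.find()) {
            results.add(new Result(m.group(1), m.group(2)));
        }
        return results;
    }
}
```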
I hope this helps!
Bye,
Daniel
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On behalf of Bogdan Eugen Sacaleanu
Sent: Tuesday, May 4, 2010 13:52
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] Calling a crawler in a BPEL Pipeline!?
Hi,
I would like to use SMILA for the following purpose:
1. Send a request to Google's search engine
2. Grab the result page
3. Parse the result page to extract information about each individual hit
4. Save the resulting records in a Lucene index
5. Search the Lucene index for some information
What would be the best setting of SMILA components for this goal!? Could I use the WebCrawler for (1. + 2.) embedded in a BPEL pipeline!? Should I create the records (3.) within the WebCrawler, or should I do that within the Connectivity component using the Router!?
Kind regards,
Bogdan.