Re: [smila-dev] SMILA IP Overview (workflow view)
Hi,
I)
I would like to suggest a few amendments to the diagram:
1. The filter is now part of the blackboard (BB); every BB service user
is able to draw a filtered record from the BB.
2. The crawler controller works directly with the DI service and finally
puts the record into the Router. So there is no separate connectivity
module (or does it contain only the Router?).
3. The Router and the Listener are also able to communicate with the BB
(via the "Synchronize" task in the "Rule" configuration).
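To make the amended data flow concrete, here is a minimal Python sketch.
It is a simplified model under stated assumptions, not SMILA code: the
classes `Blackboard`, `DeltaIndexing` and `Router` and the function
`crawl` are hypothetical, and the DI service is reduced to a per-record
hash comparison.

```python
# Hypothetical, simplified model of the amended workflow.
# None of these classes are the real SMILA API.
from dataclasses import dataclass, field


@dataclass
class Blackboard:
    """Holds records by id; the filter is part of the blackboard itself."""
    records: dict = field(default_factory=dict)

    def put(self, record_id, record):
        self.records[record_id] = dict(record)

    def get_filtered(self, record_id, attributes):
        # Any blackboard user can draw a filtered view of a record.
        record = self.records[record_id]
        return {k: v for k, v in record.items() if k in attributes}


@dataclass
class DeltaIndexing:
    """Stand-in for the DI service: remembers one content hash per record id."""
    seen: dict = field(default_factory=dict)

    def needs_update(self, record_id, content_hash):
        changed = self.seen.get(record_id) != content_hash
        self.seen[record_id] = content_hash
        return changed


@dataclass
class Router:
    blackboard: Blackboard
    queue: list = field(default_factory=list)

    def route(self, record_id, record, synchronize=True):
        # Assumed analogue of the "Synchronize" task of a "Rule":
        # write the record to the blackboard before queuing its id.
        if synchronize:
            self.blackboard.put(record_id, record)
        self.queue.append(record_id)


def crawl(records, di, router):
    """Crawler controller: consults DI directly, then hands records to the Router."""
    for record_id, record in records:
        if di.needs_update(record_id, record["hash"]):
            router.route(record_id, record)
```

In this model, crawling the same unchanged record twice queues it only
once, and any blackboard user can read just the attributes it needs via
`get_filtered`.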
II)
In my opinion, AddPipeline does too much work (synchronously). As a
result, with the current pipelines a queue is not needed: we could call
AddPipeline directly after crawling (for example, from the Router). It
would be better to split it into at least a "ParsePipeline" and an
"AddToIndexPipeline".
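The proposed split could look roughly like the following minimal sketch,
assuming a simple in-process queue between the two stages. The function
names mirror the proposed "ParsePipeline" and "AddToIndexPipeline", but
their bodies are hypothetical placeholders; the point is only that
parsing no longer waits for indexing to finish.

```python
# Hypothetical sketch: two pipeline stages decoupled by a queue.
# The processing bodies are placeholders, not SMILA pipelines.
import queue
import threading

_STOP = object()  # sentinel telling the indexing worker to finish


def parse_pipeline(record):
    """Placeholder for a "ParsePipeline": normalize the raw content."""
    return {"id": record["id"], "text": record["raw"].strip().lower()}


def add_to_index_pipeline(parsed, index):
    """Placeholder for an "AddToIndexPipeline": store the parsed record."""
    index[parsed["id"]] = parsed["text"]


def run(records):
    index = {}
    work = queue.Queue()

    def indexer():
        # Consumes parsed records until the sentinel arrives.
        while True:
            item = work.get()
            if item is _STOP:
                break
            add_to_index_pipeline(item, index)

    worker = threading.Thread(target=indexer)
    worker.start()
    for record in records:
        # Parsing runs in the caller; indexing happens in the worker thread.
        work.put(parse_pipeline(record))
    work.put(_STOP)
    worker.join()  # wait until every queued record has been indexed
    return index
```

With the work split this way, the queue actually earns its place: the
slower indexing stage can run, be scaled, or be retried independently of
parsing.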
III) only FMY:
What is the issue with using the following components?
1) "net.sf.joost" - STX language processor (similar to XSLT 1.0, but not
a W3C standard)
2) "org.w3c.tidy" - HTML clean-up tool
--
Regards, Ivan
HTML Parser.
August Georg Schmidt wrote:
Hi Folks,
in response to some questions from our PMC, Sofya added a workflow
overview for the indexing process.
Within this overview you can find additional information regarding the
third-party components that are used in SMILA.
http://wiki.eclipse.org/SMILA/Workflow_Overview
Kind Regards,
Georg
------------------------------------------------------------------------
_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev