Re: [smila-dev] SMILA IP Overview (workflow view)

Hi Georg,

One installation scenario will be the installation of the connectivity module on an external computer (as a cluster).

What prevents installing DI on one external computer and the Router on the same or a second external computer, without grouping them into Connectivity?
It doesn't really matter, though; let it be Connectivity...

Therefore I don't think the queue is "not needed"
It was only hyperbole. I meant that the current pipeline structure does not allow operations to be split for scalable processing, and therefore the benefits of queue processing are not realized now.

What is the benefit of splitting the current pipeline into "ParsePipeline" and "AddToIndexPipeline"?

Currently all the work is done by AddPipeline. An add operation consists of at least two operations: parsing content and updating the index. AddPipeline processes operations synchronously. Some operations may be fast; others are slow, resource-consuming, or require exclusive access to some resources. If we split the operations, we can control them more effectively by adding processing threads for the resource-consuming ones or by executing them on different computers.

We now have two ways to execute processing operations:
1. by pipeline (one by one on one computer)
2. queue > listener > simple operation > queue > ... (asynchronous and easy to configure for multiple computers)
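
To make the second option concrete, here is a minimal sketch of the "queue > listener > simple operation > queue" chain. It is not SMILA code: the class name, the parse() stand-in, and the in-memory BlockingQueue (used here in place of a real message queue) are all assumptions for illustration. The parse stage gets several worker threads, while the index stage keeps a single thread because it needs exclusive access to the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: two stages connected by queues instead of one synchronous pipeline.
final class StagedPipelineSketch {

    // Unique sentinel object ("poison pill"); compared by reference, so real
    // documents with the same text are never mistaken for it.
    private static final String STOP = new String("STOP");

    // Stand-in for the expensive parsing step (assumption, not a SMILA operation).
    static String parse(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT);
    }

    static List<String> runStaged(List<String> rawDocs, int parseWorkers) {
        BlockingQueue<String> parseQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> indexQueue = new LinkedBlockingQueue<>();
        List<String> index = new ArrayList<>(); // written only by the single indexer thread

        // Stage 1: several listeners take raw documents and queue the parsed results.
        ExecutorService parsers = Executors.newFixedThreadPool(parseWorkers);
        for (int i = 0; i < parseWorkers; i++) {
            parsers.submit(() -> {
                try {
                    for (String raw = parseQueue.take(); raw != STOP; raw = parseQueue.take()) {
                        indexQueue.put(parse(raw));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Stage 2: one listener updates the index (exclusive access to the resource).
        Thread indexer = new Thread(() -> {
            try {
                for (String doc = indexQueue.take(); doc != STOP; doc = indexQueue.take()) {
                    index.add(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        indexer.start();

        try {
            for (String raw : rawDocs) {
                parseQueue.put(raw);
            }
            for (int i = 0; i < parseWorkers; i++) {
                parseQueue.put(STOP); // one stop marker per parse worker
            }
            parsers.shutdown();
            if (!parsers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("parse stage timed out");
            }
            indexQueue.put(STOP); // all parsed documents are queued before this marker
            indexer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(runStaged(List.of(" Alpha ", "BETA", "gamma"), 2));
    }
}
```

With more than one parse worker the order of the indexed entries is not guaranteed. Only the number of threads per stage changes when an operation turns out to be the bottleneck; with a real message queue the same stages could run on different computers.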

--
Regards, Ivan


August Georg Schmidt wrote:
Hi Ivan,

thanks for your comments.

One installation scenario will be the installation of the connectivity module on an external computer (as a cluster).
Therefore, displaying this module as external shows a logical view.

On point II.

The pipeline is really a bit simple and a lot of work is done... The queue is a scalability option for scaling the processing across multiple threads and multiple computers. Therefore I don't think the queue is "not needed". It's just the option for distributing our work.
What is the benefit of splitting the current pipeline into "ParsePipeline" and "AddToIndexPipeline"? The pipeline is still only an example. We would add communication overhead through the queue. But what would we gain?

Kind regards,

Georg


-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Ivan Churkin
Sent: Wednesday, 15 October 2008 09:17
To: Smila project developer mailing list
Subject: Re: [smila-dev] SMILA IP Overview (workflow view)

Also, the user communicates with the system via the Management module.

Ivan Churkin wrote:
Hi,

I)
I want to suggest a few amendments to the diagram:

1. Filter is now a part of the blackboard (BB); every BB service user is able to draw filtered records from the BB.
2. The crawler controller works directly with the DI service and, finally, puts it into the Router. So there is no separate connectivity module (or does it contain only the Router?).
3. Router and Listener are also able to communicate with the BB (via the "Synchronize" task in the "Rule" configuration).

II)
In my opinion AddPipeline does too much work (synchronously). As a result, with the current pipelines the queue is not needed. We may call AddPipeline directly after crawling (for example, from the Router). It's better to split it into "ParsePipeline" and "AddToIndexPipeline" at least...

III) only FMY:
What is the issue with using the following components?

1) "net.sf.joost" - STX language processor (similar to XSLT 1.0 but not W3C standard)
2) "org.w3c.tidy"  - HTML clean-up tool


--
Regards, Ivan





HTML Parser.

August Georg Schmidt wrote:
Hi Folks,

In answer to some questions from our PMC, Sofya added a workflow overview for the indexing process.

Within this overview you can find additional information regarding the third-party components that are used in SMILA.

http://wiki.eclipse.org/SMILA/Workflow_Overview

Kind Regards,

Georg

------------------------------------------------------------------------

_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev