RE: [smila-user] RE: JDBC-Crawling Phenomenon

on a side note:

 

I don't like pushing all kinds of records into the same queue. I would always open a separate queue for each kind of record and processing state to keep things apart, so that changes won't mess up other parts so easily.

I really think we should make this better known to the community and establish it as a good practice. On the other hand, configuration and setup will become more complicated.
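
A minimal sketch of what a dedicated queue could look like on the listener side. The queue name SMILA.connectivity.kinkon is hypothetical and not part of any default setup; the rule attributes are copied from the rules quoted further down in this thread. The routing side would also have to send the kinkon records to that queue, which is not shown here:

```xml
<!-- Sketch only: "SMILA.connectivity.kinkon" is an assumed queue name -->
<Rule Name="ADD JDBC Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
  <!-- This rule listens on its own queue instead of the shared SMILA.connectivity -->
  <Source BrokerId="broker1" Queue="SMILA.connectivity.kinkon"/>
  <!-- Only the operation needs checking, since the queue already separates the record kind -->
  <Condition>Operation='ADD'</Condition>
  <Task>
    <Process Workflow="KinKonAddPipeline"/>
  </Task>
</Rule>
```

As long as no other rule listens on the same queue, an accidental overlap between conditions becomes much less likely, at the price of one more queue to configure.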

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Wednesday, 30 September 2009 10:13
To: Smila project user mailing list
Subject: RE: [smila-user] RE: JDBC-Crawling Phenomenon

 

Hi Andreas,

yes, but the condition for AddPipeline is:

operation == ADD && datasource not like (%feeds% or %xmldump%).

That condition is also true for your kinkon records on an ADD operation!

 

Mit freundlichen Grüßen / Kind regards

Thomas Menzel

brox IT-Solutions GmbH
An der Breiten Wiese 9
30625 HANNOVER (Germany)
Mobil:      +49 (173) 369 86 76
Tel:          +49 (5 11) 33 65 28 – 76
eFax:       +49 (5 11) 33 65 28 – 98 76
Fax:         +49 (5 11) 33 65 28 – 29
Mail:       tmenzel@xxxxxxx
Web:       www.brox.de

==================================
According to Section 80 of the German Corporation Act brox IT-Solutions GmbH must indicate the following information.
Address: An der Breiten Wiese 9, 30625 Hannover Germany
General Manager: Hans-Chr. Brockmann
Registered Office: Hannover, Commercial Register Hannover HRB 59240
========== Legal Disclaimer ==========

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Andreas.Schultz@xxxxxxxxxxx
Sent: Wednesday, 30 September 2009 10:08
To: smila-user@xxxxxxxxxxx
Subject: AW: [smila-user] RE: JDBC-Crawling Phenomenon

 

Hi Thomas,

 

obviously one of us still misunderstands either the problem or the hints.

 

The condition that decides which Rule → Pipeline is chosen is defined in the QueueWorkerListenerConfig configuration:

 

  <Rule Name="ADD JDBC Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and DataSourceID LIKE '%kinkon%'</Condition>
    <Task>
      <Process Workflow="KinKonAddPipeline"/>
    </Task>
  </Rule>

 

So in this case, all records whose DataSourceID contains "kinkon" should be routed through the specific KinKonAddPipeline.

But the same data (an identical data set: same DS, same DB, same everything!) is sometimes routed through

 

  <Rule Name="ADD Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and NOT(DataSourceID LIKE '%feeds%') and NOT(DataSourceID LIKE '%xmldump%')</Condition>
    <Task>
      <Process Workflow="AddPipeline"/>
    </Task>
  </Rule>

 

But: They are never routed through both of them!

 

Best

 

Andreas Schultz
Senior Software Developer

- - - - Please note my new contact details - - - -


Empolis GmbH  |  Meisenstr. 90 | 33607 Bielefeld  |  Germany
AN ATTENSITY GROUP COMPANY
Phone +49 (0)521 55 785 413|  Fax +49 (0)521 55 785 121
andreas.schultz@xxxxxxxxxxx

 

www.empolis.com
Registered office: Kaiserslautern  |  Local court: Kaiserslautern HRB 30711  |  Managing directors: Dr. Stefan Wess, Dr. Peter Tepassé

 

………………………………………………………………………………………………………………………………………………………………………………………………………..

Know. Right. Now.

That is our philosophy. Empolis, an Attensity Group Company, offers an integrated suite of business applications
that use patented semantic information technologies to analyze, interpret, and automatically process the exponentially
growing volume of unstructured data. Decision-makers, experts, employees, and customers thus always receive
exactly the knowledge that is relevant to their work, tailored to the situation and the task.

………………………………………………………………………………………………………………………………………………………………………………………………………..

Subscribe to our monthly newsletter: http://www.empolis.de/newsletter.html

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Wednesday, 30 September 2009 09:43
To: Smila project user mailing list
Subject: RE: [smila-user] RE: JDBC-Crawling Phenomenon

 

hi,

 

> what I meant was, that identical data goes through the [ADD Rule] sometimes and through the [ADD JDBC Rule] sometimes.

> And there is no obvious rule when which rule is chosen. That’s the problem.

 

Exactly! You need to either

- mark your records distinctly, so that the condition of only one rule selects them and not the other, OR

- put them into different queues and have the listeners listen on their respective queues.
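
A minimal sketch of the first option, assuming the records keep "kinkon" in their DataSourceID: the generic rule excludes them explicitly, so only the ADD JDBC Rule can select them. Everything except the added NOT(...) term is copied from the rule quoted in this thread:

```xml
<Rule Name="ADD Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
  <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
  <!-- Added term: NOT(DataSourceID LIKE '%kinkon%') keeps kinkon records out of the generic pipeline -->
  <Condition>Operation='ADD' and NOT(DataSourceID LIKE '%feeds%') and NOT(DataSourceID LIKE '%xmldump%') and NOT(DataSourceID LIKE '%kinkon%')</Condition>
  <Task>
    <Process Workflow="AddPipeline"/>
  </Task>
</Rule>
```

The drawback of this variant is that every new special-case rule needs a matching exclusion in the generic rule, which is easy to forget.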

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Andreas.Schultz@xxxxxxxxxxx
Sent: Wednesday, 30 September 2009 09:07
To: smila-user@xxxxxxxxxxx
Subject: AW: [smila-user] RE: JDBC-Crawling Phenomenon

 

Hi Thomas,

 

what I meant was, that identical data goes through the [ADD Rule] sometimes and through the [ADD JDBC Rule] sometimes.

And there is no obvious rule when which rule is chosen. That’s the problem.

 

At  2009-09-29 15:40:34,799:

- Record is routed with rule [Default Route Rule] and operation [null], record id=177c250f8e116110396aaa5b1dd51662d633f6517dab42801d98be7f1765f6    

- Closing JdbcCrawler...                                                                                                                             

- Unregistering crawling thread kinkon_bookmark_jdbc                                                                                                

- Crawling thread kinkon_bookmark_jdbc unregistered                                                                                                 

- Crawling thread kinkon_bookmark_jdbc stopped.                                                                                                      

- Record is processed by Listener with rule: [ADD Rule] and operation [ADD], record id=177c250f8e116110396aaa5b1dd51662d633f6517dab42801d98be7f1765f6

 

At 2009-09-29 15:40:58,391:

Record is routed with rule [Default Route Rule] and operation [null], record id=177c250f8e116110396aaa5b1dd51662d633f6517dab42801d98be7f1765f6         

Closing JdbcCrawler...                                                                                                                                  

Record is processed by Listener with rule: [ADD JDBC Rule] and operation [ADD], record id=177c250f8e116110396aaa5b1dd51662d633f6517dab42801d98be7f1765f6

 

As you may have noticed, there are about 15 sec. between the operations. As I mentioned, I put exactly the same data (a single set) into the process.

I tried it several times afterwards to find a pattern, but the behavior seems completely random. Always with the same data!

 

Best

 

Andreas Schultz
Senior Software Developer


 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Tuesday, 29 September 2009 21:25
To: Smila project user mailing list
Subject: [smila-user] RE: JDBC-Crawling Phenomenon

 

Hi Andreas,

I'm not entirely sure what the problem or error is that you see:

> both listeners take the record

This is not a bug, it's a feature ;)

Both conditions match, so both listeners can take the records. In a concurrent system you can't tell which one gets which record.
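
To make the overlap concrete, here is a worked evaluation of the two conditions quoted in this thread. It assumes that the record's DataSourceID equals the crawl thread name kinkon_bookmark_jdbc from the log excerpt, which the thread does not confirm explicitly:

```xml
<!-- Assumed record properties: Operation='ADD', DataSourceID='kinkon_bookmark_jdbc' -->

<!-- ADD JDBC Rule: 'kinkon_bookmark_jdbc' LIKE '%kinkon%' is true, so this rule matches -->
<Condition>Operation='ADD' and DataSourceID LIKE '%kinkon%'</Condition>

<!-- ADD Rule: the ID contains neither 'feeds' nor 'xmldump', so both NOT(...) terms are true and this rule matches as well -->
<Condition>Operation='ADD' and NOT(DataSourceID LIKE '%feeds%') and NOT(DataSourceID LIKE '%xmldump%')</Condition>
```

Since both listeners read from the same queue and both conditions hold, whichever listener picks up the message first processes the record.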

 

 

> mimetype error , line 17

The default AddPipeline invokes the MIME type detection service, which needs a file extension to do its work. The extension is taken from a field defined in config/../MimeTypeConfig.xml.

If the detection fails, the rest of the processing is skipped (see <if name="conditionIsText">… ) and hence nothing is added to the index.

Since I guess you read from the DB and don't need to detect the MIME type, this can be ignored.

 

Kind regards

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On Behalf Of Andreas.Schultz@xxxxxxxxxxx
Sent: Tuesday, 29 September 2009 17:44
To: smila-user@xxxxxxxxxxx
Subject: [smila-user] JDBC-Crawling Phenomenon

 

Hi all,

 

I have a really nice phenomenon using a JDBC DS:

After finally succeeding in connecting to the DB (MSSQL with authorization via Windows domain), which was really hard work, I added an entry to the Listener config to call my pipeline:

 

  <Rule Name="ADD JDBC Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and DataSourceID LIKE '%kinkon%'</Condition>
    <Task>
      <Process Workflow="KinKonAddPipeline"/>
    </Task>
  </Rule>

  <Rule Name="ADD Rule" WaitMessageTimeout="10" Threads="4" MaxMessageBlockSize="20">
    <Source BrokerId="broker1" Queue="SMILA.connectivity"/>
    <Condition>Operation='ADD' and NOT(DataSourceID LIKE '%feeds%') and NOT(DataSourceID LIKE '%xmldump%')</Condition>
    <Task>
      <Process Workflow="AddPipeline"/>
    </Task>
  </Rule>

 

The new pipeline is a stripped-down copy of the normal AddPipeline.

The funny thing was the behavior of the indexing process: sometimes it succeeded, sometimes not!

If you look at the attached log file, you will find 2 sections: in the first, putting the content into the index failed; in the second, it succeeded!

Obviously, the first one went through the ADD Rule,

“Record is processed by Listener with rule: [ADD Rule]”

the second one through the expected

“Record is processed by Listener with rule: [ADD JDBC Rule]”

Is this a misuse/misconfiguration on my part, or a bug?

 

Best

 

 

Andreas Schultz
Senior Software Developer


 

