[smila-dev] Problems with RecoverableExceptions

Hello everyone,

 

We implemented a crawler that walks through an Atlassian Confluence instance. Whenever we ran into an error we threw a RecoverableException, because those errors seemed to be only temporary; most of them were time-outs that occurred when the Confluence server wasn’t able to answer quickly enough.

Using this method we found some strange behavior that might not be intended:

First of all: when the maximum number of retries is reached, the whole job is terminated (not just the affected record), leaving the Solr index in an inconsistent state. (Some data is indexed and some is not, and since we were using the delta worker, data from the previous run was deleted that was never meant to be deleted!)

We worked around this problem by always catching the error and dropping the record immediately. But this can’t be the final solution: sometimes it is just a time-out, and a retry would be much appreciated!

 

We also noticed that after the job was marked FAILED, some errors appeared in the log:

2014-08-05 09:53:58,040 WARN  [pool-4-thread-7                              ] taskworker.DefaultTaskLogFactory              - Task df1acc5f-940b-49fc-8d0b-67f0c4ad0561: Task 'df1acc5f-940b-49fc-8d0b-67f0c4ad0561' for job 'crawlConfluence' and run '20140805-095231322337' is unknown, maybe already finished or workflow run was canceled.

org.eclipse.smila.jobmanager.exceptions.IllegalJobStateException: Task 'df1acc5f-940b-49fc-8d0b-67f0c4ad0561' for job 'crawlConfluence' and run '20140805-095231322337' is unknown, maybe already finished or workflow run was canceled.

 

After some searching we found that these errors may be caused by other workers that are still running after the job itself has failed. So the log entry is their way of saying: “we noticed that the job has failed”. Is that correct?

But the really troubling errors were these:

2014-08-05 09:53:51,657 ERROR [pool-4-thread-1                              ] taskworker.DefaultTaskLogFactory              - Task 2910cefa-4a02-48ff-b4b3-c2666f0b854d: Error while executing task 2910cefa-4a02-48ff-b4b3-c2666f0b854d in worker com.eccenca.importing.confluence.worker.ConfluenceObjectFetcherWorker@6481c861: Object with id 'pageBucket/257543c8-b090-4f34-848a-2e63b0863b1c0' does not exist in store 'temp'.

org.eclipse.smila.objectstore.NoSuchObjectException: Object with id 'pageBucket/257543c8-b090-4f34-848a-2e63b0863b1c0' does not exist in store 'temp'.

 

All of a sudden some records were missing, leaving the objectstore in an inconsistent state. When we restarted the job, these errors occurred again, so some cleanup seems to be missing.

 

Our questions: Is there a way (or is something planned) to keep those RecoverableExceptions from causing such a brutal failure? Something like: “Drop after n retries”. And is the last problem described, concerning the objectstore, perhaps a bug?
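
To make the “drop after n retries” idea concrete, here is a minimal sketch in plain Java of the behavior we have in mind. It does not use any SMILA API: `TransientFetchException`, `RetryThenDrop` and `fetchOrDrop` are hypothetical names invented for this illustration, and where such logic would actually belong in SMILA is an open question.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical stand-in for a transient error such as a time-out.
// This is NOT SMILA's RecoverableException, just a local placeholder.
class TransientFetchException extends RuntimeException {
    TransientFetchException(String message) {
        super(message);
    }
}

class RetryThenDrop {

    // Runs the fetch up to (1 + maxRetries) times. Returns the result if one
    // attempt succeeds, or Optional.empty() if every attempt hit a transient
    // error, so the caller can drop this record and continue with the next.
    // Any other exception is propagated unchanged. The fetch must not return null.
    static <T> Optional<T> fetchOrDrop(Supplier<T> fetch, int maxRetries) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return Optional.of(fetch.get());
            } catch (TransientFetchException e) {
                // Transient error: try again if attempts remain.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulated fetch that times out twice and then succeeds.
        Optional<String> page = fetchOrDrop(() -> {
            if (++calls[0] < 3) {
                throw new TransientFetchException("time-out");
            }
            return "page content";
        }, 3);
        // prints "page content after 3 attempts"
        System.out.println(page.orElse("dropped") + " after " + calls[0] + " attempts");
    }
}
```

The point of the sketch is that an exhausted retry budget yields an empty result for one record only, instead of failing the whole job.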

 

Thank you!

 

Kind Regards

Daniel

 


 

Daniel Hänßgen

phone +49 511 33652866
dhaenssgen@xxxxxxx

Postanschrift / Postal address:
brox IT-Solutions GmbH | An der Breiten Wiese 9 | 30625  Hannover | Germany

brox IT-Solutions GmbH
Geschäftsführer / Board of Directors: Hans-Chr. Brockmann
Sitz und Registergericht / Domicile and Court of Registry: Hannover
HRB-Nr. / Commercial Register No.: 59240
USt-ID / VAT registration No.: DE 199 515 978

This e-mail may contain confidential information.
If you are not the addressee you are not authorized to make use of
the information contained in this e-mail. Please inform us immediately that you have received it by mistake.
