Re: [smila-dev] SMILA as search engine

Hi Rene,

 

I think the main reason for your problem is that the current SMILA release doesn’t extract text from PDFs out of the box.

We plan to provide this in the next release, but it’s not implemented yet.

So, with the current SMILA, if you want to search PDF content, you have to implement a Pipelet (or Worker) that does the PDF-to-text extraction (e.g. by calling third-party software) and use it in your workflow.
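
To illustrate the extraction step only, here is a minimal sketch in Python. It assumes the third-party tool is the `pdftotext` command-line program (from Poppler or Xpdf) and that it is installed on the PATH; the function names `build_command` and `extract_text` are made up for this example and are not part of SMILA. A real Pipelet or Worker would wrap the equivalent call in Java and write the resulting text into the record.

```python
import shutil
import subprocess


def build_command(pdf_path, tool="pdftotext"):
    """Build the command line for the external extractor (hypothetical helper).

    Passing "-" as the output file asks pdftotext to write the text to stdout.
    """
    return [tool, str(pdf_path), "-"]


def extract_text(pdf_path, tool="pdftotext", timeout=60):
    """Run the external tool on one PDF and return the extracted text."""
    # Assumption: the tool is installed separately; fail early if it is missing.
    if shutil.which(tool) is None:
        raise RuntimeError(f"{tool} not found on PATH")
    result = subprocess.run(
        build_command(pdf_path, tool),
        capture_output=True,
        check=True,        # raise CalledProcessError on a non-zero exit code
        timeout=timeout,   # avoid hanging the workflow on a broken PDF
    )
    return result.stdout.decode("utf-8", errors="replace")
```

For a crawled PDF saved to a local file, `extract_text("paper.pdf")` would return its text as one string, which could then be indexed like the content of an HTML page.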

 

Regards,

Andreas

 

 

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Corinth, Rene
Sent: Thursday, 11 October 2012 10:13
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] SMILA as search engine

 

Hi all,

I have one more question before SMILA goes online in Theseus.

If I use the advanced search in Theseus (http://www.theseus-programm.de/en/75_smila.php?tpl=advanced) and search for “Document Type” PDF, no title or summary is shown.

I think the problem is that I use only the web crawler and not the file crawler, but these PDFs are on the web. How can I combine the two crawlers, or do I have to take a different approach?

 

Cheers René

