AW: [smila-dev] problem with new text extractor?

Hi all,

I just checked in some updated and fixed configurations for the document processing.
The AddPipeline now does the following:
- identifies the mimetype (based on the file extension only)
- processes only text-based mimetypes and skips all others
- converts text/xml and text/html to plain text with the HTML2TXT pipelet
- indexes the text in Lucene
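
As a rough illustration of the flow above, here is a minimal Python sketch. It is not the actual SMILA configuration (which is a pipeline definition, not Python): the extension table, the function names and the dict standing in for the Lucene index are all assumptions made up for this example, and the tag stripping is only a crude stand-in for the HTML2TXT pipelet.

```python
import os
from html.parser import HTMLParser

# Hypothetical extension-to-mimetype table; the real mapping is part of
# the SMILA configuration and is not shown in this thread.
MIMETYPES_BY_EXTENSION = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".xml": "text/xml",
}

# Mimetypes that need markup removed before indexing.
MARKUP_MIMETYPES = {"text/html", "text/xml"}


def identify_mimetype(filename):
    """Guess the mimetype from the file extension only (no content sniffing)."""
    extension = os.path.splitext(filename)[1].lower()
    return MIMETYPES_BY_EXTENSION.get(extension)


class _TextCollector(HTMLParser):
    """Collects the text nodes of a document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_markup(markup):
    """Crude stand-in for the HTML2TXT pipelet: keep text, drop tags."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()
    # Join the text nodes with spaces and collapse runs of whitespace.
    return " ".join(" ".join(collector.chunks).split())


def add_pipeline(filename, content, index):
    """Run one record through the steps; return True if it was indexed."""
    mimetype = identify_mimetype(filename)
    if mimetype is None or not mimetype.startswith("text/"):
        return False  # unknown or non-text mimetypes are skipped
    if mimetype in MARKUP_MIMETYPES:
        content = strip_markup(content)
    index[filename] = content  # a plain dict stands in for the Lucene index
    return True
```

Calling `add_pipeline("a.html", "<p>Hello <b>world</b></p>", index)` with an empty dict as `index` stores the plain text under the key `"a.html"`, while a file such as `b.pdf` is skipped and leaves the index unchanged.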

To avoid unnecessary load, I also restricted the filter in the file datasource so that it only includes txt, html and xml files.

Bye,
Daniel


> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx]
> On Behalf Of Daniel.Stucky@xxxxxxxxxxx
> Sent: Wednesday, 15 October 2008 11:33
> To: smila-dev@xxxxxxxxxxx
> Subject: AW: [smila-dev] problem with new text extractor?
> 
> Hi,
> 
> that's because Georg removed Aperture from trunk and its use in the
> AddPipeline.
> I will update the AddPipeline today so that it does something meaningful again.
> I'll let you all know when I'm finished.
> 
> Bye,
> Daniel
> 
