AW: [smila-dev] problem with new text extractor?
Hi all,
I just checked in updated and fixed configurations for the document processing.
The AddPipeline now does the following:
- identifies the mimetype (based on the file extension only)
- processes only text-based mimetypes; all others are skipped
- converts text/xml and text/html to plain text with the HTML2TXT pipelet
- indexes the resulting text in Lucene
To avoid unnecessary load, I also reduced the filter in the file datasource to include only txt, html and xml files.
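For illustration, the routing described above could be sketched roughly as follows. This is not the actual AddPipeline configuration or any SMILA API: the extension table, the function names, the regex-based tag stripper standing in for the HTML2TXT pipelet, and the plain dict standing in for the Lucene index are all assumptions made for this example.

```python
import os
import re
from html import unescape

# Hypothetical extension-to-mimetype table (identification by extension only).
EXTENSION_TO_MIMETYPE = {
    ".txt": "text/plain",
    ".html": "text/html",
    ".xml": "text/xml",
    ".pdf": "application/pdf",
}

# Mimetypes that are converted to plain text before indexing.
MARKUP_MIMETYPES = {"text/xml", "text/html"}


def identify_mimetype(filename):
    """Return the mimetype for a file name, or None if the extension is unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return EXTENSION_TO_MIMETYPE.get(extension)


def markup_to_text(markup):
    """Crude stand-in for an HTML-to-text step: drop tags, decode entities."""
    without_tags = re.sub(r"<[^>]+>", " ", markup)
    return " ".join(unescape(without_tags).split())


def process(filename, content, index):
    """Index one document; return False if it was skipped."""
    mimetype = identify_mimetype(filename)
    if mimetype is None or not mimetype.startswith("text/"):
        return False  # non-text or unknown mimetypes are skipped
    if mimetype in MARKUP_MIMETYPES:
        content = markup_to_text(content)
    index[filename] = content  # the dict stands in for the real index
    return True
```

Calling `process("page.html", "<p>Hello</p>", index)` with an empty dict as `index` stores the stripped text under the file name, while a file such as `report.pdf` is skipped and leaves the dict unchanged.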
Bye,
Daniel
> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx]
> On Behalf Of Daniel.Stucky@xxxxxxxxxxx
> Sent: Wednesday, 15 October 2008 11:33
> To: smila-dev@xxxxxxxxxxx
> Subject: AW: [smila-dev] problem with new text extractor?
>
> Hi,
>
> that's because Georg removed Aperture from trunk and its use in the
> AddPipeline.
> I will update the AddPipeline today to something meaningful again.
> I'll let you all know when I'm finished.
>
> Bye,
> Daniel
>