[smila-dev] Smila cannot crawl files with Umlauts in zip files

Hi all,

I was playing around with the SMILA crawler, and one of the zip files I crawled contained entries with umlauts in their file names.

I got exceptions during crawling, and those files were not processed.

I filed a bug (https://bugs.eclipse.org/bugs/show_bug.cgi?id=338905).

This is a known issue with Java’s zip implementation: java.util.zip requires entry names to be UTF-8 encoded, but most zip tools encode file names differently. I couldn’t find the corresponding Java bug entry, but I know the issue has been around for some years; reportedly it will be fixed in Java 7.
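
To make the mismatch concrete, here is a minimal sketch using only the JDK. It assumes Java 7 or later, which added ZipOutputStream and ZipInputStream constructors that take a Charset (this appears to be the Java 7 fix mentioned above). The class and its buildZip/readNames helpers are hypothetical, for illustration only. ISO-8859-1 stands in for whatever legacy encoding a zip tool might use; many Windows tools use an OEM code page such as CP437 instead, but ISO-8859-1 is guaranteed to be available on every JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: not part of SMILA.
public class ZipNameEncodingDemo {

    // Builds an in-memory zip with one entry whose name is encoded with nameCharset.
    // With a non-UTF-8 charset, ZipOutputStream does not set the UTF-8 flag
    // (general-purpose bit 11), just like many legacy zip tools.
    static byte[] buildZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zout = new ZipOutputStream(buffer, nameCharset)) {
            zout.putNextEntry(new ZipEntry(entryName));
            zout.write("hello".getBytes(StandardCharsets.US_ASCII));
            zout.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Lists the entry names, decoding them with nameCharset.
    // Returns null if java.util.zip rejects a name as malformed in that charset.
    static List<String> readNames(byte[] zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zip), nameCharset)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        } catch (IllegalArgumentException | ZipException e) {
            // Depending on the JDK version, an undecodable name may surface as either exception.
            return null;
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String name = "\u00C4rger.txt"; // "Ärger.txt", written with a Unicode escape
        byte[] zip = buildZip(name, StandardCharsets.ISO_8859_1);

        // UTF-8 is ZipInputStream's default (and the only option before Java 7).
        System.out.println("decoded as UTF-8:      " + readNames(zip, StandardCharsets.UTF_8));
        // Naming the charset the archive was actually written with recovers the umlaut.
        System.out.println("decoded as ISO-8859-1: " + readNames(zip, StandardCharsets.ISO_8859_1));
    }
}
```

Decoding as UTF-8 fails here because the ISO-8859-1 byte for Ä (0xC4) followed by "r" is not valid UTF-8. Unless the archiver set the UTF-8 flag or wrote a Unicode extra field, the archive does not record which encoding its names use, so a crawler has to be configured with (or guess) the right one.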

One possible solution would be Apache Commons Compress, which lets you specify the encoding used for entry names.

I ran into the same problem a year ago and had to switch to commons-compress to solve it.

Has anyone else run into a similar problem?

Bye

Andreas

