hi,
thanks for that new bug report and the hints!
problems like that show that i18n/l10n is evil and everyone should stick to [A-Za-z0-9] ;)
Thomas Menzel @ brox IT-Solutions GmbH
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of andreas.schank@xxxxxxxxxxxxx
Sent: Friday, 4 March 2011 10:22
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] Smila cannot crawl files with Umlauts in zip files
Hi all,
I was playing around with the SMILA crawler, and one of the zip files I crawled contained files with umlauts in their names.
I got exceptions during crawling and those files were not processed.
I filed a bug (https://bugs.eclipse.org/bugs/show_bug.cgi?id=338905).
This is a known issue with Java’s zip implementation: it requires file names to be UTF-8-encoded, but most zip tools encode file names differently.
I couldn’t find the Java bug entry, but I know it has been around for some years, and it is said to be fixed in Java 7.
A possible solution could be using commons-compress where different encodings can be used.
I had that same problem a year ago and had to switch to commons-compress to solve it.
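For illustration, here is a minimal sketch of the encoding issue described above. It does not use commons-compress. Instead it relies on the `java.util.zip.ZipFile(File, Charset)` constructor that Java 7 added, which lets the caller say how entry names are encoded. The class name `ZipNameEncodingSketch` and the method `listEntryNames` are made up for this example and are not part of SMILA. Which charset to pass depends on the tool that created the archive and is an assumption the caller has to make.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not SMILA code: reads entry names with an explicit charset.
class ZipNameEncodingSketch {

    // Returns the names of all entries in the archive. Names that are not
    // flagged as UTF-8 inside the archive are decoded with nameCharset.
    // Requires Java 7 or later (ZipFile(File, Charset) and try-with-resources).
    static List<String> listEntryNames(File zip, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zf = new ZipFile(zip, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zf.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }
}
```

A call could look like `ZipNameEncodingSketch.listEntryNames(file, Charset.forName("Cp437"))`, assuming the archive came from a tool that writes CP437 names and that the JRE in use ships that charset. On Java versions before 7 this constructor does not exist, which is where a library with configurable name encoding, such as commons-compress, comes in.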
Has any of you experienced a similar problem?
Bye
Andreas