Re: [smila-dev] Smila cannot crawl files with Umlauts in zip files

hi,

 

thanks for that new bug report and the hints!

 

problems like that show that i18n/l10n is evil and everyone should stick to [A-Za-z0-9] ;)

 

Thomas Menzel @ brox IT-Solutions GmbH

 

From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of andreas.schank@xxxxxxxxxxxxx
Sent: Friday, 4 March 2011 10:22
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] Smila cannot crawl files with Umlauts in zip files

 

Hi all,

 

I was playing around with the SMILA crawler, and one of the zip files I crawled contained entries with umlauts in their file names.

I got exceptions during crawling, and those files were not processed.

 

I filed a bug (https://bugs.eclipse.org/bugs/show_bug.cgi?id=338905).

 

This is a known issue with Java’s zip implementation: it requires file names to be UTF-8-encoded, but most zip tools encode them differently.

I couldn’t find the Java bug entry, but I know the issue has been around for some years. Reportedly it should be fixed in Java 7.
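
For illustration, here is a minimal sketch of what the Java 7 route could look like. It relies on the `Charset`-accepting constructors that `java.util.zip` gained in Java 7. The class name, the helper methods and the choice of ISO-8859-1 are assumptions made for this example only; a real archive may use a different encoding (for instance an OEM code page), and none of this is taken from SMILA.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of SMILA.
public class ZipNameEncodingSketch {

    // Builds a small in-memory zip whose entry name is stored in the given charset,
    // imitating a zip tool that does not write UTF-8 names.
    static byte[] writeZip(String entryName, Charset nameCharset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer, nameCharset)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write("dummy content".getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        }
        return buffer.toByteArray();
    }

    // Lists entry names, decoding them with the charset the caller assumes was used.
    static List<String> listEntryNames(byte[] zipBytes, Charset nameCharset) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis =
                 new ZipInputStream(new ByteArrayInputStream(zipBytes), nameCharset)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // "\u00FC" is the umlaut u; the escape keeps the source file plain ASCII.
        byte[] zip = writeZip("M\u00FCller.txt", StandardCharsets.ISO_8859_1);
        for (String name : listEntryNames(zip, StandardCharsets.ISO_8859_1)) {
            System.out.println(name);
        }
    }
}
```

The important part is that the reader is told which charset the names were written in. Without a charset argument the JDK classes assume UTF-8, which is the situation described above.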

 

A possible solution is to use commons-compress, which allows the encoding of the entry names to be specified.

I had that same problem a year ago and had to switch to commons-compress to solve it.

 

Did any of you experience a similar problem?

 

Bye

Andreas

 


