[smila-dev] encoding during compilation

From: Thomas Menzel <tmenzel@xxxxxxx>
Date: Wed, 3 Mar 2010 12:31:06 +0100

Hi folks,

This mail intends to

a) share a subtle encoding issue

b) start a discussion on how we want to treat the matter in SMILA.

Here goes the description of the encoding issue I ran into:

The scenario is writing a test case for a converter pipelet, but that is just the setting where it happened to me; it might happen again elsewhere to someone else.

The expected result for the extracted item is “Microsoft® Office PowerPoint® 2007” (note the (R) char!)

As is common in tests, I hard-coded this value in the source code, since it is sufficiently short. As soon as the converter worked, the unit test (UT) was green, but only in the IDE!

When I built from the command line, the JUnit test failed, complaining that the expected and actual values weren’t the same.

After some time spent debugging without getting anywhere, I switched my IDE’s default encoding to my system’s (cp1252; setting the project’s encoding for the test bundle works just as well).

Once I had done this, Eclipse recompiled the (whole) workspace and, et voilà, the UT failed in the same way as it did on the console.

Vice versa, I was also able to get it green on the console by setting this env var:

set JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8

(note that you need UTF-8, not UTF8 as I have seen it written on a webpage)
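
To check which encoding a JVM actually ends up with after setting the variable, something like the following small class could be run from the same console. This is only a sketch: the class name ShowDefaultEncoding is made up for illustration, and whether javac follows the same default as the runtime may depend on the JDK version.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of SMILA: prints the encoding the JVM sees.
public class ShowDefaultEncoding {

    /** Returns the name of the charset the JVM uses when none is given explicitly. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        // The system property that the -Dfile.encoding=... option sets.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        // The charset the runtime derived as its default.
        System.out.println("default charset = " + defaultCharsetName());
    }
}
```

Running it once with and once without JAVA_TOOL_OPTIONS set should show whether the option is being picked up.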

Reason:

The IDE writes the source file in the encoding configured there. javac, however, reads source files in the encoding determined by its environment (the platform default, unless told otherwise). In the IDE this is the same encoding that was used for writing the files; on the console it may differ, since javac doesn’t know which encoding I have set in the IDE. Usually this isn’t a problem, as we seldom use special / non-ASCII chars in our Java code, but in this case there was a good reason to, and so it mattered which encoding the compiler used to read the source files.
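
The effect can be reproduced without a compiler by writing the expected string to bytes in one encoding and reading it back in another. The following is a minimal sketch, assuming the runtime provides the windows-1252 charset (the Java name for cp1252); the class and method names are made up for illustration. The (R) char is written as the escape \u00AE so that the example itself does not depend on the file's encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates "IDE writes in one encoding, compiler reads in another".
public class EncodingMismatchDemo {

    // \u00AE is the registered-trademark sign; the escape keeps this file ASCII-only.
    static final String EXPECTED = "Microsoft\u00AE Office PowerPoint\u00AE 2007";

    /** Encodes text with the writer charset and decodes the resulting bytes with the reader charset. */
    static String writeThenRead(String text, Charset writer, Charset reader) {
        byte[] onDisk = text.getBytes(writer);
        return new String(onDisk, reader);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // Same encoding on both sides: the literal survives unchanged.
        String same = writeThenRead(EXPECTED, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Written as UTF-8, read as cp1252: each (R) char turns into two chars.
        String mismatched = writeThenRead(EXPECTED, StandardCharsets.UTF_8, cp1252);

        System.out.println(EXPECTED.equals(same));       // prints "true"
        System.out.println(EXPECTED.equals(mismatched)); // prints "false"
        System.out.println(mismatched.length() - EXPECTED.length()); // prints "2"
    }
}
```

In UTF-8 the (R) char is stored as the two bytes C2 AE, which cp1252 reads as two separate characters, so the compiled string constant no longer equals what the converter extracts.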

In light of this and our recommendation to use UTF-8 as the default encoding in our IDE, I suggest that we also do our builds in UTF-8.

Any thoughts and comments on your end?

If we agree on this: where will we write this down for fellow developers?

Thomas Menzel

brox IT-Solutions GmbH
An der Breiten Wiese 9
30625 HANNOVER (Germany)
Mobil:      +49 (173) 369 86 76
Tel:          +49 (5 11) 33 65 28 – 76
eFax:       +49 (5 11) 33 65 28 – 98 76
Fax:         +49 (5 11) 33 65 28 – 29
Mail:        tmenzel@xxxxxxx
Web:       www.brox.de

==================================
According to Section 80 of the German Corporation Act brox IT-Solutions GmbH must indicate the following information.
Address: An der Breiten Wiese 9, 30625 Hannover Germany
General Manager: Hans-Chr. Brockmann
Registered Office: Hannover, Commercial Register Hannover HRB 59240
========== Legal Disclaimer ==========
