[egit-dev] Re: patch/GetTextTest.testGetText_Convert() question

Meinrad Recheis <meinrad.recheis@xxxxxxxxx> wrote:
> I got a question about a test case that is failing on Windows *both*
> in Java (see your CI server) and in C#. Of course we can not
> completely exclude the possibility of a porting error. I am also not
> an expert on encoding so I'd like to ask you something:
> 
> If one changes the line
> 
> exp = exp.replace("\303\205ngstr\303\266m", "\u00c5ngstr\u00f6m");
> 
> to
> 
> exp = exp.replace("\u00C3\u0085ngstr\u00C3\u00B6m", "\u00c5ngstr\u00f6m");
> 
> which is just a different representation of the same string (isn't
> it?) then the test passes in C# on Windows. However, when doing this
> with the same line in testGetText_DiffCc() then the latter fails in C#
> on Windows. Because of this strange behavior I am not sure if the fix
> I found really is a fix or is just masking the real bug (which I
> suspect).

It does seem like your change should have no effect.  So I'm equally
confused about why it would work when you change it.  I would
blame it on the compiler not supporting the escape we are using,
but both the Java and the C# compilers are having an issue here,
so it must be our test case.

> Right now it is not possible for me to say *what exactly* is the
> expected result for sure. The problem could be a mistake in the test
> or in the system or in the patch code or in both.
> 
> Would it be possible to request, that the original author of the test
> (from the copyright it must be some guy from Google) rewrites it in
> order to make the intent of the test case unmistakably clear?

I think the original author may have been me.  The comment above it
tries to explain:

	// Read the original file as ISO-8859-1 and fix up the one place
	// where we changed the character encoding. That makes the exp
	// string match what we really expect to get back.

The point of the test is that the patch contents are expected to
be in UTF-8 encoding, but we originally parsed them as ISO-8859-1,
so each byte of a multi-byte UTF-8 sequence is currently a separate
char.  The getScriptText(Charset, Charset) method is supposed to
transcode the content into the 2nd charset, thus fixing the
multi-byte UTF-8 sequences to be correct.
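
To make that concrete, here is a minimal sketch of the kind of
transcoding described above. It is not the getScriptText code; the
class and the transcode helper are hypothetical names, and it assumes
the text was decoded as ISO-8859-1 and should have been UTF-8.

```java
import java.nio.charset.StandardCharsets;

class TranscodeSketch {
    // Hypothetical helper, not the real getScriptText: recover the
    // original bytes from the ISO-8859-1 decoding, then decode them
    // again as UTF-8.
    static String transcode(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "\u00c5ngstr\u00f6m";
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // Parsing UTF-8 bytes as ISO-8859-1 gives one char per byte,
        // so the 8-character name becomes 10 chars.
        String misdecoded = new String(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(misdecoded.length());

        // Transcoding restores the intended characters.
        System.out.println(transcode(misdecoded).equals(name));
    }
}
```

This works because ISO-8859-1 maps every byte value to exactly one
char, so the first decoding loses nothing.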

That exp.replace call is trying to perform that fixup *without*
going through the same code path, so we can compare the two strings
and validate the result is correct.
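
As a sketch of what that fixup amounts to in Java (only Java; this
says nothing about how the C# port spells the literal), the octal
form and the Unicode-escape form below are the same string, and the
replace turns the mis-decoded name into the intended one. The class
name, the fixup helper and the sample "From:" text are made up for
illustration.

```java
import java.nio.charset.StandardCharsets;

class ExpectedFixupSketch {
    // Octal escapes as in the test: \303\205 is 0xC3 0x85 and
    // \303\266 is 0xC3 0xB6.
    static final String OCTAL = "\303\205ngstr\303\266m";

    // The same four chars written with Unicode escapes.
    static final String UNICODE = "\u00C3\u0085ngstr\u00C3\u00B6m";

    // Hypothetical stand-in for the fixup done on the expected string.
    static String fixup(String exp) {
        return exp.replace(OCTAL, "\u00c5ngstr\u00f6m");
    }

    public static void main(String[] args) {
        // In Java both literals denote the same string.
        System.out.println(OCTAL.equals(UNICODE));

        // Made-up sample text: UTF-8 bytes read back as ISO-8859-1.
        byte[] utf8 = "From: \u00c5ngstr\u00f6m".getBytes(StandardCharsets.UTF_8);
        String exp = new String(utf8, StandardCharsets.ISO_8859_1);

        // After the fixup the string matches the intended text.
        System.out.println(fixup(exp).equals("From: \u00c5ngstr\u00f6m"));
    }
}
```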

> I think
> asserting raw byte sequences for equality instead of unicode strings
> would make it clear enough. That would probably make it possible to
> fix the issue on Windows both in java and in C#.

I'm not sure how to assert a raw byte sequence here.  The test is
about decoding a byte sequence into a character sequence, given
a guess about the character encoding.  If we convert back to a
byte sequence, can we still assert that the intermediate character
sequence was correct?
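
One way to look at that question, as a sketch rather than an answer:
whether a byte comparison still says anything about the characters
depends on the charset used to encode them back. The class and the
sameBytes helper below are hypothetical. Encoding to ISO-8859-1
replaces any char it cannot represent with '?', so two different
strings can produce equal bytes; encoding to UTF-8 keeps them apart.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ByteAssertSketch {
    // Hypothetical helper: compare two strings by their encoded bytes.
    static boolean sameBytes(String a, String b, Charset cs) {
        return Arrays.equals(a.getBytes(cs), b.getBytes(cs));
    }

    public static void main(String[] args) {
        String expected = "\u20ac"; // a char outside ISO-8859-1
        String wrong = "?";         // clearly a different string

        // Lossy: the unmappable char is encoded as '?', so the bytes
        // match even though the strings differ.
        System.out.println(sameBytes(expected, wrong, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent both, so the bytes differ.
        System.out.println(sameBytes(expected, wrong, StandardCharsets.UTF_8));
    }
}
```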

-- 
Shawn.

