507678 – Encoding problem in ErrorParserManager at buffer boundaries + no way to specify encoding

Status:	NEW

Alias:	None

Product:	CDT
Classification:	Tools
Component:	cdt-core
Version:	9.1.0
Hardware:	PC Windows 7

Importance:	P3 normal
Target Milestone:	---
Assignee:	Project Inbox
QA Contact:	Jonah Graham

Reported:	2016-11-17 07:59 EST by Andreas Loth
Modified:	2020-09-04 15:20 EDT
CC List:	0 users

Description Andreas Loth

2016-11-17 07:59:36 EST

ErrorParserManager has a method write(byte[], int, int) which takes the given bytes and creates a String from them, one buffer at a time.

A multibyte character that straddles a buffer boundary is cut into pieces by this approach (e.g. a three-byte character whose first two bytes are at the end of the current buffer and whose last byte is at the beginning of the next buffer), and each piece is then decoded on its own as a malformed sequence. The right way would be to use a streaming decoder (e.g. CharsetDecoder) so that an incomplete trailing sequence is carried over to the next write() call.
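
To illustrate the idea, here is a minimal sketch of chunk-wise decoding with java.nio.charset.CharsetDecoder. ChunkDecoder and its write() method are hypothetical names for this example and are not part of CDT; the sketch assumes the caller knows the charset and that malformed input should be replaced rather than reported.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper (not CDT code): decodes byte chunks without splitting multibyte characters. */
class ChunkDecoder {
    private final CharsetDecoder decoder;
    // Bytes of an incomplete multibyte sequence left over from the previous call.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    ChunkDecoder(Charset charset) {
        decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    /** Decodes one chunk; an incomplete trailing sequence is kept for the next call. */
    String write(byte[] b, int off, int len) {
        // Prepend the leftover bytes to the new chunk.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
        in.put(pending);
        in.put(b, off, len);
        in.flip();

        int capacity = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        // endOfInput = false: an incomplete trailing sequence stays unconsumed in 'in'.
        decoder.decode(in, out, false);

        // Save whatever was not consumed so the next chunk can complete it.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }
}
```

With UTF-8 and the bytes of "a€b" split after the third byte, the first call should return only "a" and the second call "€b", whereas calling new String(...) on each chunk separately yields replacement characters instead of the euro sign. A real implementation would also need a final decode with endOfInput = true plus flush() when the stream is closed.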

Furthermore, it would be nice if an encoding could be specified, both for write(byte[], int, int) and for outputLine(String, ProblemMarkerInfo), where l.getBytes() is called without a charset and therefore uses the platform default.

An alternative would be to accept character streams in addition to byte streams.
That means adding a method write(char[], int, int) and, in addition to the outputStream that is written in outputLine(), a Writer that receives the line as char[] instead of byte[], which avoids the re-encoding.
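
A rough sketch of what such a character-based path could look like follows. CharLineSink is a hypothetical stand-in and not the real ErrorParserManager; the line splitting on '\n' and the wrapping of IOException are assumptions made only to keep the example small.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical sketch (not CDT code): accepts chars and forwards complete lines to a Writer. */
class CharLineSink {
    private final Writer out;
    private final StringBuilder currentLine = new StringBuilder();

    CharLineSink(Writer out) {
        this.out = out;
    }

    /** Character counterpart of write(byte[], int, int); no decoding is needed here. */
    void write(char[] c, int off, int len) {
        for (int i = off; i < off + len; i++) {
            currentLine.append(c[i]);
            if (c[i] == '\n') {
                outputLine(currentLine.toString());
                currentLine.setLength(0);
            }
        }
    }

    private void outputLine(String line) {
        // Error parsers would process the line here; it is then passed on
        // as characters, so no getBytes() re-encoding takes place.
        try {
            out.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Since the caller already supplies characters, a character written in two separate calls cannot be corrupted at the boundary, and the choice of encoding is left to whoever creates the Writer (for example an OutputStreamWriter with an explicit charset).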