Bug 507678 - Encoding problem in ErrorParserManager at buffer boundaries + no way to specify encoding
Status: NEW
Alias: None
Product: CDT
Classification: Tools
Component: cdt-core
Version: 9.1.0
Hardware: PC Windows 7
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox
QA Contact: Jonah Graham
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-17 07:59 EST by Andreas Loth
Modified: 2020-09-04 15:20 EDT
0 users

See Also:


Attachments

Description Andreas Loth 2016-11-17 07:59:36 EST
ErrorParserManager has a method write(byte[], int, int) which takes the given bytes and creates a String directly from them.

With this approach, a multibyte character that straddles a buffer boundary is cut into pieces (e.g. a three-byte character whose first two bytes are at the end of the current buffer and whose last byte is at the beginning of the next buffer), and each piece is then decoded on its own. The right way would be to use a decoder (e.g. CharsetDecoder) which keeps state across calls.
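
To illustrate the suggestion, here is a minimal sketch of chunk-wise decoding with java.nio.charset.CharsetDecoder. ChunkDecoder is a hypothetical helper written for this report, not existing CDT code. Its write method returns the decoded text only to keep the example small, and replacing malformed input is an assumption. The trailing bytes of an incomplete character are held back and prepended to the next chunk; the Charset is passed in by the caller.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

/** Hypothetical helper (not part of CDT): decodes byte chunks without splitting characters. */
final class ChunkDecoder {
    private final CharsetDecoder decoder;
    // Trailing bytes of an incomplete character, kept until the next chunk arrives.
    private ByteBuffer pending = ByteBuffer.allocate(0);

    ChunkDecoder(Charset charset) {
        // Assumption: malformed or unmappable input is replaced rather than reported.
        decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    /** Decodes one chunk and returns the characters that are complete so far. */
    String write(byte[] b, int off, int len) {
        // Prepend the bytes left over from the previous call.
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + len);
        in.put(pending).put(b, off, len);
        in.flip();

        int capacity = (int) Math.ceil(in.remaining() * (double) decoder.maxCharsPerByte()) + 1;
        CharBuffer out = CharBuffer.allocate(capacity);

        // endOfInput = false: an incomplete trailing sequence stays unread in 'in'.
        decoder.decode(in, out, false);

        // Save the unread bytes for the next call.
        pending = ByteBuffer.allocate(in.remaining());
        pending.put(in);
        pending.flip();

        out.flip();
        return out.toString();
    }
}
```

For example, the UTF-8 bytes of "a€b" (61 E2 82 AC 62) can be passed as the chunks [61 E2 82] and [AC 62]: the first call yields only "a", and the second call completes the euro sign. Calling new String(...) on each chunk separately cannot reconstruct it.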

Furthermore, it would be nice if an encoding could be specified, both for write(byte[], int, int) and for outputLine(String, ProblemMarkerInfo), where l.getBytes() is called.

An alternative would be to accept character streams in addition to byte streams.
That would mean adding a method write(char[], int, int) and, in addition to the outputStream that is written to in outputLine(), a Writer that receives the line as char[] instead of byte[], which avoids the re-encoding.
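
A rough sketch of what the character-based alternative could look like follows. CharLineSink and its members are hypothetical names chosen for illustration; this does not reflect how ErrorParserManager is actually structured. Characters are collected into lines and each line is forwarded to a Writer, so neither a decoder nor getBytes() is involved.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical sketch (not CDT code): a line-oriented sink that accepts characters directly. */
final class CharLineSink {
    private final StringBuilder currentLine = new StringBuilder();
    // Stands in for the proposed Writer that would sit next to the existing output stream.
    private final Writer lineWriter;

    CharLineSink(Writer lineWriter) {
        this.lineWriter = lineWriter;
    }

    /** Character counterpart of write(byte[], int, int): no decoding step is needed. */
    void write(char[] c, int off, int len) {
        for (int i = off; i < off + len; i++) {
            currentLine.append(c[i]);
            if (c[i] == '\n') {
                outputLine(currentLine.toString());
                currentLine.setLength(0);
            }
        }
    }

    /** Forwards the line as characters, so no re-encoding to bytes takes place. */
    private void outputLine(String line) {
        try {
            lineWriter.write(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the input is already a char sequence, a multibyte character cannot be split at a chunk boundary here. The one remaining edge case is a surrogate pair split across two calls, which is harmless in this sketch since the characters are only appended to the line buffer.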