490824 – PDF renderer does not include 4-byte UTF-8 characters.

Status:	NEW

Alias:	None

Product:	z_Archived
Classification:	Eclipse Foundation
Component:	BIRT
Version:	4.5.1
Hardware:	PC Windows 7

Importance:	P3 major
Target Milestone:	---
Assignee:	Birt-ReportEngine-inbox@eclipse.org
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2016-03-31 15:21 EDT by Brent Kilgore
Modified:	2016-03-31 15:21 EDT
CC List:	0 users

See Also:

Description Brent Kilgore

2016-03-31 15:21:30 EDT

We are seeing an issue where 4-byte UTF-8 characters do not display in PDF output.  They show up fine in the web viewer and in HTML output.  One- to three-byte sequences work correctly.

The field in question is supplied by a Java formatter class.  Debugging this class shows that the value leaves the formatter as a proper UTF-16 sequence.  The same omission also happens when the character is added directly in the report generator.
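For anyone trying to reproduce the check on the formatter output, here is a minimal sketch of what "a proper UTF-16 sequence" can be taken to mean: every surrogate is part of a well-formed high/low pair, and each such pair encodes to four bytes in UTF-8. The class and method names (`SupplementaryCheck`, `utf8Length`, `isWellFormedUtf16`, `countSupplementary`) and the sample string are hypothetical and are not part of BIRT or of the formatter class mentioned above.

```java
import java.nio.charset.StandardCharsets;

public class SupplementaryCheck {

    // Number of bytes the UTF-8 encoding of a single code point occupies.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if every surrogate in s belongs to a well-formed high/low pair.
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate without a preceding high surrogate is malformed.
                return false;
            }
        }
        return true;
    }

    // Number of code points above U+FFFF, i.e. those that need 4 bytes in UTF-8.
    static long countSupplementary(String s) {
        return s.codePoints().filter(Character::isSupplementaryCodePoint).count();
    }

    public static void main(String[] args) {
        // Hypothetical sample: one code point each of 1, 2, 3 and 4 UTF-8 bytes.
        String sample = "A\u00E9\u20AC\uD83D\uDE00";
        System.out.println("well-formed UTF-16: " + isWellFormedUtf16(sample));
        System.out.println("supplementary code points: " + countSupplementary(sample));
        sample.codePoints().forEach(cp ->
            System.out.println(String.format("U+%04X", cp) + " -> " + utf8Length(cp) + " UTF-8 byte(s)"));
    }
}
```

If a string passes `isWellFormedUtf16` and still loses its supplementary characters in the PDF, the data handed to the engine is likely not the cause.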

The symptom is odd, though, compared to other mangling I've dealt with.  When a 4-byte sequence is encountered, it is omitted from the report entirely: there are no garbled characters, no "?"s, and no whitespace where the symbol should occur.
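To confirm that the characters are dropped rather than substituted, one option is to compare the source string against text extracted from the PDF. The sketch below assumes the rendered text has already been obtained as a Java `String` by some external means (the extraction step is not shown). The names `MissingCodePoints` and `missingSupplementary` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class MissingCodePoints {

    // Lists each distinct supplementary code point (above U+FFFF) that appears
    // in source but nowhere in rendered, formatted as "U+XXXX".
    static List<String> missingSupplementary(String source, String rendered) {
        List<String> missing = new ArrayList<>();
        source.codePoints()
              .filter(Character::isSupplementaryCodePoint)
              .distinct()
              .forEach(cp -> {
                  // String.indexOf(int) matches whole code points, including
                  // those stored as surrogate pairs.
                  if (rendered.indexOf(cp) < 0) {
                      missing.add(String.format("U+%04X", cp));
                  }
              });
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical input and extracted output showing the symptom described above.
        String source = "Hi \uD83D\uDE00!";
        String rendered = "Hi !";
        System.out.println("missing: " + missingSupplementary(source, rendered));
    }
}
```

A non-empty result, together with the absence of "?" or replacement characters in the extracted text, would match the behavior reported here.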

Similar issues have occurred on AIX, but that was because the shell's locale was not set to UTF-8.  Unfortunately, this is Windows, so there is no UTF-8 locale.