Bug 239838 - TSV file parsing is not intuitive
Status: NEW
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: BIRT
Version: 2.3.0
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Birt-Converter-inbox
QA Contact: Maggie Shen
Reported: 2008-07-07 13:57 EDT by David Alves
Modified: 2010-11-09 04:12 EST

Attachments
A TSV that BIRT can't import (264 bytes, text/plain)
2008-07-07 13:57 EDT, David Alves

Description David Alves 2008-07-07 13:57:23 EDT
Created attachment 106738
A TSV that BIRT can't import

Build ID: I20080617-2000

Steps To Reproduce:
1. Create a report that uses the attached file as a flat file .tsv data source
2. BIRT will complain about invalid flat file format and will not import data.

More information:
BIRT doesn't like double quotes in the middle of a text field in a TSV file.
I tried escaping the double quotes many different ways before I finally figured out that a field which contains quotes needs to be escaped like this:
String escapedField = "\"" + rawField.replace("\"","\"") + "\"";

I think the parsing behavior should be documented in the BIRT documentation, since it is different from other definitions of the TSV format. For example, the MIME type text/tab-separated-values has no special characters other than tab. (http://www.iana.org/assignments/media-types/text/tab-separated-values) Excel 2003 also treats the quotes as expected.
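For comparison, here is a minimal sketch of the parsing that the text/tab-separated-values definition implies: a record is split on tab characters only, and double quotes are kept as ordinary data. The class name PlainTsv and the method splitFields are hypothetical names used for illustration; this is not BIRT's parser.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of BIRT: treats tab as the only special character.
final class PlainTsv {
    private PlainTsv() {}

    /** Splits one record on tabs; quotes are left untouched and empty trailing fields are kept. */
    static List<String> splitFields(String line) {
        // A limit of -1 keeps trailing empty strings, so "a\t" yields two fields.
        return Arrays.asList(line.split("\t", -1));
    }

    public static void main(String[] args) {
        // The middle field contains literal double quotes and is returned unchanged.
        for (String field : splitFields("1\tsay \"hi\"\tend")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

With this reading, a field such as say "hi" needs no escaping at all, which is the behavior the report expected.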

I think either BIRT should ignore all non-tab characters in fields, or the rules it uses to parse .tsv files should be in the documentation somewhere.
Comment 1 David Alves 2008-07-07 13:59:03 EDT
Sorry, that should read:

String escapedField = "\"" + rawField.replace("\"","\"\"") + "\"";
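
The corrected one-liner can be wrapped in a small helper when writing a whole file. The sketch below is only an illustration of the workaround described in this report: the names TsvQuoting, escapeField and toTsvLine are hypothetical, and quoting only the fields that contain a double quote is an assumption based on the description above, not documented BIRT behavior.

```java
// Hypothetical helper illustrating the workaround from this report; not a BIRT API.
final class TsvQuoting {
    private TsvQuoting() {}

    /** Wraps the field in double quotes and doubles every embedded double quote. */
    static String escapeField(String rawField) {
        return "\"" + rawField.replace("\"", "\"\"") + "\"";
    }

    /** Joins fields with tabs, escaping only those that contain a double quote (assumption). */
    static String toTsvLine(String... rawFields) {
        StringBuilder line = new StringBuilder();
        for (int i = 0; i < rawFields.length; i++) {
            if (i > 0) {
                line.append('\t');
            }
            String field = rawFields[i];
            line.append(field.indexOf('"') >= 0 ? escapeField(field) : field);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // The second field contains a double quote, so it is wrapped and the quote is doubled.
        System.out.println(toTsvLine("id", "6\" pipe", "plain"));
    }
}
```

Fields that contain a tab or a line break are not handled here; the report does not say how BIRT treats them.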