117035 – Bugzilla plugin ignores encoding

Status:	RESOLVED FIXED

Alias:	None

Product:	z_Archived
Classification:	Eclipse Foundation
Component:	Mylyn
Version:	0.4
Hardware:	PC Windows XP

Importance:	P2 major with 1 vote
Target Milestone:	0.6
Assignee:	Robert Elves
QA Contact:

URL:
Whiteboard:
Keywords:	greatbug

Duplicates (1):	121725
Depends on:
Blocks:

Reported:	2005-11-18 08:40 EST by TomaszŚmietanka
Modified:	2006-05-19 12:57 EDT
CC List:	5 users

See Also:

Attachments
Sample screenshot (excerpt) of unicode in use (18.19 KB, image/jpeg) 2006-01-05 23:17 EST, David Barri
ISO-8859-2 chars in bug added via web browser (58.85 KB, image/jpeg) 2006-01-09 09:03 EST, TomaszŚmietanka
ISO-8859-2 chars in bug after posting comment from mylar (79.80 KB, image/jpeg) 2006-01-09 09:03 EST, TomaszŚmietanka
Pic of some unicode chars working and others not (10.81 KB, image/gif) 2006-03-14 01:59 EST, David Barri
mylar/context/zip (23.19 KB, application/octet-stream) 2006-05-16 12:16 EDT, Mik Kersten
UTF-8 encoding support screenshot (29.53 KB, image/gif) 2006-05-17 20:06 EDT, Robert Elves
Task view with 0.5.1.v20060517-1500 (23.43 KB, image/gif) 2006-05-17 20:40 EDT, David Barri
Bug view with 0.5.1.v20060517-1500 (52.67 KB, image/gif) 2006-05-17 20:40 EDT, David Barri
mylar/context/zip (23.91 KB, application/octet-stream) 2006-05-17 21:57 EDT, Mik Kersten
Patch to fix encoding problem (4.32 KB, patch) 2006-05-18 00:37 EDT, David Barri
mylar/context/zip (54.39 KB, application/octet-stream) 2006-05-18 19:53 EDT, Robert Elves

Description TomaszŚmietanka

2005-11-18 08:40:04 EST

The plugin ignores the encoding of Bugzilla pages in the outline, form, task list, etc. It also
does not use the proper encoding when sending bugs to the Bugzilla server, which makes a
mess when I, for example, try to change the resolution of a bug.

Comment 1 Mik Kersten

2005-11-23 12:13:08 EST

Could you point me at a public server that uses a different encoding?

Comment 2 TomaszŚmietanka

2006-01-05 12:00:45 EST

I've prepared a test installation of Bugzilla at http://bugzilla.strodd.net
It uses ISO-8859-2 (Central European) encoding.

Comment 3 Mik Kersten

2006-01-05 12:13:28 EST

Thanks, Tomasz, that's very helpful.

Comment 4 Mik Kersten

2006-01-05 22:16:24 EST

*** Bug 121725 has been marked as a duplicate of this bug. ***

Comment 5 Mik Kersten

2006-01-05 22:32:46 EST

Do you have any examples of SWT widgets using a character encoding (e.g. showing Polish accents in a list view)?  I haven't seen this before and had a bit of trouble finding docs on how to do it.

The work-around for this is to use the Internal Browser to edit bugs (see Mylar -> Task List preference page), so I'm marking down the severity.

Comment 6 David Barri

2006-01-05 23:17:33 EST

Created attachment 32573
Sample screenshot (excerpt) of unicode in use

I've attached a sample screenshot of unicode in use in a list.

You don't need to do anything special to use unicode. Java and SWT both use unicode natively. You just need to make sure that the string is properly decoded when it is received from Bugzilla.
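
As a rough illustration of that point, the sketch below decodes the same bytes with the right and the wrong charset. DecodeDemo and decode are hypothetical names chosen for this example and are not part of Mylar; the byte values stand in for what an ISO-8859-2 server might send.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes raw bytes received from a server using an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        Charset latin2 = Charset.forName("ISO-8859-2");

        // A surname starting with U+015A (S with acute), as an ISO-8859-2 server would send it.
        byte[] wire = "\u015Amietanka".getBytes(latin2);

        String right = decode(wire, latin2);                      // charset matches the server
        String wrong = decode(wire, StandardCharsets.ISO_8859_1); // charset guessed incorrectly

        System.out.println(right.equals("\u015Amietanka")); // true
        System.out.println(wrong.equals("\u015Amietanka")); // false: first character is garbled
    }
}
```

Once the bytes have been turned into a correct Java String, no further conversion should be needed before handing the text to a widget.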

Comment 7 Mik Kersten

2006-01-06 17:01:09 EST

Tomasz, David, I've made some progress on this using your two servers, but with varying results:
- Bug reports should show up with the proper encoding on both servers.  I couldn't determine this completely with David's server because I seem to be missing some Japanese fonts.  Tomasz's server seems to not always send the content type charset correctly, so if that's not present the Bugzilla Client checks the HTML for this.  
- Bug posting: seems to work fine using Japanese fonts on David's 2.20 server.  Tomasz's 2.18 server appears to be ignoring the content type of the post.  I've tried to work around that but have not been successful yet.
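
The fallback described in the first point (use the charset from the HTTP Content-Type header, otherwise look for one in the HTML) could look roughly like the sketch below. CharsetSniffer, its regular expression, and the fallback parameter are assumptions made for illustration; this is not the actual Bugzilla Client code.

```java
import java.nio.charset.Charset;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetSniffer {
    // Matches "charset=..." in either a Content-Type header or an HTML <meta> tag.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    // Prefers the HTTP header, then the HTML head, then the caller-supplied fallback.
    static Charset detect(String contentTypeHeader, String htmlHead, Charset fallback) {
        Charset fromHeader = find(contentTypeHeader);
        if (fromHeader != null) {
            return fromHeader;
        }
        Charset fromHtml = find(htmlHead);
        return fromHtml != null ? fromHtml : fallback;
    }

    // Returns the first charset named in the text that this JVM supports, or null.
    private static Charset find(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        while (m.find()) {
            try {
                if (Charset.isSupported(m.group(1))) {
                    return Charset.forName(m.group(1));
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name: ignore it and keep looking.
            }
        }
        return null;
    }
}
```

A caller would pass the header value from the HTTP response and the first part of the page body, for example `CharsetSniffer.detect(header, head, StandardCharsets.ISO_8859_1)`.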

Could I ask you to try this support, and report on what works for you and what doesn't?  I just made a development build.  Note that it's Eclipse 3.2 only.  You can get it from:

download.eclipse.org/technology/mylar/update-site/dev

Comment 8 David Barri

2006-01-08 01:32:54 EST

Good news. The only thing that seems not to work is the internal browser. Well done :)
Below is my little report.

* POSTING A NEW BUG
	Unicode summary - pass
	Unicode description - pass
	Displays correctly in task list - pass
	Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)

* BUG FROM BUGZILLA SERVER
	Displays correctly in task list - pass
	Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)
	Viewing in external browser - pass

Comment 9 TomaszŚmietanka

2006-01-09 09:02:28 EST

Ok. I've checked it.

Display of a bug entered via the browser:
1. Display ISO-8859-2 in task list - passed
2. Display ISO-8859-2 in outline - failed (look at the first character of my surname)
3. Display comments - partially passed (the same problem that you can see in the outline)

After posting a new comment:
1. Display ISO-8859-2 in task list - failed.
2. Display ISO-8859-2 in outline - failed.
3. Display comments - failed.

I've attached two screenshots, one taken before adding a new comment from Mylar and one after.

I also have an idea. What do you think about adding a parameter to the plugin configuration, something like a select widget with encodings:
@ <automatic>
  <UTF-8>
  <ISO-8859-2>
It would allow forcing the encoding.
That should resolve the problem with encoding detection in custom Bugzilla installations.
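
A minimal sketch of how such a setting might be resolved, assuming the preference is stored as a plain string and "automatic" means "use whatever was detected". RepositoryEncoding and effective are hypothetical names, not existing Mylar API.

```java
import java.nio.charset.Charset;

final class RepositoryEncoding {
    // Hypothetical preference value meaning "trust the detected charset".
    static final String AUTOMATIC = "automatic";

    // Returns the user's forced charset if one was chosen; otherwise the detected one.
    static Charset effective(String userSetting, Charset detected) {
        if (userSetting == null || AUTOMATIC.equalsIgnoreCase(userSetting.trim())) {
            return detected;
        }
        // Throws an unchecked exception if the name is illegal or unsupported.
        return Charset.forName(userSetting.trim());
    }
}
```

With this shape, `effective("ISO-8859-2", detected)` ignores detection entirely, which is the override behaviour proposed above.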

Comment 10 TomaszŚmietanka

2006-01-09 09:03:28 EST

Created attachment 32673
ISO-8859-2 chars in bug added via web browser

Comment 11 TomaszŚmietanka

2006-01-09 09:03:59 EST

Created attachment 32674
ISO-8859-2 chars in bug after posting comment from mylar

Comment 12 David Barri

2006-01-09 22:06:21 EST

I also agree that a property in the bugzilla server preferences that allows the user to override the detected encoding and set it manually would be extremely beneficial. :)

Comment 13 Mik Kersten

2006-01-13 19:41:53 EST

That sounds good to me.  I'll try to get to it next week and leave this report open for the time being.  It will probably be set per repository.

Tomasz, this might not help your issue with posting to Bugzilla 2.18 though, since it's getting sent the right encoding but not honoring it.  This might need to get documented as a 2.18 limitation since I spent considerable time debugging it and couldn't make the server accept the encoding via a form POST.
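
To illustrate why the charset of a form POST matters, here is a minimal sketch using the standard java.net.URLEncoder. FormEncodingDemo and field are hypothetical names; this is not the Mylar submission code. The same character produces different bytes on the wire depending on the charset, so a server that does not honor the declared charset will misread one of the two forms.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormEncodingDemo {
    // Builds one "name=value" pair of an application/x-www-form-urlencoded body.
    static String field(String name, String value, String charsetName)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(name, charsetName) + "=" + URLEncoder.encode(value, charsetName);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // U+015A (S with acute) is one byte in ISO-8859-2 and two bytes in UTF-8.
        System.out.println(field("comment", "\u015A", "ISO-8859-2")); // comment=%A6
        System.out.println(field("comment", "\u015A", "UTF-8"));      // comment=%C5%9A
    }
}
```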

Comment 14 David Barri

2006-02-27 03:24:04 EST

Although "bugzilla unicode support" is listed in the 0.4.9 whatsnew list, it's still not working properly. Japanese characters in the titles of bugs in bugzilla 2.20 (uncustomized) still do not load correctly. This was working in a dev build that Mik gave me a while ago. *cry*

Comment 15 Mik Kersten

2006-02-27 13:23:04 EST

David, Tomasz, could you please send me the login details of your repositories?  I lost them due to a recent laptop hard drive failure.

Comment 16 David Barri

2006-02-28 08:36:20 EST

Interesting discovery: if I add a query, my Japanese characters do not display in the task list. If I double-click the task, it reloads/syncs with the bugzilla server, opens a window displaying the bug info (with CORRECT Japanese chars) and updates the name of the bug in the mylar task list (again with the correct Japanese chars). Which is great! Mylar can correctly read unicode characters in the bug data but now it just needs to do it correctly when I click "synchronize with repository" on a query.

Comment 17 David Barri

2006-03-14 01:57:10 EST

Another interesting discovery. Correctly getting unicode titles from items only works half the time. It doesn't work for some characters (for example \u306e).
If I click on the "browser" tab at the bottom of the editor though it all displays correctly. Unfortunately though the mylar task list is full of unreadable bad characters :(

Comment 18 David Barri

2006-03-14 01:59:30 EST

Created attachment 36207
Pic of some unicode chars working and others not

The chars in the "bugzilla" tab are incorrectly displayed as diamonds and squares.

Comment 19 Mik Kersten

2006-03-14 10:45:15 EST

David, we hope to get back to this bug this week (possibly next).

Comment 20 Mik Kersten

2006-04-06 20:52:10 EDT

The remaining changes are too disruptive for 0.5.0, so postponing to 0.5.1.  That's unfortunate, since this is one of our longest-lived bug reports!

Comment 21 Robert Elves

2006-04-12 16:07:19 EDT

Will need to review this after bug# 136219 is resolved.

Comment 22 Mik Kersten

2006-04-12 16:10:43 EDT

The reason is that bug 136219 might resolve the remaining encoding issues for us, since it would prevent us from parsing HTML, and additionally may require reworking some of the current encoding support.

Rob: as part of investigating that bug, could you please use one of the repositories linked here to ensure that encoding is preserved in the XML format?

Comment 23 Robert Elves

2006-05-15 19:50:44 EDT

Hi David,

We've completed the xml/rdf conversion and would like to address the encoding issues. It appears as though your bugzilla server is no longer available. Would it be possible to email me (relves@cs.ubc.ca) the latest location and login details so I can run some tests? Many thanks.

Comment 24 Mik Kersten

2006-05-16 12:16:06 EDT

Created attachment 41604
mylar/context/zip

Comment 25 Robert Elves

2006-05-16 20:33:22 EDT

As suggested previously, a setting may be required to specify the encoding used. Since the encoding doesn't appear to be sent by Bugzilla in the xml, this looks like our only option. This setting will probably need to be per repository.

Comment 26 Mik Kersten

2006-05-16 22:19:49 EDT

Rob, the other thing you could consider is checking the root page of the server and seeing what encoding it returns (e.g. https://bugs.eclipse.org/bugs returns ISO-8859-1), and set the encoding automatically if that turns out to be consistent.

Comment 27 Robert Elves

2006-05-17 20:06:38 EDT

Created attachment 41828
UTF-8 encoding support screenshot

David, thanks for letting us test using your server. This screenshot shows the encoding support working. Mik has put a dev build up which includes this fix. Could you try out this dev build so we can be sure to have this encoding issue fixed for Friday's 5.2 release? Dev build update site: download.eclipse.org/technology/mylar/update-site/dev

Comment 28 David Barri

2006-05-17 20:38:48 EDT

Hi. No probs re: test server ;)
I just tried the new dev version (0.5.1.v20060517-1500) and unfortunately it doesn't work at all :~(
I tried with the test server that I shared with you (which is 2.20 out-of-the-box) and I also tried with a different server that I use for work (which is 2.22 out-of-the-box) and it didn't work with either. I also tried in a completely new workspace.

I'll attach screenshots.

Comment 29 David Barri

2006-05-17 20:40:10 EDT

Created attachment 41830
Task view with 0.5.1.v20060517-1500

Comment 30 David Barri

2006-05-17 20:40:38 EDT

Created attachment 41831
Bug view with 0.5.1.v20060517-1500

Comment 31 Mik Kersten

2006-05-17 21:23:53 EDT

David, 0.5.1.v20060517-1500 didn't have the fix, so it looks like you managed to download before the latest build propagated to the servers.  Could you please try updating again (build should be 0.5.1.v20060517-1700) and let us know how it goes?

Comment 32 David Barri

2006-05-17 21:45:36 EDT

OK, I'm using 0.5.1.v20060517-1700 now but it's barely any different than before. In the task view, I resynced and the encoding is still screwed up. I tried with a new query and with new bugs just in case the names were cached. However, if I double-click a bug and open the bug view, the encoding sort of works, though there are characters that incorrectly show up as blocks.

Using the test server I set up, look at bug #3. The title is '?? <blah "__" blah>'. The first two characters are \u78BA and \u8A8D. The first character displays correctly but I don't think the second char is being recognized as \u8A8D.
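
For anyone retracing this, a small sketch that prints the UTF-8 bytes of the characters named above can help when comparing against what actually arrives on the wire. Utf8Bytes and hex are made-up names for illustration and are not part of Mylar. As a side observation only, and not a diagnosis established in this report: the failing characters \u8A8D and \u306e each contain a UTF-8 byte (0x8D and 0x81 respectively) that is unassigned in windows-1252, while \u78BA does not.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Bytes {
    // Returns the UTF-8 encoding of a string as space-separated uppercase hex bytes.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex("\u78BA")); // E7 A2 BA
        System.out.println(hex("\u8A8D")); // E8 AA 8D
        System.out.println(hex("\u306E")); // E3 81 AE
    }
}
```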

Comment 33 David Barri

2006-05-17 21:52:28 EDT

Also, i've never checked out the mylar source so I don't know the API at all. Could someone plz tell me which function is parsing/processing the raw bytes of the bugzilla feed? I just wanna have a quick look if nobody minds :)

Comment 34 Mik Kersten

2006-05-17 21:57:18 EDT

Created attachment 41838
mylar/context/zip

David, this context has the processing you're interested in, and if you're using Mylar you can retrieve it using the task list popup menu and see the classes of interest.  If not, take a look at RepositoryReportFactory.populateReport and the charset stuff in BugzillaReportSubmitForm.

Comment 35 Mik Kersten

2006-05-17 22:01:09 EDT

Regarding comment#32, for bug#1 is what you see different than the screenshot Rob attached?  I know that we were having trouble with the characters displaying in the task list and got squares instead, but were thinking that was due to a lack of space since they showed up properly in the editor.  In any case, any hints appreciated (including whether there is any special configuration that needs to be made on the server).  Rob should be able to pick this up again tomorrow.

Comment 36 David Barri

2006-05-17 22:15:54 EDT

Thanks for that info Mik, i'll have a quick look sometime today or later tonight.

Re comment#32 and Rob's screenshot, if I open the same bug it looks the same and works correctly. The thing is, about 80% of unicode characters are being converted successfully but there are many others that are not. If you have a look at the description of that bug (bug #1), you can see that the first and 4th-last characters on the 2nd-last line are incorrect.

No hints about server config however, sorry :) But the actual hex codes of the characters that I put in the description of bug #1 should be helpful.

Comment 37 Mik Kersten

2006-05-17 22:20:37 EDT

Thanks David, that would be great.  Since some but not all characters are rendering I'm concerned that this could be due to the UTF-8 bugs in Bugzilla 2.20, which they have fixed for 2.22: http://www.bugzilla.org/releases/2.22/new-features.html#utf8

If it's not inconvenient it would be great if you could upgrade that server to 2.22 in case we're debugging around Bugzilla's bugs!

Comment 38 David Barri

2006-05-17 22:29:30 EDT

Oh, my work Bugzilla server is 2.22 and I've been testing with that too. Even with 2.22 it's always the same characters that are consistently garbled.

Comment 39 Mik Kersten

2006-05-17 22:33:19 EDT

Oh, ok.  If it's easy for you to update that test server to 2.22 that would be great.  We should create our own alternate encoding test server too, but not clear if we'll have time to do that this week.

Comment 40 David Barri

2006-05-17 22:34:59 EDT

alrighty. I'll get back to u on that one later tonight or tomorrow morning ;)

Comment 41 Mik Kersten

2006-05-17 22:39:10 EDT

Thanks 1M David!  (for that and for keeping up with what's probably our longest running bug report ;)

Comment 42 David Barri

2006-05-17 22:40:22 EDT

lol, np ;)

Comment 43 David Barri

2006-05-17 23:17:40 EDT

lol, i've fixed it (at least as far as I can see).
I will test a bit more, clean things up and submit a patch sometime today.

Comment 44 Mik Kersten

2006-05-17 23:20:00 EDT

Excellent!!  Consider yourself credited in the 0.5.2 New & Noteworthy :)

Comment 45 David Barri

2006-05-18 00:37:33 EDT

Created attachment 41844
Patch to fix encoding problem

Here you go. This fixes the encoding problem so that unicode text is correct and no longer garbled.

However, there is still a problem. I was looking through the code and there is no charset associated with the repository. The getBug() function (or was it getReport()?) attempts to auto-detect the encoding but as far as I can tell this doesn't happen during any other operations.

For example, when I add a new query and synchronize it, the encoding is still messed up. You need to double-click each bug so that RepositoryReportFactory.getBug() or whatever updates the attributes with the correct encoding.

I (humbly) suggest you add an encoding attribute to RepositoryConfiguration and then create something along these lines:

  public BufferedReader getReaderInLocalCharset(InputStream in) {
    if (getCharset() != null)
      return new BufferedReader(new InputStreamReader(in, getCharset()));
    else
      return new BufferedReader(new InputStreamReader(in));
  }

and then replace all relevant instances of "new BufferedReader(new InputStreamReader(xxxx))" with "RepositoryConfiguration.getReaderInLocalCharset(xxx)"

That would COMPLETELY resolve the issue. :)
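
A self-contained sketch of the suggestion above, assuming the repository's charset is held as a java.nio.charset.Charset (null when unknown). RepositoryReaders is a hypothetical stand-in; the real RepositoryConfiguration class may be shaped differently.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RepositoryReaders {
    // Charset associated with the repository; null means "not known".
    private final Charset charset;

    RepositoryReaders(Charset charset) {
        this.charset = charset;
    }

    // Wraps the stream in a reader that decodes with the repository's charset,
    // falling back to the platform default only when no charset is known.
    BufferedReader getReaderInLocalCharset(InputStream in) {
        return charset != null
            ? new BufferedReader(new InputStreamReader(in, charset))
            : new BufferedReader(new InputStreamReader(in));
    }

    public static void main(String[] args) throws IOException {
        // Simulated server response: the two characters from bug #3, encoded as UTF-8.
        byte[] wire = "\u78BA\u8A8D".getBytes(StandardCharsets.UTF_8);
        RepositoryReaders readers = new RepositoryReaders(StandardCharsets.UTF_8);
        try (BufferedReader r = readers.getReaderInLocalCharset(new ByteArrayInputStream(wire))) {
            System.out.println("\u78BA\u8A8D".equals(r.readLine())); // true
        }
    }
}
```

Routing every read through one method like this means queries, synchronization, and single-bug loads would all decode the same way, which is the gap described above.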

Comment 46 Robert Elves

2006-05-18 13:20:09 EDT

Great! Thanks for the patch David and your suggestions. I'll be adding repository encoding association today.

Comment 47 Robert Elves

2006-05-18 19:51:00 EDT

Okay, the encoding support is in HEAD. David, if you get a moment it would be great if you could try it out again. One question remains concerning submission of reports. Currently they are being sent with the content type selected in the Task Repository config. I'm not 100% sure if this is correct or if it should always be sent in UTF-8. I guess it depends on what happens on Bugzilla's side?? If you have any insight on this I'm all ears. Thanks.

Comment 48 Robert Elves

2006-05-18 19:53:17 EDT

Created attachment 41970
mylar/context/zip

encoding context

Comment 49 David Barri

2006-05-18 21:54:56 EDT

I just checked out HEAD and spent some time trying it out and as far as I can tell it's working 100% PERFECTLY!!! No encoding problems whatsoever (at least with my UTF-8 repositories). Well done dude!!!! I don't have time today but if I get time over the weekend I will try to setup a bugzilla repos in some other encoding and test it out.

Comment 50 Mik Kersten

2006-05-18 22:21:16 EDT

That is GREAT to hear.  Nice work guys, what a good example of an open source collaborative effort!  It's always best when users step in to help themselves and push us in the process :)

So now it is my great pleasure to finally close this bug report.  David, if you do find any problems with another Bugzilla that you set up please open up a new bug or reopen this one.  I'm marking it as "greatbug" since that keyword applies here (even though we're not part of the Callisto product and as such not participating in the greatbug contest).

Comment 51 David Barri

2006-05-18 22:28:46 EDT

Too right Mik!
Thanks a lot for everyone's hard work! It has certainly paid off. :)

Comment 52 Robert Elves

2006-05-19 12:57:38 EDT

Yes, thanks again David!