Bug 117035 - Bugzilla plugin ignores encoding
Summary: Bugzilla plugin ignores encoding
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn
Version: 0.4
Hardware: PC Windows XP
Importance: P2 major with 1 vote
Target Milestone: 0.6
Assignee: Robert Elves CLA
QA Contact:
URL:
Whiteboard:
Keywords: greatbug
Duplicates: 121725
Depends on:
Blocks:
 
Reported: 2005-11-18 08:40 EST by TomaszŚmietanka CLA
Modified: 2006-05-19 12:57 EDT
CC List: 5 users

See Also:


Attachments
Sample screenshot (excerpt) of unicode in use (18.19 KB, image/jpeg)
2006-01-05 23:17 EST, David Barri CLA
no flags Details
ISO-8859-2 chars in bug added via web browser (58.85 KB, image/jpeg)
2006-01-09 09:03 EST, TomaszŚmietanka CLA
no flags Details
ISO-8859-2 chars in bug after posting comment from mylar (79.80 KB, image/jpeg)
2006-01-09 09:03 EST, TomaszŚmietanka CLA
no flags Details
Pic of some unicode chars working and others not (10.81 KB, image/gif)
2006-03-14 01:59 EST, David Barri CLA
no flags Details
mylar/context/zip (23.19 KB, application/octet-stream)
2006-05-16 12:16 EDT, Mik Kersten CLA
no flags Details
UTF-8 encoding support screenshot (29.53 KB, image/gif)
2006-05-17 20:06 EDT, Robert Elves CLA
no flags Details
Task view with 0.5.1.v20060517-1500 (23.43 KB, image/gif)
2006-05-17 20:40 EDT, David Barri CLA
no flags Details
Bug view with 0.5.1.v20060517-1500 (52.67 KB, image/gif)
2006-05-17 20:40 EDT, David Barri CLA
no flags Details
mylar/context/zip (23.91 KB, application/octet-stream)
2006-05-17 21:57 EDT, Mik Kersten CLA
no flags Details
Patch to fix encoding problem (4.32 KB, patch)
2006-05-18 00:37 EDT, David Barri CLA
no flags Details | Diff
mylar/context/zip (54.39 KB, application/octet-stream)
2006-05-18 19:53 EDT, Robert Elves CLA
no flags Details

Description TomaszŚmietanka CLA 2005-11-18 08:40:04 EST
The plugin ignores the encoding of Bugzilla pages in the outline, form, task list, etc.
It also does not use the proper encoding when sending bugs to the Bugzilla server, which
makes a mess when, for example, I try to change the resolution of a bug.
Comment 1 Mik Kersten CLA 2005-11-23 12:13:08 EST
Could you point me at a public server that uses a different encoding?
Comment 2 TomaszŚmietanka CLA 2006-01-05 12:00:45 EST
I've prepared a test installation of Bugzilla at http://bugzilla.strodd.net
It uses the ISO-8859-2 (Central European) encoding.
Comment 3 Mik Kersten CLA 2006-01-05 12:13:28 EST
Dzieki Tomasz, that's very helpful.
Comment 4 Mik Kersten CLA 2006-01-05 22:16:24 EST
*** Bug 121725 has been marked as a duplicate of this bug. ***
Comment 5 Mik Kersten CLA 2006-01-05 22:32:46 EST
Do you have any examples of SWT widgets using a character encoding (e.g. showing Polish accents in a list view)?  I haven't seen this before and had a bit of trouble finding docs on how to do it.

The work-around for this is to use the Internal Browser to edit bugs (see Mylar -> Task List preference page), so I'm marking down the severity.  
Comment 6 David Barri CLA 2006-01-05 23:17:33 EST
Created attachment 32573 [details]
Sample screenshot (excerpt) of unicode in use

I've attached a sample screenshot of unicode in use in a list.

You don't need to do anything special to use Unicode. Java and SWT both use Unicode natively. You just need to make sure that the string is properly decoded when it is read from Bugzilla.
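
To illustrate the point about decoding, here is a minimal sketch (not Mylar code; the class and method names are made up for this example). The same byte becomes a different character depending on the charset used to decode it, which is why the charset has to match what the server actually sent.

```java
import java.nio.charset.Charset;

class DecodeDemo {
    /** Decodes raw bytes received from a server using the named charset. */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // The single byte 0xA6 is "S with acute" (U+015A) in ISO-8859-2,
        // but "broken bar" (U+00A6) in ISO-8859-1.
        byte[] raw = { (byte) 0xA6 };
        for (String name : new String[] { "ISO-8859-2", "ISO-8859-1" }) {
            String text = decode(raw, name);
            System.out.println(name + " -> " + String.format("U+%04X", (int) text.charAt(0)));
        }
    }
}
```

Once the bytes are decoded into a Java String with the right charset, no further conversion is needed before handing the text to a widget.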
Comment 7 Mik Kersten CLA 2006-01-06 17:01:09 EST
Tomasz, David, I've made some progress on this using your two servers, but with varying results:
- Bug reports should show up with the proper encoding on both servers.  I couldn't determine this completely with David's server because I seem to be missing some Japanese fonts.  Tomasz's server seems to not always send the content type charset correctly, so if that's not present the Bugzilla Client checks the HTML for this.  
- Bug posting: seems to work fine using Japanese fonts on David's 2.20 server.  Tomasz's 2.18 server appears to be ignoring the content type of the post.  I've tried to work around that but have not been successful yet.

Could I ask you to try this support, and report on what works for you and what doesn't?  I just made a development build.  Note that it's Eclipse 3.2 only.  You can get it from:

download.eclipse.org/technology/mylar/update-site/dev
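
The fallback described in this comment (use the charset from the Content-Type header, otherwise look in the HTML) could be sketched roughly as follows. This is an illustration only, not the Bugzilla Client's actual code: the class name, the regular expression, and the fallback parameter are all assumptions.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharsetSniffer {
    // Matches "charset=name" in a Content-Type header or an HTML meta tag.
    private static final Pattern CHARSET = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9][A-Za-z0-9._:+-]*)",
            Pattern.CASE_INSENSITIVE);

    /** Returns the first charset name found in the text, or null if there is none. */
    static String find(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Tries the HTTP header first, then the HTML head, then the given fallback. */
    static Charset detect(String contentTypeHeader, String htmlHead, Charset fallback) {
        String name = find(contentTypeHeader);
        if (name == null) {
            name = find(htmlHead);
        }
        if (name != null && Charset.isSupported(name)) {
            return Charset.forName(name);
        }
        return fallback;
    }

    public static void main(String[] args) {
        String meta = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-2\">";
        // The header carries no charset here, so the meta tag decides.
        System.out.println(detect("text/html", meta, StandardCharsets.UTF_8));
    }
}
```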
Comment 8 David Barri CLA 2006-01-08 01:32:54 EST
Good news. The only thing that seems not to work is the internal browser. Well done :)
Below is my little report.

* POSTING A NEW BUG
	Unicode summary - pass
	Unicode description - pass
	Displays correctly in task list - pass
	Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)

* BUG FROM BUGZILLA SERVER
	Displays correctly in task list - pass
	Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)
	Viewing in external browser - pass
Comment 9 TomaszŚmietanka CLA 2006-01-09 09:02:28 EST
Ok. I've checked it.

Bug Display entered via browser.
1. Display ISO-8859-2 in task list - passed
2. Display ISO-8859-2 in outline - failed (look at the first character of my surname)
3. Display comments - partially passed (the same problem that you can see in the outline)


After Posting new comment:
1. Display ISO-8859-2 in task list - failed.
2. Display ISO-8859-2 in outline - failed.
3. Display comments - failed.

I've attached two screenshots: one taken before adding a new comment from Mylar and one taken after.

I've also got an idea. What do you think about adding a parameter to the plugin configuration, something like a select widget with encodings:
@ <automatic>
  <UTF-8>
  <ISO-8859-2>
It would allow forcing the encoding.
That should resolve the problem with encoding detection in custom Bugzilla installations.
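
A minimal sketch of how such a setting might be resolved, assuming a hypothetical "automatic" sentinel value and a charset already detected elsewhere. None of these names come from Mylar.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingSetting {
    /** Hypothetical sentinel meaning "use whatever was detected". */
    static final String AUTOMATIC = "automatic";

    /**
     * Returns the charset to use: the detected one when the user left the
     * setting on automatic (or unset), otherwise the charset the user forced.
     */
    static Charset resolve(String userChoice, Charset detected) {
        if (userChoice == null || AUTOMATIC.equals(userChoice)) {
            return detected;
        }
        return Charset.forName(userChoice);
    }

    public static void main(String[] args) {
        System.out.println(resolve(AUTOMATIC, StandardCharsets.UTF_8));
        System.out.println(resolve("ISO-8859-2", StandardCharsets.UTF_8));
    }
}
```

Keeping the override separate from detection means a misconfigured server can be handled without touching the detection logic.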

Comment 10 TomaszŚmietanka CLA 2006-01-09 09:03:28 EST
Created attachment 32673 [details]
ISO-8859-2 chars in bug added via web browser
Comment 11 TomaszŚmietanka CLA 2006-01-09 09:03:59 EST
Created attachment 32674 [details]
ISO-8859-2 chars in bug after posting comment from mylar
Comment 12 David Barri CLA 2006-01-09 22:06:21 EST
I also agree that a property in the bugzilla server preferences that allows the user to override the detected encoding and set it manually would be extremely beneficial. :)
Comment 13 Mik Kersten CLA 2006-01-13 19:41:53 EST
That sounds good to me.  I'll try to get to it next week and leave this report open for the time being.  It will probably be set per repository.

Tomasz, this might not help your issue with posting to Bugzilla 2.18 though, since it's getting sent the right encoding but not honoring it.  This might need to get documented as a 2.18 limitation since I spent considerable time debugging it and couldn't make the server accept the encoding via a form POST.
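
For context, here is a rough sketch of what sending the right encoding in a form POST can look like on the client side. It is an illustration under assumptions (the class, method names, and the "comment" field are invented here, and `URLEncoder.encode(String, Charset)` needs Java 10 or later). Whether the server honors the declared charset is a separate matter, as the comment above notes.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class FormBody {
    /** Content-Type value that declares the charset used for the form body. */
    static String contentType(Charset charset) {
        return "application/x-www-form-urlencoded; charset=" + charset.name();
    }

    /** Percent-encodes one form field using the given charset. */
    static String field(String name, String value, Charset charset) {
        return URLEncoder.encode(name, charset) + "=" + URLEncoder.encode(value, charset);
    }

    public static void main(String[] args) {
        String text = "\u015A"; // Latin capital S with acute
        // The same character is percent-encoded differently per charset.
        System.out.println(field("comment", text, Charset.forName("ISO-8859-2")));
        System.out.println(field("comment", text, StandardCharsets.UTF_8));
    }
}
```

Because the two encodings produce different bytes for the same character, a server that decodes the body with a charset other than the one used here will show garbled text.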
Comment 14 David Barri CLA 2006-02-27 03:24:04 EST
Although "bugzilla unicode support" is listed in the 0.4.9 whatsnew list, it's still not working properly. Japanese characters in the titles of bugs in bugzilla 2.20 (uncustomized) still do not load correctly. This was working in a dev build that Mik gave me a while ago. *cry*
Comment 15 Mik Kersten CLA 2006-02-27 13:23:04 EST
David, Tomasz, could you please send me the login details of your repositories?  I lost them due to a recent laptop hard drive failure.
Comment 16 David Barri CLA 2006-02-28 08:36:20 EST
Interesting discovery: if I add a query, my Japanese characters do not display in the task list. If I double-click the task, it reloads/syncs with the bugzilla server, opens a window displaying the bug info (with CORRECT Japanese chars) and updates the name of the bug in the mylar task list (again with the correct Japanese chars). Which is great! Mylar can correctly read unicode characters in the bug data but now it just needs to do it correctly when I click "synchronize with repository" on a query.
Comment 17 David Barri CLA 2006-03-14 01:57:10 EST
Another interesting discovery. Correctly getting unicode titles from items only works half the time. It doesn't work for some characters (for example \u306e).
If I click on the "browser" tab at the bottom of the editor though it all displays correctly. Unfortunately though the mylar task list is full of unreadable bad characters :(
Comment 18 David Barri CLA 2006-03-14 01:59:30 EST
Created attachment 36207 [details]
Pic of some unicode chars working and others not

The chars in the "bugzilla" tab are incorrectly displayed as diamonds and squares.
Comment 19 Mik Kersten CLA 2006-03-14 10:45:15 EST
David, we hope to get back to this bug this week (possibly next).
Comment 20 Mik Kersten CLA 2006-04-06 20:52:10 EDT
The remaining changes are too disruptive for 0.5.0, so postponing to 0.5.1.  That's unfortunate, since this is one of our longest-lived bug reports!
Comment 21 Robert Elves CLA 2006-04-12 16:07:19 EDT
Will need to review this after bug# 136219 is resolved.
Comment 22 Mik Kersten CLA 2006-04-12 16:10:43 EDT
The reason is that bug 136219 might resolve the remaining encoding issues for us, since it would prevent us from parsing HTML, and additionally may require reworking some of the current encoding support.

Rob: as part of investigating that bug, could you please use one of the repositories linked here to ensure that encoding is preserved in the XML format?
Comment 23 Robert Elves CLA 2006-05-15 19:50:44 EDT
Hi David,

We've completed the xml/rdf conversion and would like to address the encoding issues. It appears as though your bugzilla server is no longer available. Would it be possible to email me (relves@cs.ubc.ca) the latest location and login details so I can run some tests? Many thanks.
Comment 24 Mik Kersten CLA 2006-05-16 12:16:06 EDT
Created attachment 41604 [details]
mylar/context/zip
Comment 25 Robert Elves CLA 2006-05-16 20:33:22 EDT
As suggested previously, a setting may be required to specify the encoding used. Since the encoding doesn't appear to be sent by Bugzilla in the xml, this looks like our only option. This setting will probably need to be per repository.
Comment 26 Mik Kersten CLA 2006-05-16 22:19:49 EDT
Rob, the other thing you could consider is checking the root page of the server and seeing what encoding it returns (e.g. https://bugs.eclipse.org/bugs returns ISO-8859-1), and set the encoding automatically if that turns out to be consistent.
Comment 27 Robert Elves CLA 2006-05-17 20:06:38 EDT
Created attachment 41828 [details]
UTF-8 encoding support screenshot

David, thanks for letting us test using your server. This screenshot shows the encoding support working. Mik has put a dev build up which includes this fix. Could you try out this dev build so we can be sure to have this encoding issue fixed for Friday's 5.2 release? Dev build update site: download.eclipse.org/technology/mylar/update-site/dev
Comment 28 David Barri CLA 2006-05-17 20:38:48 EDT
Hi. No probs re: test server ;)
I just tried the new dev version (0.5.1.v20060517-1500) and unfortunately it doesn't work at all :~(
I tried with the test server that I shared with you (which is 2.20 out-of-the-box) and I also tried with a different server that I use for work (which is 2.22 out-of-the-box) and it didn't work with either. I also tried in a completely new workspace too.

I'll attach screenshots.
Comment 29 David Barri CLA 2006-05-17 20:40:10 EDT
Created attachment 41830 [details]
Task view with 0.5.1.v20060517-1500
Comment 30 David Barri CLA 2006-05-17 20:40:38 EDT
Created attachment 41831 [details]
Bug view with 0.5.1.v20060517-1500
Comment 31 Mik Kersten CLA 2006-05-17 21:23:53 EDT
David, 0.5.1.v20060517-1500 didn't have the fix, so it looks like you managed to download before the latest build propagated to the servers.  Could you please try updating again (build should be 0.5.1.v20060517-1700) and let us know how it goes?  
Comment 32 David Barri CLA 2006-05-17 21:45:36 EDT
OK, I'm using 0.5.1.v20060517-1700 now but it's barely any different than before. In the task view, I resynced and the encoding is still screwed up. I tried with a new query and with new bugs just in case the names were cached. However, if I double-click a bug and open the bug view, the encoding sort of works, although some characters incorrectly show up as blocks.

Using the test server I set up, look at bug #3. The title is '?? <blah "__" blah>'. The first two characters are \u78BA and \u8A8D. The first character displays correctly but I don't think the second char is being recognized as \u8A8D.
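
A small diagnostic sketch for the two characters named above. The thread does not say what the actual cause was. The code only shows one plausible way that some characters could survive while others break: UTF-8 bytes being passed through a single-byte charset that has unassigned byte values (windows-1252 is used here purely as an assumed example). The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {
    /** Returns the UTF-8 bytes of the text as space-separated hex pairs. */
    static String utf8Hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /**
     * True if the UTF-8 bytes of the text are unchanged after being decoded
     * and re-encoded with some other (wrong) charset.
     */
    static boolean survives(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] back = new String(utf8, wrong).getBytes(wrong);
        return Arrays.equals(utf8, back);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        for (String ch : new String[] { "\u78BA", "\u8A8D" }) {
            System.out.println(utf8Hex(ch) + " survives windows-1252 round trip: "
                    + survives(ch, cp1252));
        }
    }
}
```

Dumping the bytes this way makes it easier to tell whether a garbled character was damaged on the wire or during decoding.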
Comment 33 David Barri CLA 2006-05-17 21:52:28 EDT
Also, i've never checked out the mylar source so I don't know the API at all. Could someone plz tell me which function is parsing/processing the raw bytes of the bugzilla feed? I just wanna have a quick look if nobody minds :)
Comment 34 Mik Kersten CLA 2006-05-17 21:57:18 EDT
Created attachment 41838 [details]
mylar/context/zip

David, this context has the processing you're interested in, and if you're using Mylar you can retrieve it using the task list popup menu and see the classes of interest.  If not, take a look at RepositoryReportFactory.populateReport and the charset stuff in BugzillaReportSubmitForm.
Comment 35 Mik Kersten CLA 2006-05-17 22:01:09 EDT
Regarding comment#32, for bug#1 is what you see different than the screenshot Rob attached?  I know that we were having trouble with the characters displaying in the task list and got squares instead, but were thinking that was due to a lack of space since they showed up properly in the editor.  In any case, any hints appreciated (including whether there is any special configuration that needs to be made on the server).  Rob should be able to pick this up again tomorrow.
Comment 36 David Barri CLA 2006-05-17 22:15:54 EDT
Thanks for that info Mik, i'll have a quick look sometime today or later tonight.

Re comment#32 and Rob's screenshot, if I open the same bug it looks the same and works correctly. The thing is, about 80% of Unicode characters are being converted successfully but there are many others that are not. If you have a look at the description of that bug (bug #1), you can see that the first and 4th-last characters on the 2nd-last line are incorrect.

No hints about server config however, sorry :) But the actual hex codes of the characters that I put in the description of bug #1 should be helpful.
Comment 37 Mik Kersten CLA 2006-05-17 22:20:37 EDT
Thanks David, that would be great.  Since some but not all characters are rendering I'm concerned that this could be due to the UTF-8 bugs in Bugzilla 2.20, which they have fixed for 2.22: http://www.bugzilla.org/releases/2.22/new-features.html#utf8

If it's not inconvenient it would be great if you could upgrade that server to 2.22 in case we're debugging around Bugzilla's bugs!
Comment 38 David Barri CLA 2006-05-17 22:29:30 EDT
Oh, my work Bugzilla server is 2.22 and I've been testing with that too. Even with 2.22 it's always the same characters that are consistently garbled.
Comment 39 Mik Kersten CLA 2006-05-17 22:33:19 EDT
Oh, ok.  If it's easy for you to update that test server to 2.22 that would be great.  We should create our own alternate encoding test server too, but not clear if we'll have time to do that this week.
Comment 40 David Barri CLA 2006-05-17 22:34:59 EDT
alrighty. I'll get back to u on that one later tonight or tomorrow morning ;)
Comment 41 Mik Kersten CLA 2006-05-17 22:39:10 EDT
Thanks 1M David!  (for that and for keeping up with what's probably our longest running bug report ;)
Comment 42 David Barri CLA 2006-05-17 22:40:22 EDT
lol, np ;)
Comment 43 David Barri CLA 2006-05-17 23:17:40 EDT
lol, i've fixed it (at least as far as I can see).
I will test a bit more, clean things up and submit a patch sometime today.
Comment 44 Mik Kersten CLA 2006-05-17 23:20:00 EDT
Excellent!!  Consider yourself credited in the 0.5.2 New & Noteworthy :)
Comment 45 David Barri CLA 2006-05-18 00:37:33 EDT
Created attachment 41844 [details]
Patch to fix encoding problem

Here you go. This fixes the encoding problem so that unicode text is correct and no longer garbled.

However, there is still a problem. I was looking through the code and there is no charset associated with the repository. The getBug() function (or was it getReport()?) attempts to auto-detect the encoding but as far as I can tell this doesn't happen during any other operations.

For example, when I add a new query and synchronize it, the encoding is still messed up. You need to double-click each bug so that RepositoryReportFactory.getBug() or whatever updates the attributes with the correct encoding.

I (humbly) suggest you add an encoding attribute to RepositoryConfiguration and then create something along these lines:

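  // Wraps the stream in a reader that decodes with the repository's charset,
  // falling back to the platform default charset when none is set.
  public BufferedReader getReaderInLocalCharset(InputStream in) {
    if (getCharset() != null)
      return new BufferedReader(new InputStreamReader(in, getCharset()));
    else
      return new BufferedReader(new InputStreamReader(in));
  }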

and then replace all relevant instances of "new BufferedReader(new InputStreamReader(xxxx))" with "RepositoryConfiguration.getReaderInLocalCharset(xxx)"

That would COMPLETELY resolve the issue. :)
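
A self-contained sketch of the suggestion above, for readers who want to try it outside Mylar. The `RepositoryEncoding` class and its `firstLine` helper are hypothetical stand-ins. The real `RepositoryConfiguration` API is not shown in this thread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RepositoryEncoding {
    private final Charset charset; // null means no charset was configured

    RepositoryEncoding(Charset charset) {
        this.charset = charset;
    }

    /** Decodes with the repository charset, or the platform default if none is set. */
    BufferedReader getReaderInLocalCharset(InputStream in) {
        if (charset != null) {
            return new BufferedReader(new InputStreamReader(in, charset));
        }
        return new BufferedReader(new InputStreamReader(in));
    }

    /** Convenience helper for the demo: decodes raw bytes and returns the first line. */
    static String firstLine(byte[] raw, Charset charset) {
        RepositoryEncoding repo = new RepositoryEncoding(charset);
        try (BufferedReader reader = repo.getReaderInLocalCharset(new ByteArrayInputStream(raw))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes for U+78BA followed by U+8A8D.
        byte[] raw = { (byte) 0xE7, (byte) 0xA2, (byte) 0xBA, (byte) 0xE8, (byte) 0xAA, (byte) 0x8D };
        System.out.println("\u78BA\u8A8D".equals(firstLine(raw, StandardCharsets.UTF_8)));
    }
}
```

Routing every stream through one method like this keeps the charset decision in a single place, which is the point of the suggestion.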
Comment 46 Robert Elves CLA 2006-05-18 13:20:09 EDT
Great! Thanks for the patch David and your suggestions. I'll be adding repository encoding association today.
Comment 47 Robert Elves CLA 2006-05-18 19:51:00 EDT
Okay, the encoding support is in HEAD. David, if you get a moment it would be great if you could try it out again. One question remains concerning submission of reports. Currently they are being sent with the content type selected in the Task Repository config. I'm not 100% sure if this is correct or if it should always be sent in UTF-8. I guess it depends on what happens on Bugzilla's side?? If you have any insight on this I'm all ears. Thanks.
Comment 48 Robert Elves CLA 2006-05-18 19:53:17 EDT
Created attachment 41970 [details]
mylar/context/zip

encoding context
Comment 49 David Barri CLA 2006-05-18 21:54:56 EDT
I just checked out HEAD and spent some time trying it out and as far as I can tell it's working 100% PERFECTLY!!! No encoding problems whatsoever (at least with my UTF-8 repositories). Well done dude!!!! I don't have time today but if I get time over the weekend I will try to setup a bugzilla repos in some other encoding and test it out.
Comment 50 Mik Kersten CLA 2006-05-18 22:21:16 EDT
That is GREAT to hear.  Nice work guys, what a good example of an open source collaborative effort!  It's always best when users step in to help themselves and push us in the process :)

So now it is my great pleasure to finally close this bug report.  David, if you do find any problems with another Bugzilla that you set up please open up a new bug or reopen this one.  I'm marking it as "greatbug" since that keyword applies here (even though we're not a part of Callisto product and as such not participating in the greatbug contest).
Comment 51 David Barri CLA 2006-05-18 22:28:46 EDT
Too right Mik!
Thanks a lot for everyone's hard work! It has certainly paid off. :)
Comment 52 Robert Elves CLA 2006-05-19 12:57:38 EDT
Yes, thanks again David!