The plugin ignores the encoding of Bugzilla pages in the outline, form, task list, etc. It also does not use the proper encoding when sending bugs to the Bugzilla server, which makes a mess when I, for example, try to change the resolution of a bug.
Could you point me at a public server that uses a different encoding?
I've prepared a test installation of Bugzilla at http://bugzilla.strodd.net. It uses the ISO-8859-2 (Central European) encoding.
Dzieki Tomasz, that's very helpful.
*** Bug 121725 has been marked as a duplicate of this bug. ***
Do you have any examples of SWT widgets using a character encoding (e.g. showing Polish accents in a list view)? I haven't seen this before and had a bit of trouble finding docs on how to do it. The workaround for this is to use the Internal Browser to edit bugs (see Mylar -> Task List preference page), so I'm marking down the severity.
Created attachment 32573 [details] Sample screenshot (excerpt) of unicode in use
I've attached a sample screenshot of Unicode in use in a list. You don't need to do anything special to use Unicode. Java and SWT both use Unicode natively. You just need to make sure that the string is decoded properly when it is received from Bugzilla.
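The point about decoding can be illustrated with a minimal sketch. The class name `DecodeDemo` and the sample byte are illustrative only and are not taken from the Mylar source. The sketch shows the same byte producing different characters depending on which charset is used to decode it.

```java
import java.nio.charset.Charset;

final class DecodeDemo {
    /** Decodes raw bytes using the named charset (hypothetical helper). */
    static String decode(byte[] raw, String charsetName) {
        return new String(raw, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        // 0xB3 is U+0142 (l with stroke) in ISO-8859-2 but U+00B3 in ISO-8859-1.
        byte[] raw = {(byte) 0xB3};
        System.out.printf("ISO-8859-2: U+%04X%n", (int) decode(raw, "ISO-8859-2").charAt(0));
        System.out.printf("ISO-8859-1: U+%04X%n", (int) decode(raw, "ISO-8859-1").charAt(0));
    }
}
```

Once the bytes are decoded with the right charset, the resulting `String` is plain Unicode and needs no further handling before it is displayed.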
Tomasz, David, I've made some progress on this using your two servers, but with varying results:
- Bug reports should show up with the proper encoding on both servers. I couldn't determine this completely with David's server because I seem to be missing some Japanese fonts. Tomasz's server does not always seem to send the content type charset correctly, so if it's not present the Bugzilla Client checks the HTML for it.
- Bug posting: seems to work fine using Japanese fonts on David's 2.20 server. Tomasz's 2.18 server appears to be ignoring the content type of the post. I've tried to work around that but have not been successful yet.
Could I ask you to try this support, and report on what works for you and what doesn't? I just made a development build. Note that it's Eclipse 3.2 only. You can get it from: download.eclipse.org/technology/mylar/update-site/dev
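The fallback described above (use the charset from the Content-Type header, otherwise look in the HTML) could be sketched roughly as follows. `CharsetSniffer`, its method names, and the regular expression are assumptions made for illustration; this is not the actual Bugzilla Client code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetSniffer {
    // Matches charset=NAME in an HTTP header value or an HTML meta tag.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([\\w.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the text, or null if none is found. */
    static String find(String text) {
        if (text == null) return null;
        Matcher m = CHARSET.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    /** Prefers the HTTP header, then the HTML head, then the supplied fallback. */
    static String detect(String contentTypeHeader, String htmlHead, String fallback) {
        String fromHeader = find(contentTypeHeader);
        if (fromHeader != null) return fromHeader;
        String fromHtml = find(htmlHead);
        return fromHtml != null ? fromHtml : fallback;
    }

    public static void main(String[] args) {
        String head = "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-2\">";
        System.out.println(detect("text/html", head, "ISO-8859-1"));
    }
}
```

The HTML check only needs the first part of the page, and that part can be read as ISO-8859-1 safely because the meta tag itself is plain ASCII.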
Good news. The only thing that seems not to work is the internal browser. Well done :) Below is my little report.
* POSTING A NEW BUG
  Unicode summary - pass
  Unicode description - pass
  Displays correctly in task list - pass
  Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)
* BUG FROM BUGZILLA SERVER
  Displays correctly in task list - pass
  Viewing in internal browser - fail (title is ok but only a few chars display correctly in the bug desc)
  Viewing in external browser - pass
Ok. I've checked it.
Display of a bug entered via browser:
1. Display ISO-8859-2 in task list - passed
2. Display ISO-8859-2 in outline - failed (look at the first character in my surname)
3. Display comments - partially passed (the same problem that you can see in the outline)
After posting a new comment:
1. Display ISO-8859-2 in task list - failed
2. Display ISO-8859-2 in outline - failed
3. Display comments - failed
I've attached two screenshots, taken before and after adding a new comment from Mylar. Also, I've got an idea. What do you think about adding a parameter to the plugin configuration, something like a select widget with encodings:
@ <automatic> <UTF-8> <ISO-8859-2>
It would allow forcing the encoding, which should resolve the problem with encoding detection in custom Bugzilla installations.
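The proposed setting could behave along these lines. This is only a sketch: the class `EncodingSetting`, the `"automatic"` value, and the `resolve` method are hypothetical names and do not exist in the plugin.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingSetting {
    // Hypothetical value meaning "use whatever was detected from the server".
    static final String AUTOMATIC = "automatic";

    /** Returns the forced charset, or the detected one when the setting is automatic or unset. */
    static Charset resolve(String setting, Charset detected) {
        if (setting == null || AUTOMATIC.equalsIgnoreCase(setting.trim())) {
            return detected;
        }
        // Throws an unchecked exception if the name is invalid or unsupported.
        return Charset.forName(setting.trim());
    }

    public static void main(String[] args) {
        System.out.println(resolve("automatic", StandardCharsets.UTF_8));
        System.out.println(resolve("ISO-8859-2", StandardCharsets.UTF_8));
    }
}
```

Keeping an automatic option as the default means servers that report their charset correctly need no extra configuration, while misconfigured ones can be overridden by hand.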
Created attachment 32673 [details] ISO-8859-2 chars in bug added via web browser
Created attachment 32674 [details] ISO-8859-2 chars in bug after posting comment from mylar
I also agree that a property in the bugzilla server preferences that allows the user to override the detected encoding and set it manually would be extremely beneficial. :)
That sounds good to me. I'll try to get to it next week and leave this report open for the time being. It will probably be set per repository. Tomasz, this might not help your issue with posting to Bugzilla 2.18 though, since the server is being sent the right encoding but is not honoring it. This might need to be documented as a 2.18 limitation, since I spent considerable time debugging it and couldn't make the server accept the encoding via a form POST.
Although "bugzilla unicode support" is listed in the 0.4.9 whatsnew list, it's still not working properly. Japanese characters in the titles of bugs in bugzilla 2.20 (uncustomized) still do not load correctly. This was working in a dev build that Mik gave me a while ago. *cry*
David, Tomasz, could you please send me the login details of your repositories? I lost them due to a recent laptop hard drive failure.
Interesting discovery: if I add a query, my Japanese characters do not display in the task list. If I double-click the task, it reloads/syncs with the bugzilla server, opens a window displaying the bug info (with CORRECT Japanese chars) and updates the name of the bug in the mylar task list (again with the correct Japanese chars). Which is great! Mylar can correctly read unicode characters in the bug data but now it just needs to do it correctly when I click "synchronize with repository" on a query.
Another interesting discovery. Correctly getting unicode titles from items only works half the time. It doesn't work for some characters (for example \u306e). If I click on the "browser" tab at the bottom of the editor though it all displays correctly. Unfortunately though the mylar task list is full of unreadable bad characters :(
Created attachment 36207 [details] Pic of some unicode chars working and others not
The chars in the "bugzilla" tab are incorrectly displayed as diamonds and squares.
David, we hope to get back to this bug this week (possibly next).
The remaining changes are too disruptive for 0.5.0, so postponing to 0.5.1. That's unfortunate, since this is one of our longest-lived bug reports!
Will need to review this after bug# 136219 is resolved.
The reason is that bug 136219 might resolve the remaining encoding issues for us, since it would prevent us from parsing HTML, and additionally may require reworking some of the current encoding support. Rob: as part of investigating that bug, could you please use one of the repositories linked here to ensure that encoding is preserved in the XML format?
Hi David, We've completed the xml/rdf conversion and would like to address the encoding issues. It appears as though your bugzilla server is no longer available. Would it be possible to email me (relves@cs.ubc.ca) the latest location and login details so I can run some tests? Many thanks.
Created attachment 41604 [details] mylar/context/zip
As suggested previously, a setting may be required to specify the encoding used. Since the encoding doesn't appear to be sent by Bugzilla in the xml, this looks like our only option. This setting will probably need to be per repository.
Rob, the other thing you could consider is checking the root page of the server and seeing what encoding it returns (e.g. https://bugs.eclipse.org/bugs returns ISO-8859-1), and set the encoding automatically if that turns out to be consistent.
Created attachment 41828 [details] UTF-8 encoding support screenshot
David, thanks for letting us test using your server. This screenshot shows the encoding support working. Mik has put a dev build up which includes this fix. Could you try out this dev build so we can be sure to have this encoding issue fixed for Friday's 5.2 release? Dev build update site: download.eclipse.org/technology/mylar/update-site/dev
Hi. No probs re: test server ;) I just tried the new dev version (0.5.1.v20060517-1500) and unfortunately it doesn't work at all :~( I tried with the test server that I shared with you (which is 2.20 out-of-the-box) and I also tried with a different server that I use for work (which is 2.22 out-of-the-box) and it didn't work with either. I also tried in a completely new workspace. I'll attach screenshots.
Created attachment 41830 [details] Task view with 0.5.1.v20060517-1500
Created attachment 41831 [details] Bug view with 0.5.1.v20060517-1500
David, 0.5.1.v20060517-1500 didn't have the fix, so it looks like you managed to download before the latest build propagated to the servers. Could you please try updating again (build should be 0.5.1.v20060517-1700) and let us know how it goes?
ok, i'm using 0.5.1.v20060517-1700 now but it's barely any different than before. In the task view, I resync'ed and the encoding is still screwed up. Tried with a new query and with new bugs just in case the names were cached. However, if I double-click a bug and open the bug view, the encoding sort of works. There are still characters that incorrectly show up as blocks. Using the test server I set up, look at bug #3. The title is '?? <blah "__" blah>'. The first two characters are \u78BA and \u8A8D. The first character displays correctly but I don't think the second char is being recognized as \u8A8D.
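When comparing what was received against the expected characters, it can help to print the code points and UTF-8 bytes of a string rather than rely on how a font renders it. The helper below is a small diagnostic sketch. `CodePointDump` and its methods are illustrative names and not part of Mylar.

```java
import java.nio.charset.StandardCharsets;

final class CodePointDump {
    /** Formats each code point of the string as U+XXXX, separated by spaces. */
    static String dump(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    /** Formats the UTF-8 encoding of the string as hex bytes, separated by spaces. */
    static String utf8Hex(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (out.length() > 0) out.append(' ');
            out.append(String.format("%02X", b & 0xFF));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "\u78BA\u8A8D"; // the two characters named in the comment above
        System.out.println(dump(title));
        System.out.println(utf8Hex(title));
    }
}
```

A U+FFFD in the dump of a received string would indicate that the bytes were already decoded incorrectly before reaching the UI, as opposed to a missing font.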
Also, i've never checked out the mylar source so I don't know the API at all. Could someone plz tell me which function is parsing/processing the raw bytes of the bugzilla feed? I just wanna have a quick look if nobody minds :)
Created attachment 41838 [details] mylar/context/zip
David, this context has the processing you're interested in, and if you're using Mylar you can retrieve it using the task list popup menu and see the classes of interest. If not, take a look at RepositoryReportFactory.populateReport and the charset stuff in BugzillaReportSubmitForm.
Regarding comment#32, for bug#1 is what you see different than the screenshot Rob attached? I know that we were having trouble with the characters displaying in the task list and got squares instead, but were thinking that was due to a lack of space since they showed up properly in the editor. In any case, any hints appreciated (including whether there is any special configuration that needs to be made on the server). Rob should be able to pick this up again tomorrow.
Thanks for that info Mik, i'll have a quick look sometime today or later tonight. Re comment#32 and Rob's screenshot, if I open the same bug it looks the same and works correctly. The thing is, like 80% of unicode characters are being converted successfully but there are many others that are not. If you have a look at the description of that bug (bug #1), you can see that the first and 4th-last characters on the 2nd-last line are incorrect. No hints about server config however, sorry :) But the actual hex codes of the characters that I put in the description of bug #1 should be helpful.
Thanks David, that would be great. Since some but not all characters are rendering I'm concerned that this could be due to the UTF-8 bugs in Bugzilla 2.20, which they have fixed for 2.22: http://www.bugzilla.org/releases/2.22/new-features.html#utf8 If it's not inconvenient it would be great if you could upgrade that server to 2.22 in case we're debugging around Bugzilla's bugs!
oh my work bugzilla server is 2.22 and i've been testing with that too. Even with 2.22 it's always the same characters that are consistently garbled.
Oh, ok. If it's easy for you to update that test server to 2.22 that would be great. We should create our own alternate encoding test server too, but not clear if we'll have time to do that this week.
alrighty. I'll get back to u on that one later tonight or tomorrow morning ;)
Thanks 1M David! (for that and for keeping up with what's probably our longest running bug report ;)
lol, np ;)
lol, i've fixed it (at least as far as I can see). I will test a bit more, clean things up and submit a patch sometime today.
Excellent!! Consider yourself credited in the 0.5.2 New & Noteworthy :)
Created attachment 41844 [details] Patch to fix encoding problem
Here you go. This fixes the encoding problem so that unicode text is correct and no longer garbled. However, there is still a problem. I was looking thru the code and there is no charset associated with the repository. The getBug() function (or was it getReport() ?) attempts to auto-detect the encoding but as far as I can tell this doesn't happen during any other operations. For example, when I add a new query and synchronize it, the encoding is still messed up. You need to double-click each bug so that RepositoryReportFactory.getBug() or whatever updates the attributes with the correct encoding. I (humbly) suggest you add an encoding attribute to RepositoryConfiguration and then create something along these lines:

    public BufferedReader getReaderInLocalCharset(InputStream in) {
        if (getCharset() != null)
            return new BufferedReader(new InputStreamReader(in, getCharset()));
        else
            return new BufferedReader(new InputStreamReader(in));
    }

and then replace all relevant instances of "new BufferedReader(new InputStreamReader(xxxx))" with "RepositoryConfiguration.getReaderInLocalCharset(xxx)". That would COMPLETELY resolve the issue. :)
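The suggestion above can be sketched as a self-contained class. `RepositoryEncoding` and `readFirstLine` are stand-in names used here so the example compiles on its own; in the plugin the method would live wherever the repository's charset is stored.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class RepositoryEncoding {
    private final Charset charset; // null means no charset is configured

    RepositoryEncoding(Charset charset) {
        this.charset = charset;
    }

    /** Wraps the stream in a reader that decodes with the repository charset if one is set. */
    BufferedReader getReaderInLocalCharset(InputStream in) {
        return charset != null
            ? new BufferedReader(new InputStreamReader(in, charset))
            : new BufferedReader(new InputStreamReader(in)); // platform default
    }

    /** Convenience for demonstration: decodes the first line of a byte array. */
    String readFirstLine(byte[] raw) {
        try (BufferedReader reader = getReaderInLocalCharset(new ByteArrayInputStream(raw))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] raw = "\u78BA\u8A8D".getBytes(StandardCharsets.UTF_8);
        RepositoryEncoding utf8 = new RepositoryEncoding(StandardCharsets.UTF_8);
        System.out.println("\u78BA\u8A8D".equals(utf8.readFirstLine(raw)));
    }
}
```

Routing every read through one method means a query synchronization and a single bug fetch cannot disagree about the charset.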
Great! Thanks for the patch David and your suggestions. I'll be adding repository encoding association today.
Okay, the encoding support is in HEAD. David, if you get a moment it would be great if you could try it out again. One question remains concerning submission of reports. Currently they are being sent with the content type selected in the Task Repository config. I'm not 100% sure if this is correct or if it should always be sent in UTF-8. I guess it depends on what happens on Bugzilla's side?? If you have any insight on this I'm all ears. Thanks.
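For reference, sending a form field in the repository's configured charset might look like the sketch below. `FormEncoding` and its methods are hypothetical, and `short_desc` is used only as a sample field name. Whether a given Bugzilla version honors the charset parameter is exactly the open question above, so this shows the client side only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class FormEncoding {
    /** Percent-encodes one form field using the named charset. */
    static String field(String name, String value, String charsetName) {
        try {
            return URLEncoder.encode(name, charsetName) + "=" + URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    /** Builds the Content-Type header value that declares the charset of the body. */
    static String contentType(String charsetName) {
        return "application/x-www-form-urlencoded; charset=" + charsetName;
    }

    public static void main(String[] args) {
        // The same character produces different bytes on the wire per charset.
        System.out.println(field("short_desc", "\u0142", "ISO-8859-2"));
        System.out.println(field("short_desc", "\u0142", "UTF-8"));
        System.out.println(contentType("UTF-8"));
    }
}
```

Because the percent-encoded bytes differ per charset, the server has to decode with the same charset the client used, whichever one is chosen.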
Created attachment 41970 [details] mylar/context/zip encoding context
I just checked out HEAD and spent some time trying it out and as far as I can tell it's working 100% PERFECTLY!!! No encoding problems whatsoever (at least with my UTF-8 repositories). Well done dude!!!! I don't have time today but if I get time over the weekend I will try to setup a bugzilla repos in some other encoding and test it out.
That is GREAT to hear. Nice work guys, what a good example of an open source collaborative effort! It's always best when users step in to help themselves and push us in the process :) So now it is my great pleasure to finally close this bug report. David, if you do find any problems with another Bugzilla that you set up please open up a new bug or reopen this one. I'm marking it as "greatbug" since that keyword applies here (even though we're not a part of Callisto product and as such not participating in the greatbug contest).
Too right Mik! Thanks a lot for everyone's hard work! It has certainly paid off. :)
Yes, thanks again David!