Bug 242301 - UTF-8 encoding lost between MySQL and PHP on Windows
Summary: UTF-8 encoding lost between MySQL and PHP on Windows
Status: RESOLVED WONTFIX
Alias: None
Product: Babel
Classification: Technology
Component: Server
Version: unspecified
Hardware: PC Windows XP
Importance: P3 normal
Target Milestone: ---
Assignee: Babel server inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: contributed
Depends on:
Blocks:
 
Reported: 2008-07-28 16:38 EDT by Matthew Mazaika CLA
Modified: 2008-11-12 15:03 EST
CC List: 0 users

See Also:


Attachments
patch to change connection stream to utf-8 (780 bytes, patch)
2008-07-28 16:49 EDT, Matthew Mazaika CLA
denis.roy: review-
Description Matthew Mazaika CLA 2008-07-28 16:38:42 EDT
On Windows, PHP 5 doesn't seem to negotiate the correct character encoding with the MySQL database: it pulls data back as "latin1" instead of "utf8".
Comment 1 Matthew Mazaika CLA 2008-07-28 16:49:08 EDT
Created attachment 108572
patch to change connection stream to utf-8

I think the best solution is to switch the connection character set to utf8 immediately after connecting to the DB.

Adding this line right after the connection is made fixed the problem:

mysql_query("SET NAMES 'utf8'");


Since we should always be using UTF-8 for this project, even across the different Linux distributions, this solution enforces that.

My development environment is set up as follows:
Windows XP/SP2
IIS 5.1
PHP 5.2.5 (cli) (built: Nov  8 2007 23:18:51)
mysql  Ver 14.12 Distrib 5.0.51b, for Win32 (ia32)
Comment 2 Denis Roy CLA 2008-07-29 09:16:09 EDT
Thanks for the patch.  It's good to see people getting the environment set up on Windows.

I'll have a look at the implications of the SET NAMES command on our test environment.  I don't think it will affect anything, but I need to make sure I can still build the NL plugins and import translations correctly.
Comment 3 Denis Roy CLA 2008-07-29 15:20:25 EDT
Using this patch causes severe breakage on Linux.

The DB servers I use are all set up for the default latin1 setting, as shown by the status command:

Server characterset:    latin1
Db     characterset:    latin1
Client characterset:    latin1


The SHOW TABLE STATUS command for the Babel database does indicate that all the tables that need UTF-8 use the 'utf8_general_ci' collation. Do you have the same on Windows?
Comment 4 Matthew Mazaika CLA 2008-07-29 15:54:29 EDT
I'm not sure how to display the list of character sets that you have, but I ran this query and believe that my database is entirely UTF-8:

mysql> show create database `babel`;
+----------+----------------------------------------------------------------+
| Database | Create Database                                                |
+----------+----------------------------------------------------------------+
| babel    | CREATE DATABASE `babel` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+----------------------------------------------------------------+
1 row in set (0.00 sec)



Since your database is not entirely UTF-8, that would explain the major breakage when you force the data to be read as "utf8" while under the hood it is actually still "latin1".

Although it is currently working, I wonder if the database should have originally been created as "utf8" rather than "latin1".  Unfortunately I don't have much domain knowledge here and I'm not sure what the consequences are of going one way or the other.
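A minimal Python sketch of the double-encoding effect described in this comment, assuming (as the status output in Comment 3 suggests) that data was written through a latin1 client connection while PHP was actually sending UTF-8 bytes. No database is involved: the three helper functions are hypothetical stand-ins for the conversions the MySQL server performs, and Python's "latin-1" codec approximates MySQL's latin1 (which is really cp1252; the two agree for the characters used here).

```python
# Hypothetical simulation of the MySQL connection character-set round trip.
# Python's "latin-1" codec stands in for MySQL's latin1 (really cp1252);
# they agree for the characters used below.

def store_via_latin1_connection(text: str) -> str:
    """PHP sends UTF-8 bytes, but the connection claims latin1, so the
    server interprets each byte as one latin1 character and stores that."""
    return text.encode("utf-8").decode("latin-1")

def read_via_latin1_connection(stored: str) -> bytes:
    """The server converts stored characters back to latin1 bytes,
    so the original UTF-8 bytes come back unchanged."""
    return stored.encode("latin-1")

def read_via_utf8_connection(stored: str) -> bytes:
    """After SET NAMES 'utf8', the server sends UTF-8 for each stored
    character, so every original byte is encoded a second time."""
    return stored.encode("utf-8")

original = "café"
stored = store_via_latin1_connection(original)

print(read_via_latin1_connection(stored).decode("utf-8"))  # café
print(read_via_utf8_connection(stored).decode("utf-8"))    # cafÃ©
```

Reading back through the latin1 connection returns the original UTF-8 bytes, which is consistent with the Linux setup working as-is; switching the connection to utf8 makes the server encode each stored byte again, producing "Ã©"-style mojibake.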
Comment 5 Denis Roy CLA 2008-07-29 17:16:34 EDT
> Although it is currently working, I wonder if the database should have
> originally been created as "utf8" rather than "latin1".

Character sets are properties of the storage (which is a function of a table, not the database) and of the communication between the client and the server, so you can't 'create a database as utf8'. Granted, using 'utf8' for all the tables would have been nice, but the live Babel server needs to interact with Bugzilla tables, which are latin1, so I didn't have that luxury. The alternative is character conversion, which can get ugly, and since everything works as-is on the live server, I have no need to do that.


At this point, I suspect the Windows versions of PHP and/or MySQL don't interact with each other the same way they do on Linux, and I have modified our Wiki documentation to highlight this (and link back to your patch for the fix). Perhaps this could also be one of those WINDOWS_ENVIRONMENT hacks we put together (related to bug 242011).
Comment 6 Denis Roy CLA 2008-11-12 15:01:31 EST
Comment on attachment 108572
patch to change connection stream to utf-8

Just doing some triage: I can't apply this patch, as it causes breakage on Linux.
Comment 7 Denis Roy CLA 2008-11-12 15:03:38 EST
I'll close this as WONTFIX as the team doesn't currently have the intention (or the resources) to actively support the Windows environment.  The patch is listed on our Babel Server docs under "running on Windows".