Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Ruby > MySql+UTF8 woes

Thread Tools

MySql+UTF8 woes

Ronald Fischer
Posts: n/a
I am experiencing a strange problem when retrieving Unicode values from
my MySql5 database on Windows. Here is what I have:

* All programs/modules in my application require in the beginning a file =

require 'jcode'

in the beginning, ensuring that "everything is done in Unicode".

* I checked that the MySql database is "as UTF-8 as possible", in
- in the table editor of the MySql Administrator, the character set
for the rows is listed as utf8 (with sorting order utf8_general_ci)
- The "start variables" tab of the Administrator says utf8 for the =
character set
- mysql.ini says=20
- On the mysql command line, I see

mysql> SHOW VARIABLES LIKE 'character\_set\_%';
| Variable_name | Value |
| character_set_client | utf8 |=20
| character_set_connection | utf8 |=20
| character_set_database | utf8 |=20
| character_set_filesystem | binary |=20
| character_set_results | utf8 |=20
| character_set_server | utf8 |=20
| character_set_system | utf8 |=20

Indeed, the following works well:

When I store (from Ruby, via 'mysql' gem) a UTF-8 string
into the database, and later retrieve it, I get back exactly
what I had put in. This works equally well, whether I type
(using my UTF-8 enabled text editor) the character (for=20
example, a German umlaut) directly into the Ruby string,=20
which I then store, i.e. (table X having only one column)

sqlstatement=3D"INSERT INTO X VALUES ('=F6')"

or whether I write the code point in hexadecimal:
uml_o=3D"\xc3\xb6" # =F6, i.e. German umlaut-o in UTF-8 encoding
sqlstatement=3D"INSERT INTO X VALUES ('#{uml_o}')"

In both cases, when I retrieve the rows with=20
dbhandle.query("SELECT * FROM X")
I eventually get back the row with my "=F6" as a UTF-8=20
2-Byte-sequence C3 B6.

So far OK. But there are several things which do not work:

(1) When I look at the database table using the SQL Administrator,
I do not see the letter '=F6', but instead the characters =
to the Windows ANSI code sequence for C3 and B6 characters.

(2) When I manually enter data into the database using MySql
Administrator, by submitting the SQL statement
the =F6 is displayed fine when looking at the table with a SELECT
using the Administrator, but when I read the table from Ruby, I do
not get back a 2 byte UTF-8 sequence for the umlaut, but only one=20
byte with hex code F6. Note that this F6 corresponds to the ANSI=20
code for =F6 in Windows.

(3) When I put the unicode data into a file, and use a Java application
to read the file and write the data into the database (using JDBC), =
I look at the database then with MySql Administrator, I can see the=20
German characters; and when I then read them back with my Ruby=20
application, I get it encoded in Windows ANSI, not Unicode.

Observations (1) and (2) could be explained in the way that maybe the=20
MySql administrator interface was not written with UTF-8 in minde. What
surprises me more is that when writing data using Java (which is =
Unicode aware) and reading it back with Ruby, causes somewhere a =
from UTF-8 into Windows ANSI.=20

Does someone have experienced a similar problem and can point out a =
reason for it?

Ronald Fischer <(E-Mail Removed)>
Phone: +49-89-452133-162

Reply With Quote

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off

Similar Threads
Thread Thread Starter Forum Replies Last Post
Network Bridge woes Broady Wireless Networking 0 05-12-2005 11:15 AM
Wireless Connection Woes Frenchy Wireless Networking 4 11-04-2004 04:45 PM
adhoc network 11g wireless dropout woes dkh Wireless Networking 1 08-02-2004 01:27 PM
MN-700 and HP Pavilion woes shasha Wireless Networking 0 07-31-2004 10:12 AM
java woes (win98se) and mozilla Tendril Firefox 1 10-19-2003 04:04 PM