Hans Müller 12-03-2009 03:25 PM

Strange unicode / no unicode phenomenon with MySQL
I have a strange unicode problem with MySQL and SQLite.

In my application I receive a table as an SQLite table, which is then compared to an existing MySQL table.

The sqlite driver returns all strings from the table as unicode strings, which is OK.
The MySQL driver returns all strings as UTF-8 encoded byte strings (not unicode!).

When opening the MySQL database, use_unicode is set to True, so the driver should return
unicode strings.

Any ideas?

This is the MySQL table definition:
`NAME` varchar(256) COLLATE utf8_bin NOT NULL,
`ID` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
) ENGINE=MyISAM AUTO_INCREMENT=59325 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='Table for mapping user names to IDs'

The SQLite table was created this way:

sq3Cursor.execute("create table USERNAMES(NAME text, ID integer)")

When I query a value from both tables I get:

>>> SrcCursor.execute("select * from USERNAMES where ID=49011")
<sqlite3.Cursor object at 0x2b6096bfc240>
>>> SrcCursor.fetchone()
(u'J\xd6RG R\xd6\xdfMANN', 49011)
>>> print u'J\xd6RG R\xd6\xdfMANN'.encode("utf8")
JÖRG RÖßMANN

This is Ok.
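
The SQLite side can be reproduced without the real data. This is only a sketch: the in-memory database and the inserted row are stand-ins (assumptions) for the actual file, and the table layout is the one from the create statement above.

```python
import sqlite3

# In-memory database standing in for the real SQLite file (assumption).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table USERNAMES(NAME text, ID integer)")
cur.execute("insert into USERNAMES values (?, ?)",
            (u"J\xd6RG R\xd6\xdfMANN", 49011))

cur.execute("select * from USERNAMES where ID=49011")
row = cur.fetchone()

# type(u"") is unicode on Python 2 and str on Python 3, so this check
# means "a text string, not a byte string" on both.
name_is_text = isinstance(row[0], type(u""))
con.close()
```

With the default text_factory, sqlite3 hands TEXT columns back as text strings, which matches what I see above.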

Now MySQL:

>>> DstCursor.execute("select * from USERNAMES where ID=49011")
>>> DstCursor.fetchone()
('J\xc3\x96RG R\xc3\x96\xc3\x9fMANN', 49011)

This is the same value, but returned as a UTF-8 encoded byte string, not as unicode:

>>> 'J\xc3\x96RG R\xc3\x96\xc3\x9fMANN'.decode("utf8")
u'J\xd6RG R\xd6\xdfMANN'
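
To make the relationship between the two values explicit, here is a small sketch using only the literals from the sessions above. The variable names are mine and purely illustrative.

```python
text = u"J\xd6RG R\xd6\xdfMANN"             # value returned by the sqlite driver
raw = b"J\xc3\x96RG R\xc3\x96\xc3\x9fMANN"  # value returned by the MySQL driver

# Compared as they come back from the drivers, the values do not match.
# (Python 2 additionally emits a UnicodeWarning for this comparison.)
equal_as_is = (raw == text)

# After converting one side, they are identical.
same_after_decode = raw.decode("utf-8") == text
same_after_encode = text.encode("utf-8") == raw

# Each of the three non-ASCII characters takes two bytes in UTF-8.
lengths = (len(text), len(raw))
```

So the data in both tables is the same; only the type of the returned object differs, and that is enough to make a direct comparison of the rows fail.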

The MySQL database has been opened this way:

DstCon = MySQLdb.connect(host = DstServer, user = config["DBUser"], passwd = config["DBPasswd"], db = DstDBName, use_unicode = True, charset = "utf8")
DstCursor = DstCon.cursor()

Since use_unicode is set to True, I expect query results to be unicode (for string data types).

With another table, the result of a query is a unicode string, as expected.

Hans Müller 12-03-2009 10:06 PM

Re: Strange unicode / no unicode phenomenon with MySQL
I found the bug; it's in the MySQLdb module.

When a column has COLLATE=utf8_bin set, the column is NOT returned as unicode.

It's a known bug, #541198.
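
Until a fixed driver is installed, one possible workaround is to decode the affected values by hand before comparing the two tables. This is only a sketch: to_unicode and normalize_row are hypothetical helper names, not part of MySQLdb, and the sample rows are the ones from my first post.

```python
def to_unicode(value, encoding="utf-8"):
    """Decode byte strings to text; leave every other value unchanged."""
    if isinstance(value, bytes):  # bytes is an alias for str on Python 2
        return value.decode(encoding)
    return value


def normalize_row(row, encoding="utf-8"):
    """Return a copy of a result row with all byte strings decoded."""
    return tuple(to_unicode(value, encoding) for value in row)


# Rows as returned by the two drivers in the sessions above.
mysql_row = (b"J\xc3\x96RG R\xc3\x96\xc3\x9fMANN", 49011)
sqlite_row = (u"J\xd6RG R\xd6\xdfMANN", 49011)

rows_match = normalize_row(mysql_row) == sqlite_row
```

Note that this decodes every byte string in the row, so it is not suitable for tables that contain real binary (BLOB) data.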

Thanks all for reading.



