Velocity Reviews


oziko 08-17-2004 03:34 PM

Re: 'ascii' codec can't encode character u'\xf3'
 
I solved the printing problem using

print str.encode('iso-8859-1')

Now I can print the tags with no apparent problem. But when I tried to
insert that value into a PostgreSQL database I got the same error. I
created the PostgreSQL database with Unicode as the default encoding, using

createdb -E UNICODE oggtest

The data I am putting into the database is in the u'Perfeccion' format, so
I understand it is Unicode, but I get the same error:

Traceback (most recent call last):
  File "./ogg2sql.py", line 82, in ?
    db_cursor.execute(do)
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 3035, in execute
    _qstr = self.__unicodeConvert(_qstr)
  File "/usr/lib/python2.3/site-packages/pyPgSQL/PgSQL.py", line 2740, in __unicodeConvert
    return obj.encode(*self.conn.client_encoding)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 102: ordinal not in range(128)


My insert query is:

tracks_insert_values =(unicode(coments['TITLE']),coments['TRACKNUMBER'])

I also tried:

tracks_insert_values=(coments['TITLE'].encode('utf-8'),coments['TRACKNUMBER'])

insert_query = '''insert into tracks(titulo,no_pista)values(%s %i)''' % tracks_insert_values




Martin Slouf wrote:
> i had similar errors:
>
> Traceback (most recent call last):
>   File "/home/martin/skripty/accounts.py", line 125, in ?
>     main(sys.argv)
>   File "/home/martin/skripty/accounts.py", line 119, in main
>     print_accounts(accounts, url_part)
>   File "/home/martin/skripty/accounts.py", line 94, in print_accounts
>     print str(i).encode("utf-8", "replace")
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 151-152: ordinal not in range(128)
>
> - - - -
>
> the solution seems to be:
>
> 0. the string is not a unicode object (assumption)
> 1. before printing, convert the string to unicode
> 2. when printing, encode it to whatever charset you like
>
> though i don't really understand why (i solved it a minute ago :) the
> code should be:
>
> str = "any nonunicode string"
> print unicode(str).encode("iso-8859-2", "replace")
>
> comments:
>
> 1. there can be several reasons why the string is not in unicode -- i guess:
> - does ogg store tags in unicode?
> - you have parsed an xml file with the encoding attribute set (that
> is what i do)
> - etc
>
> 2. the "replace" parameter in encode causes characters that cannot be
> encoded to be replaced with '?' (you can use "ignore" or "strict", see
> your python doc)
>
> 3. the above will work _only_ _if_ the 'str' encoding is "iso-8859-2" --
> a funny thing -- first line of code converts from unknown (but the
> programmer must know it) to unicode and the second one converts it back
> from unicode to unknown (now the programmer tells that secret to python
> :)
>
> 4. i would like to know from any python expert whether/why/why not:
>
> * my assumptions are right
>
> * why is that behaviour? -- if you search google you get
> thousands of errors like this -- with no proper solutions i must add
>
> * is there an easier portable way (no sitecustomize.py changes)
> to do it
>
> * i was looking in site.py, and the sys.setdefaultencoding()
> function is deleted there, but the comments do not tell me why --
> do you know? why is the user not allowed to change the
> default encoding? it seems reasonable to me that he/she could do that.
>
> thx.
>
> m.
>
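
The error handlers mentioned in point 2 of the quoted message can be compared side by side. This is only a sketch: it is written in Python 3 syntax, where the text type is `str` and encoded data is `bytes` (in the Python 2.3 of this thread those are `unicode` and `str`), and the sample title is a made-up value.

```python
# Hypothetical tag value containing one non-ASCII character (U+00F3).
text = "Perfecci\u00f3n"

# "replace" substitutes '?' for every character the codec cannot encode.
replaced = text.encode("ascii", "replace")

# "ignore" silently drops such characters.
ignored = text.encode("ascii", "ignore")

# "strict" (the default) raises UnicodeEncodeError instead.
try:
    text.encode("ascii", "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(replaced, ignored, strict_failed)
```

Both "replace" and "ignore" lose information, so they hide the problem rather than fix it; picking a codec that can represent the character avoids the error without data loss.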



Diez B. Roggisch 08-17-2004 04:01 PM

Re: 'ascii' codec can't encode character u'\xf3'
 
oziko wrote:

> Now I can print the tags with no apparent problem. But when I tried to
> insert that value into a PostgreSQL database I got the same error. I
> created the PostgreSQL database with Unicode as the default encoding, using


There seems to be a general misunderstanding about what unicode, an encoding,
and all of that together mean in python.

Unicode is only an abstract definition of a character set - the usual
suspects like what is in ascii, but also nearly everything somebody on this
planet of ours cares to write down once in a while.

Now an actual encoding is how these totally abstract character sets are
mapped to actual values. So for the capital letter "A", the ascii encoding
maps it to the well known value 65.

BUT: You can define another encoding, call it oziko or whatever, and map "A"
to 1 - if you like it.
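
To make the "encoding is just a mapping" point concrete, here is a small sketch. It uses Python 3 syntax (text is `str`, encoded data is `bytes`), and the `OZIKO_TABLE` codec with its two helper functions is entirely made up for illustration; it is not a real Python codec.

```python
# A character is an abstract code point; an encoding decides its byte value.
letter = "A"
code_point = ord(letter)               # the Unicode code point of "A"
ascii_bytes = letter.encode("ascii")   # one byte holding the ASCII value

# Hypothetical "oziko" encoding: map "A" to 1, "B" to 2, "C" to 3.
OZIKO_TABLE = {"A": 1, "B": 2, "C": 3}


def oziko_encode(text):
    """Turn text into bytes using the made-up table above."""
    return bytes(OZIKO_TABLE[ch] for ch in text)


def oziko_decode(data):
    """Turn bytes produced by oziko_encode back into text."""
    reverse = {value: ch for ch, value in OZIKO_TABLE.items()}
    return "".join(reverse[b] for b in data)


print(code_point, ascii_bytes[0], oziko_encode("ABC"))
```

The same letter ends up as byte 65 under ascii and byte 1 under the invented table, which is why bytes are meaningless until you know which encoding produced them.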

Now UTF-8 is also only an encoding - one that maps all of ascii onto the
usual numbers where you expect them, and uses special lead bytes to
introduce multi-byte sequences that each encode one character. So it can
encode the whole unicode set, at the price of not being able to determine
the length of a string by dividing the number of bytes it contains by the
number of bytes a character uses - usually one.
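
The length point can be checked directly. A minimal sketch in Python 3 syntax (where `str` is the unicode type and `bytes` the byte string); the title is a made-up value chosen because it contains the character from the error message.

```python
# Hypothetical tag value: 10 characters, one of them non-ASCII (U+00F3).
title = "Perfecci\u00f3n"

utf8_bytes = title.encode("utf-8")         # U+00F3 takes two bytes here
latin1_bytes = title.encode("iso-8859-1")  # every character takes one byte

char_count = len(title)
utf8_count = len(utf8_bytes)
latin1_count = len(latin1_bytes)

# Pure ASCII text encodes to identical bytes under ascii and utf-8.
ascii_same = "Perfeccion".encode("utf-8") == "Perfeccion".encode("ascii")

print(char_count, utf8_count, latin1_count, ascii_same)
```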

So this is an extremely important lesson: unicode is _not_ - I repeat, _not_
- UTF-8.

Now python has unicode objects. They are sequences of characters - what
shape these internally have is opaque to you and not of your concern. They
are _not_ strings!!!! strings in python are sequences of bytes - as we are
used to from C.

Now whenever you want a string that is encoded in a particular encoding,
you can get it from a unicode object by invoking encode on it. That's what

u.encode('iso-8859-1')

does, if u is a unicode object.

The other way round, if you have a byte-sequence - conveniently stored in a
string - and want to get a unicode object from it, use decode

s.decode('iso-8859-1')
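
A round trip through both calls, as a sketch in Python 3 syntax: `u` plays the role of the unicode object and `s` the byte string, matching the names used above. The sample word is made up.

```python
u = "canci\u00f3n"                  # text (a unicode object in Python 2 terms)
s = u.encode("iso-8859-1")          # bytes in a specific encoding
back = s.decode("iso-8859-1")       # text again, identical to u

# Decoding with a codec other than the one used to encode goes wrong:
# here the Latin-1 byte 0xF3 is not valid UTF-8 in this position.
try:
    s.decode("utf-8")
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

print(s, back == u, wrong_codec_failed)
```

The byte string carries no record of its encoding, so the program has to remember which codec to hand to decode.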

Now if you pass a unicode object to a function that wants a _string_, python
automatically applies an encode for you - with the default encoding!!!! As
this is usually ascii, you get the problems you had.
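
The failure can be reproduced by doing the ascii encode explicitly. This sketch is in Python 3 syntax, which no longer performs the implicit conversion described above, so the explicit `.encode("ascii")` stands in for what Python 2 did behind the scenes. The helper name `try_ascii` and the sample title are made up for illustration.

```python
title = "Perfecci\u00f3n"  # hypothetical tag value containing U+00F3


def try_ascii(text):
    """Return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError as exc:
        return exc
    return None


error = try_ascii(title)         # fails on the accented character
plain = try_ascii("Perfeccion")  # pure ASCII text encodes without error

print(error)
```

The exception reports the codec and the position of the offending character, the same details that appear in the traceback from the thread.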

So what do you need to solve your problem at hand? You need to know which
encoding the sql driver wants for transmitting strings - most probably
utf-8, so they can encode all possible characters. And thus you have to
encode the strings you pass beforehand, or set the default encoding
properly.
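
Encoding the values yourself before handing them over could look roughly like this. It is a sketch only: `encode_params` is a made-up helper, UTF-8 is the guess from the paragraph above rather than something confirmed for this driver, and the code uses Python 3 syntax (`str` is text, `bytes` is encoded data).

```python
def encode_params(params, encoding="utf-8"):
    """Encode every text value in params; leave numbers and bytes untouched."""
    return tuple(
        value.encode(encoding) if isinstance(value, str) else value
        for value in params
    )


# Hypothetical title and track number standing in for the ogg tag values.
tracks_insert_values = encode_params(("Perfecci\u00f3n", 7))

print(tracks_insert_values)
```

Whatever encoding is chosen has to match what the driver and the database connection expect, otherwise the bytes arrive intact but are interpreted as the wrong characters.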

The last thing is to explain where the u''-thingies fit in. They are a
shortcut for getting a unicode object - whatever characters are encountered
inside the u'' are interpreted with the encoding the python interpreter
uses to parse the file at hand. Which one that is can be specified either
implicitly (system settings) or explicitly using the


# -*- coding: <codec> -*-

line at the top of the source file.
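
A source file with an explicit declaration might start like this. The sketch assumes the file is actually saved as UTF-8 and uses Python 3 syntax, where a plain string literal is already a unicode object (in Python 2 the literal would be written `u'...'`); the title is a made-up value.

```python
# -*- coding: utf-8 -*-
# The declaration above must be on the first or second line of the file.
# It tells the interpreter how to decode the bytes of this source file.

title = "Perfección"                 # literal typed with the real character
escaped = "Perfecci\u00f3n"          # same text written with an escape

same_text = title == escaped
print(same_text, len(title))
```

If the declared codec does not match how the editor saved the file, the literal is decoded into the wrong characters or the interpreter rejects the file, so the two have to agree.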

You might want to start reading about unicode and python on the net, google
is as always your friend.

--
Regards,

Diez B. Roggisch

Diez B. Roggisch 08-17-2004 04:08 PM

Re: 'ascii' codec can't encode character u'\xf3'
 
> So what do you need to solve your problem at hand? You need to know which
> encoding the sql driver wants for transmitting strings - most probably
> utf-8, so they can encode all possible characters. And thus you have to
> encode the strings you pass beforehand, or set the default encoding
> properly.


Just saw that setting the encoding doesn't work - sorry for suggesting it.
--
Regards,

Diez B. Roggisch

