Jarek Zgoda wrote:
> Fredrik Lundh wrote:
>
> >>UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> >>unsupported Unicode code range
> >>
> >>does anyone have any idea on what could be going wrong? The string
> >>that I store in the database table is:
> >>
> >>'Keinen Text für Übereinstimmungsfehler gefunden'
> >
> > $ more test.py
> > # -*- coding: iso-8859-1 -*-
> > u = u'Keinen Text für Übereinstimmungsfehler gefunden'
> > s = u.encode("iso-8859-1")
> > u = s.decode("utf-8") # <-- this gives an error
> >
> > $ python test.py
> > Traceback (most recent call last):
> >   File "test.py", line 4, in ?
> >     u = s.decode("utf-8") # <-- this gives an error
> >   File "lib/encodings/utf_8.py", line 16, in decode
> >     return codecs.utf_8_decode(input, errors, True)
> > UnicodeDecodeError: 'utf8' codec can't decode bytes in position 13-18:
> > unsupported Unicode code range
>
> I can't wait for the moment when encoded strings go away from Python.
> The more I program in this language, the more confusion this difference
> causes. Now most functions and various objects' methods accept both
> strings and unicode, making it hard to find the sources of Unicode*Errors.
Library writers can speed up the transition by hiding the 8-bit interface,
for example:

import sqlite
sqlite.I_promise_to_pass_8bit_string_only_in_utf8_encoding(my_signature="sig.gif")

If you don't call this function, 8-bit strings will not be accepted.
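To make the opt-in idea above a bit more concrete, here is a rough sketch of what such a gate could look like inside a library. It is only an illustration: the names promise_utf8_byte_strings and to_text are made up and do not exist in any sqlite binding, and it is written in Python 3 syntax, where bytes plays the role of the 8-bit str discussed in this thread.

```python
# Hypothetical sketch: none of these names exist in any real sqlite binding.
# Python 3 syntax is assumed; bytes stands in for the old 8-bit str type.

_utf8_opt_in = False


def promise_utf8_byte_strings():
    """Opt in to passing byte strings; the caller promises they are UTF-8."""
    global _utf8_opt_in
    _utf8_opt_in = True


def to_text(value):
    """Return value as text, rejecting byte strings unless the caller opted in."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        if not _utf8_opt_in:
            raise TypeError(
                "byte strings are not accepted; pass text, "
                "or call promise_utf8_byte_strings() first"
            )
        # If the promise was broken, the decode fails here, at the
        # library boundary, instead of somewhere deep in later code.
        return value.decode("utf-8")
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)
```

With a gate like this, passing the ISO-8859-1 bytes from the quoted test.py raises a TypeError until the caller opts in. After opting in, the same bytes raise a UnicodeDecodeError right at the call site, which is at least easy to trace back to its source.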

IMHO, if libraries keep accepting both str and unicode until Python
3.0, it will just prolong the confusion of Unicode newbies instead of
guiding them in the right direction _right now_.