Velocity Reviews


Stefan Behnel 11-09-2012 09:37 AM

Re: Python3.3 str() bug?
 
Helmut Jarausch, 09.11.2012 10:18:
> probably I'm missing something.
>
> Using str(Arg) works just fine if Arg is a list.
> But
> str([],encoding='latin-1')
>
> gives the error
> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
> list found
>
> If this isn't a bug, how can I use str(Arg,encoding='latin-1') in general?
> Do I need to flatten any data structure which is normally accepted by str()?


Funny idea to call this a bug in Python. What your code is asking for is to
decode the object you pass in using the "latin-1" encoding. Since a list is
not something that is "encoded", let alone in latin-1, you get an error,
and actually a rather clear one.

Note that this is not specific to Python3.3 or even 3.x. It's the same
thing in Py2 when you call the equivalent unicode() function.
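
To make the distinction concrete, here is a minimal sketch for Python 3 (the sample bytes are made up for the example): passing an encoding asks str() to decode, which works for bytes but cannot work for a list.

```python
# With an encoding, str() decodes a bytes-like object.
raw = b"caf\xe9"                      # sample data: "cafe" with e-acute, latin-1 encoded
text = str(raw, encoding="latin-1")   # same result as raw.decode("latin-1")

# A list has no byte buffer to decode, so the same call raises TypeError.
try:
    str([], encoding="latin-1")
    list_error = None
except TypeError as exc:
    list_error = exc

print(ascii(text))                    # ascii() keeps the output terminal-safe
print(type(list_error).__name__)
```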

Stefan



Chris Angelico 11-09-2012 12:22 PM

Re: Python3.3 str() bug?
 
On Fri, Nov 9, 2012 at 10:08 PM, Helmut Jarausch
<jarausch@igpm.rwth-aachen.de> wrote:
> For me it's not funny, at all.


His description "funny" was in reference to the fact that you
described this as a bug. This is a heavily-used mature language; bugs
as fundamental as you imply are unlikely to exist (consequences of
design decisions there will be, but not outright bugs, usually);
extraordinary claims require extraordinary evidence.

> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string. If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.
>
> How can I convert a data structure of arbitrarily complex nature, which contains
> bytestrings somewhere, to a string?


Okay, now we're getting somewhere.

What you really should be doing is not transforming the whole
structure, but explicitly transforming each part inside it. I
recommend you stop fighting the language and start thinking about your
data as either *bytes* or *characters* and using the appropriate data
types (bytes or str) everywhere. You'll then find that it makes
perfect sense to explicitly translate (en/decode) from one to another,
but it doesn't make sense to encode a list in UTF-8 or decode a
dictionary from Latin-1.

> This problem has arisen while converting a working Python2 script to Python3.3.
> Since Python2 doesn't have bytestrings it just works.


Actually it does; it just calls them "str". And there's a Unicode
string type, called "unicode", which is (more or less) the thing that
Python 3 calls "str".

You may be able to do some kind of recursive cast that, in one sweep
of your data structure, encodes all str objects into bytes using a
given encoding (or the reverse thereof). But I don't think this is the
best way to do things.
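
For what it's worth, such a recursive sweep might look roughly like the sketch below. The name decode_nested is hypothetical (nothing like it exists in the stdlib), and it assumes the structure is built only from dicts, lists, tuples and sets; subclasses with unusual constructors, such as namedtuples, would need extra handling.

```python
def decode_nested(obj, encoding="latin-1"):
    """Return a copy of obj with every bytes object decoded to str.

    Hypothetical helper: only dict, list, tuple, set and frozenset
    containers are walked; anything else is returned unchanged.
    """
    if isinstance(obj, (bytes, bytearray)):
        return obj.decode(encoding)
    if isinstance(obj, dict):
        return {decode_nested(key, encoding): decode_nested(value, encoding)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        # Rebuild the same container type from the decoded items.
        return type(obj)(decode_nested(item, encoding) for item in obj)
    return obj


# Sample data, invented for the example.
data = [b"caf\xe9", (1, b"na\xefve"), {b"key": [b"value", 2.5]}]
decoded = decode_nested(data)
print(ascii(decoded))
```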

ChrisA

Stefan Behnel 11-09-2012 05:07 PM

Re: Python3.3 str() bug?
 
Helmut Jarausch, 09.11.2012 14:13:
> On Fri, 09 Nov 2012 23:22:04 +1100, Chris Angelico wrote:
>> What you really should be doing is not transforming the whole
>> structure, but explicitly transforming each part inside it. I
>> recommend you stop fighting the language and start thinking about your
>> data as either *bytes* or *characters* and using the appropriate data
>> types (bytes or str) everywhere. You'll then find that it makes
>> perfect sense to explicitly translate (en/decode) from one to another,
>> but it doesn't make sense to encode a list in UTF-8 or decode a
>> dictionary from Latin-1.
>>
>>> This problem has arisen while converting a working Python2 script to Python3.3.
>>> Since Python2 doesn't have bytestrings it just works.

>>
>> Actually it does; it just calls them "str". And there's a Unicode
>> string type, called "unicode", which is (more or less) the thing that
>> Python 3 calls "str".
>>
>> You may be able to do some kind of recursive cast that, in one sweep
>> of your data structure, encodes all str objects into bytes using a
>> given encoding (or the reverse thereof). But I don't think this is the
>> best way to do things.

>
> Thanks, but in my case the (complex) object is returned via ctypes from the
> aspell library.
> I still think that a standard function in Python3 which is able to 'stringify'
> objects should take an encoding parameter.


And how would that work? Would it recursively run through all data
structures you pass in or stop at some level or at some type of object?
Would it simply concatenate the substrings (and with what separator?), or
does the chaining depend on the objects found? Should it use the same
separator for everything or different separators for each level of the data
structure? Should it use str() for everything or repr() for some? Is str()
the right thing or are there special objects that need more than just a
call to str(), some kind of further preprocessing?

There are so many ways to do something like this, and it's so
straightforward to do in a given use case, that it's IMHO useless to even
think about adding a "general solution" for this to the stdlib.

Stefan



Prasad, Ramit 11-09-2012 05:47 PM

RE: Python3.3 str() bug?
 
Chris Angelico wrote:

>
> What you really should be doing is not transforming the whole
> structure, but explicitly transforming each part inside it. I
> recommend you stop fighting the language and start thinking about your
> data as either *bytes* or *characters* and using the appropriate data
> types (bytes or str) everywhere. You'll then find that it makes
> perfect sense to explicitly translate (en/decode) from one to another,
> but it doesn't make sense to encode a list in UTF-8 or decode a
> dictionary from Latin-1.
>

[snip]

>
> You may be able to do some kind of recursive cast that, in one sweep
> of your data structure, encodes all str objects into bytes using a
> given encoding (or the reverse thereof). But I don't think this is the
> best way to do things.


I would think the best way is to convert as you load the data.
That way everything is in the correct format as you manipulate
and generate new data.
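
As a rough sketch of what "convert as you load" could mean for data coming through ctypes: decode each C string at the point where it enters Python, so the rest of the program only ever sees str. The function name load_words and the sample array are made up for the example; the array merely stands in for whatever a C library might return, and the latin-1 encoding is an assumption.

```python
import ctypes


def load_words(raw_array, encoding="latin-1"):
    """Decode each C string to str as soon as it crosses into Python.

    Hypothetical helper; NULL pointers (seen as None) are skipped.
    """
    return [item.decode(encoding) for item in raw_array if item is not None]


# Stand-in for library output: an array of char* values (sample data).
raw = (ctypes.c_char_p * 2)(b"caf\xe9", b"na\xefve")
words = load_words(raw)
print(ascii(words))
```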


~Ramit



Terry Reedy 11-09-2012 11:35 PM

Re: Python3.3 str() bug?
 
On 11/9/2012 8:13 AM, Helmut Jarausch wrote:

> Just for the record.
> I first discovered a real bug with Python3 when using os.walk on a file system
> containing non-ascii characters in file names.
>
> I encountered a very strange behavior (I still would call it a bug) when trying
> to put non-ascii characters in email headers.
> This has only been solved satisfactorily in Python3.3.


Most bugs, such as the above, are in library modules. There have been
many related to unicode. In my opinion, 3.3 is the first version to
handle unicode decently well.

>>> How can I convert a data structure of arbitrarily complex nature, which contains
>>> bytestrings somewhere, to a string?


> Thanks, but in my case the (complex) object is returned via ctypes from the
> aspell library.
> I still think that a standard function in Python3 which is able to 'stringify'
> objects should take an encoding parameter.


This is an interesting idea, which I have not seen before. It is more
sensible in Python 3 than in Python 2. (For py2, unicode(str(object),
encoding='xxx') does what you want.) Try presenting it here or on
python-ideas as an enhancement request, rather than as a bug report ;-).

In the meantime, if you cannot have the object constructed with strings
rather than bytes, I suggest you write a custom converter function that
understands the structure and replaces bytes with strings.

--
Terry Jan Reedy


Oscar Benjamin 11-10-2012 04:45 PM

Re: Python3.3 str() bug?
 
On 9 November 2012 11:08, Helmut Jarausch <jarausch@igpm.rwth-aachen.de> wrote:
> On Fri, 09 Nov 2012 10:37:11 +0100, Stefan Behnel wrote:
>
>> Helmut Jarausch, 09.11.2012 10:18:
>>> probably I'm missing something.
>>>
>>> Using str(Arg) works just fine if Arg is a list.
>>> But
>>> str([],encoding='latin-1')
>>>
>>> gives the error
>>> TypeError: coercing to str: need bytes, bytearray or buffer-like object,
>>> list found
>>>
>>> If this isn't a bug, how can I use str(Arg,encoding='latin-1') in general?
>>> Do I need to flatten any data structure which is normally accepted by str()?

>>
>> Funny idea to call this a bug in Python. What your code is asking for is to
>> decode the object you pass in using the "latin-1" encoding. Since a list is
>> not something that is "encoded", let alone in latin-1, you get an error,
>> and actually a rather clear one.
>>
>> Note that this is not specific to Python3.3 or even 3.x. It's the same
>> thing in Py2 when you call the equivalent unicode() function.
>>

>
> For me it's not funny, at all.


I think the problem is that the str constructor does two fundamentally
different things depending on whether you have supplied the encoding
argument. From help(str) in Python 3.2:

| str(object[, encoding[, errors]]) -> str
|
| Create a new string object from the given object. If encoding or
| errors is specified, then the object must expose a data buffer
| that will be decoded using the given encoding and error handler.
| Otherwise, returns the result of object.__str__() (if defined)
| or repr(object).
| encoding defaults to sys.getdefaultencoding().
| errors defaults to 'strict'.

So str(obj) returns obj.__str__() but str(obj, encoding='xxx') decodes
a byte string (or a similar object) using a given encoding. In most
cases obj will be a byte string and it will be equivalent to using
obj.decode('xxx').

I think the help text is a little confusing. It says that encoding
defaults to sys.getdefaultencoding() but doesn't clarify that this only
applies if errors is given as a keyword argument, since otherwise no
decoding is performed. Perhaps the help text would be clearer if it
listed the two operations as two separate cases, e.g.:

str(object)
    Returns a string object from object.__str__() if it is defined or
    otherwise object.__repr__(). Raises TypeError if the returned result
    is not a string object.

str(bytes, [encoding[, errors]])
    If either encoding or errors is supplied, creates a new string
    object by decoding bytes with the specified encoding. The bytes
    argument can be any object that supports the buffer interface.
    encoding defaults to sys.getdefaultencoding() and errors defaults to
    'strict'.
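
The two cases can be seen side by side in a short sketch (Python 3 assumed; the sample bytes are made up for the example). Note how supplying errors alone is enough to switch str() into decoding mode with the default encoding.

```python
import sys

data = b"caf\xc3\xa9"   # sample data: "cafe" with e-acute, UTF-8 encoded

# Neither encoding nor errors: no decoding, just the repr of the bytes object.
as_repr = str(data)

# errors alone triggers decoding, using sys.getdefaultencoding().
as_text = str(data, errors="strict")

# Any object exposing a buffer is accepted, not only bytes.
from_view = str(memoryview(b"abc"), "ascii")

print(as_repr)
print(ascii(as_text), sys.getdefaultencoding())
print(from_view)
```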

> Whenever Python3 encounters a bytestring it needs an encoding to convert it to
> a string.


Well actually Python 3.3 will happily convert it to a string using
bytes.__repr__ if you don't supply the encoding argument:

>>> str(b'this is a byte string')
"b'this is a byte string'"

> If I feed a list of bytestrings or a list of list of bytestrings to
> 'str' , etc, it should use the encoding for each bytestring component of the
> given data structure.


You can always do:

[str(obj, encoding='xxx') for obj in list_of_byte_strings]

> How can I convert a data structure of arbitrarily complex nature, which contains
> bytestrings somewhere, to a string?


Using str(obj) or repr(obj). Of course this relies on the author of
type(obj) defining the appropriate methods and writing the code that
actually converts the object into a string.
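
As a minimal sketch of what that could look like when you control the type yourself: the class below (Suggestion is a hypothetical name, and latin-1 is an assumed encoding) decides in its own __str__ how its bytes become text, so plain str(obj) needs no encoding argument.

```python
class Suggestion:
    """Hypothetical wrapper around a word stored as bytes."""

    def __init__(self, raw, encoding="latin-1"):
        self.raw = raw
        self.encoding = encoding

    def __str__(self):
        # The class itself knows how its bytes should be decoded.
        return self.raw.decode(self.encoding)

    # Containers such as list call repr() on their items, so reuse __str__.
    __repr__ = __str__


# Sample data, invented for the example.
items = [Suggestion(b"caf\xe9"), Suggestion(b"na\xefve")]
single = str(items[0])
listing = str(items)
print(ascii(single), ascii(listing))
```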

> This problem has arisen while converting a working Python2 script to Python3.3.
> Since Python2 doesn't have bytestrings it just works.


In Python 2 ordinary strings are byte strings.

> Tell me how to convert str(obj) from Python2 to Python3 if obj is an
> arbitrarily complex data structure containing bytestrings somewhere
> which have to be converted to strings with a given encoding?


The str function when used to convert a non-string object into a
string knows nothing about the object you provide except whether it
has __str__ or __repr__ methods. The only processing that is done is
to check that the returned result was actually a string:

>>> class A:
...     def __str__(self):
...         return []
...
>>> a = A()
>>> str(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: __str__ returned non-string (type list)

Perhaps it would help if you would explain why you want the string
object. I would only use str(complex_object) as something to print for
debugging so I would actually want it to show me which strings were
byte strings by marking them with a 'b' prefix and I would also want
it to show non-ascii characters with a \x hex code as it already does:

>>> a = [1, 2, b'caf\xe9']
>>> str(a)
"[1, 2, b'caf\\xe9']"

If I wanted to convert the object to a string in order to e.g. save it
to a file or database then I would write a function to create the
string that I wanted. I would only use str() to convert elementary
types like int and float into strings.


Oscar

