Velocity Reviews

Adam W. 02-26-2013 04:00 AM

stmplib MIMEText charset weirdness
 
Can someone explain to me why I can't set the charset after the fact and still have it work?

For example:
>>> text = MIMEText('❤¥'.encode('utf-8'), 'html')
>>> text.set_charset('utf-8')
>>> text.as_string()

Traceback (most recent call last):
  File "<pyshell#53>", line 1, in <module>
    text.as_string()
  File "C:\Python32\lib\email\message.py", line 168, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "C:\Python32\lib\email\generator.py", line 91, in flatten
    self._write(msg)
  File "C:\Python32\lib\email\generator.py", line 137, in _write
    self._dispatch(msg)
  File "C:\Python32\lib\email\generator.py", line 163, in _dispatch
    meth(msg)
  File "C:\Python32\lib\email\generator.py", line 192, in _handle_text
    raise TypeError('string payload expected: %s' % type(payload))
TypeError: string payload expected: <class 'bytes'>

As opposed to:
>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
>>> text.as_string()

'Content-Type: text/html; charset="utf-8"\nMIME-Version: 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'


Side question:
>>> text = MIMEText('❤¥', 'html')
>>> text.set_charset('utf-8')
>>> text.as_string()

'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type: text/html; charset="utf-8"\n\n❤¥'

Why is it now 8-bit encoding?

Steven D'Aprano 02-26-2013 07:10 AM

Re: stmplib MIMEText charset weirdness
 
On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:

> Can someone explain to me why I can't set the charset after the fact and
> still have it work.
>
> For example:
>>>> text = MIMEText('❤¥'.encode('utf-8'), 'html')



It would help if you tell us where this MIMEText function came from.
Based on the error messages you provide later, I'm going to assume it is
the one in the Python 3.2 email package:

from email.mime.text import MIMEText

The documentation for MIMEText is rather terse, but it implies that the
parameter given should be a string, not bytes:

http://docs.python.org/3.2/library/e....text.MIMEText

If I provide a string, it seems to work fine:


py> msg = '❤¥'
py> blob = MIMEText(msg, _charset='utf-8')
py> print(blob.as_string())
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64

4p2kwqU=



But if I provide bytes, as you do, I get the same error you do:


py> msg_as_bytes = msg.encode('utf-8')
py> print(msg_as_bytes)
b'\xe2\x9d\xa4\xc2\xa5'
py> blob = MIMEText(msg_as_bytes)
py> blob.as_string()
Traceback (most recent call last):
[...]
TypeError: string payload expected: <class 'bytes'>


So it pays to read the error message. It tells you that it expected the
payload to be a string, but got bytes instead.
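A minimal sketch of the fix this implies, assuming you start from UTF-8 bytes (the names `raw`, `text` and `msg` are illustrative): decode the bytes to a str first, then give the charset to the constructor. On Python 3.3 or later this should yield the same base64 body as the string example above.

```python
from email.mime.text import MIMEText

# Illustrative input: UTF-8 bytes, e.g. read from a file or socket.
raw = '❤¥'.encode('utf-8')

# Decode to str first; MIMEText expects text, not bytes.
text = raw.decode('utf-8')

# Give the charset to the constructor, so the body is transfer-encoded
# (base64 for utf-8) at the moment the payload is set.
msg = MIMEText(text, 'html', 'utf-8')
print(msg.as_string())
```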


> As opposed to:
>
>>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
>>>> text.as_string()

> 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
> 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'



My wild guess is that it is an accident (possibly a bug) that the above
works at all. I think it shouldn't; MIMEText is expecting a string, and
you provide a bytes object. The documentation for the email package
states:


[quote]
Here are the major differences between email version 5.0 and version 4:

All operations are on unicode strings. Text inputs must be strings,
text outputs are strings. Outputs are limited to the ASCII character set
and so can be encoded to ASCII for transmission. Inputs are also limited
to ASCII; this is an acknowledged limitation of email 5.0 and means it
can only be used to parse email that is 7bit clean.
[end quote]

http://docs.python.org/3.2/library/email.html



but frankly, I'm not an expert on the email package. It may be that the
behaviour you describe is deliberate.



--
Steven

Adam W. 02-26-2013 03:29 PM

Re: stmplib MIMEText charset weirdness
 
On Tuesday, February 26, 2013 2:10:28 AM UTC-5, Steven D'Aprano wrote:
> On Mon, 25 Feb 2013 20:00:24 -0800, Adam W. wrote:
>
> The documentation for MIMEText is rather terse, but it implies that the
> parameter given should be a string, not bytes:
>
> http://docs.python.org/3.2/library/e....text.MIMEText
>
> If I provide a string, it seems to work fine:

OK, working under the assumption that you need to provide a string, that still leaves the question of why adding the header after the fact (to a string input) does not produce the same result as declaring the encoding inline.


> > As opposed to:
> >
> > >>>> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')
> > >>>> text.as_string()
> > 'Content-Type: text/html; charset="utf-8"\nMIME-Version:
> > 1.0\nContent-Transfer-Encoding: base64\n\n4p2kwqU=\n'
>
> My wild guess is that it is an accident (possibly a bug) that the above
> works at all. I think it shouldn't; MIMEText is expecting a string, and
> you provide a bytes object. The documentation for the email package
> states:
>
> [quote]
> Here are the major differences between email version 5.0 and version 4:
>
> All operations are on unicode strings. Text inputs must be strings,
> text outputs are strings. Outputs are limited to the ASCII character set
> and so can be encoded to ASCII for transmission. Inputs are also limited
> to ASCII; this is an acknowledged limitation of email 5.0 and means it
> can only be used to parse email that is 7bit clean.
> [end quote]
>
> http://docs.python.org/3.2/library/email.html

I find this limitation hard to believe: why bother with encoding flags if it can only ever accept ASCII anyway?

The reason this issue came up was that I was adding the header afterwards, as in my examples, and it wasn't working, so I Googled around and found this Stack Overflow question: http://stackoverflow.com/questions/1...-in-python-2-7

It seemed to be doing exactly what I wanted; the only difference was the inline declaration of utf-8, and with that change it started working as desired...

Terry Reedy 02-26-2013 07:46 PM

Re: stmplib MIMEText charset weirdness
 
On 2/25/2013 11:00 PM, Adam W. wrote:
> Can someone explain to me why I can't set the charset after the fact.


Email was revised to v.6 for 3.3, so the immediate answer to both your
why questions is 'because email was not revised yet'.

> text = MIMEText('❤¥'.encode('utf-8'), 'html')


In 3.3 this fails immediately with
AttributeError: 'bytes' object has no attribute 'encode'
because when _charset is not given, MIMEText.__init__ test-encodes the
text to discover what the charset should be:

    if _charset is None:
        try:
            _text.encode('us-ascii')
            _charset = 'us-ascii'
        except UnicodeEncodeError:
            _charset = 'utf-8'
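The default-charset logic quoted here can be exercised on its own. The helper `choose_charset` below is hypothetical: it only mirrors the quoted snippet, so its result can be compared with the charset `MIMEText` actually records in its Content-Type header (read back with `get_content_charset()`), assuming Python 3.3 or later.

```python
from email.mime.text import MIMEText


def choose_charset(text):
    # Hypothetical helper mirroring the quoted MIMEText.__init__ logic:
    # use us-ascii if the text encodes cleanly, otherwise fall back to utf-8.
    try:
        text.encode('us-ascii')
    except UnicodeEncodeError:
        return 'utf-8'
    return 'us-ascii'


for sample in ('hello', '❤¥'):
    # Compare the helper with the charset MIMEText puts in Content-Type.
    print(choose_charset(sample), MIMEText(sample).get_content_charset())

# bytes has no .encode() method, hence the AttributeError described above
# when bytes are passed without an explicit charset.
try:
    MIMEText('❤¥'.encode('utf-8'), 'html')
except AttributeError as exc:
    print('AttributeError:', exc)
```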

> text = MIMEText('❤¥'.encode('utf-8'), 'html', 'utf-8')


If one provides bytes, one must provide the charset and MIMEText assumes
you are not lying.
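To see what "assumes you are not lying" means in practice, here is a sketch assuming Python 3.3 or later: bytes plus an explicit charset are accepted and base64-encoded as given, with no check that the bytes really are in that charset. The Latin-1 input is a deliberately mislabelled, illustrative example.

```python
from email.mime.text import MIMEText

raw = '❤¥'.encode('utf-8')

# Bytes with an explicit charset: MIMEText base64-encodes them as given.
msg = MIMEText(raw, 'html', 'utf-8')
body = msg.get_payload(decode=True)  # undo the base64 transfer encoding
print(msg['Content-Transfer-Encoding'], body == raw)

# The declared charset is not verified: these Latin-1 bytes are labelled
# utf-8, so a receiver decoding them as utf-8 would fail.
wrong = MIMEText('¥'.encode('latin-1'), 'html', 'utf-8')
wrong_body = wrong.get_payload(decode=True)
try:
    wrong_body.decode(wrong.get_content_charset())
except UnicodeDecodeError:
    print('declared charset does not match the bytes')
```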

> text.as_string()
> Content-Type: text/html; charset="utf-8"
> MIME-Version: 1.0
> Content-Transfer-Encoding: base64
>
> 4p2kwqU=


> Side question:
> text = MIMEText('❤¥', 'html')
> text.set_charset('utf-8')


This is redundant here. This method is inherited from Message and
appears pretty useless for the subclass.
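This can be checked directly, assuming Python 3.3 or later: building the message with and without an explicit charset, and calling `set_charset('utf-8')` after the fact as in the side question, should all serialize identically, because the constructor has already chosen utf-8 and added a Content-Transfer-Encoding header.

```python
from email.mime.text import MIMEText

# MIMEText picks utf-8 for non-ASCII text on its own, and the body is
# base64-encoded as part of construction.
auto = MIMEText('❤¥', 'html')
explicit = MIMEText('❤¥', 'html', 'utf-8')

# Calling set_charset() afterwards changes nothing here: the charset is the
# same and a Content-Transfer-Encoding header already exists.
after = MIMEText('❤¥', 'html')
after.set_charset('utf-8')

print(auto.as_string() == explicit.as_string() == after.as_string())
print(after['Content-Transfer-Encoding'])
```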

> text.as_string()
> 'MIME-Version: 1.0\nContent-Transfer-Encoding: 8bit\nContent-Type:
> text/html;charset="utf-8"\n\n❤¥'
>
> Why is it now 8-bit encoding?


Bug fixed in 3.3. Output now same as above. Use 3.3 for email unless you
cannot due to other dependencies not yet being available.

--
Terry Jan Reedy



