Problems with email.Generator.Generator

 
 
Chris Withers
      09-11-2006
Hi All,

The following piece of code is giving me issues:

from email.Charset import Charset, QP
from email.MIMEText import MIMEText

charset = Charset('utf-8')
charset.body_encoding = QP
msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    'plain',
)
msg.set_charset(charset)
print msg.as_string()

Under Python 2.4.2, this produces the following output, as I'd expect:

MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset="utf-8"

Some text with chars that need encoding: =A3

However, under Python 2.4.3, I now get:

Traceback (most recent call last):
  File "test_encoding.py", line 14, in ?
    msg.as_string()
  File "c:\python24\lib\email\Message.py", line 129, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "c:\python24\lib\email\Generator.py", line 82, in flatten
    self._write(msg)
  File "c:\python24\lib\email\Generator.py", line 113, in _write
    self._dispatch(msg)
  File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
    meth(msg)
  File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
    self._fp.write(payload)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)

This seems to be as a result of this change:

http://svn.python.org/view/python/br...37910&r2=42272

...which is referred to as part of a fix for this bug:

http://sourceforge.net/tracker/?func...70&atid=105470

Now, is this change to Generator.py in error or am I doing something wrong?

If the latter, how can I change my code such that it works as I'd expect?

cheers,

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
 
Manlio Perillo
      09-11-2006
Chris Withers wrote:
> Hi All,
>
> The following piece of code is giving me issues:
>
> from email.Charset import Charset,QP
> from email.MIMEText import MIMEText
> charset = Charset('utf-8')
> charset.body_encoding = QP
> msg = MIMEText(
> u'Some text with chars that need encoding: \xa3',
> 'plain',
> )
> msg.set_charset(charset)
> print msg.as_string()
>
> Under Python 2.4.2, this produces the following output, as I'd expect:
>


> [...]
> However, under Python 2.4.3, I now get:
>


Try with:

msg = MIMEText(
    u'Some text with chars that need encoding: \xa3',
    _charset='utf-8',
)


and you will obtain the error:


Traceback (most recent call last):
  File "<pyshell#4>", line 3, in -toplevel-
    _charset='utf-8',
  File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
    self.set_payload(_text, _charset)
  File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
    self.set_charset(charset)
  File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
    self._payload = charset.body_encode(self._payload)
  File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
    return email.base64MIME.body_encode(s)
  File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
    enc = b2a_base64(s[i:i + max_unencoded])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 41: ordinal not in range(128)




Regards Manlio Perillo
 
Chris Withers
      09-11-2006
Manlio Perillo wrote:
> Try with:
>
> msg = MIMEText(
>     u'Some text with chars that need encoding: \xa3',
>     _charset='utf-8',
> )
>
> and you will obtain the error:
>
> Traceback (most recent call last):
>   File "<pyshell#4>", line 3, in -toplevel-
>     _charset='utf-8',
>   File "C:\Python2.4\lib\email\MIMEText.py", line 28, in __init__
>     self.set_payload(_text, _charset)
>   File "C:\Python2.4\lib\email\Message.py", line 218, in set_payload
>     self.set_charset(charset)
>   File "C:\Python2.4\lib\email\Message.py", line 260, in set_charset
>     self._payload = charset.body_encode(self._payload)
>   File "C:\Python2.4\lib\email\Charset.py", line 366, in body_encode
>     return email.base64MIME.body_encode(s)
>   File "C:\Python2.4\lib\email\base64MIME.py", line 136, in encode
>     enc = b2a_base64(s[i:i + max_unencoded])
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> position 41: ordinal not in range(128)


OK, but I fail to see how replacing one unicode error with another is
any help... :-S

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
 
Peter Otten
      09-11-2006
Chris Withers wrote:

> The following piece of code is giving me issues:
>
> from email.Charset import Charset, QP
> from email.MIMEText import MIMEText
> charset = Charset('utf-8')
> charset.body_encoding = QP
> msg = MIMEText(
>     u'Some text with chars that need encoding: \xa3',
>     'plain',
> )
> msg.set_charset(charset)
> print msg.as_string()
>
> Under Python 2.4.2, this produces the following output, as I'd expect:
>
> MIME-Version: 1.0
> Content-Transfer-Encoding: 8bit
> Content-Type: text/plain; charset="utf-8"
>
> Some text with chars that need encoding: =A3
>
> However, under Python 2.4.3, I now get:
>
> Traceback (most recent call last):
>   File "test_encoding.py", line 14, in ?
>     msg.as_string()
>   File "c:\python24\lib\email\Message.py", line 129, in as_string
>     g.flatten(self, unixfrom=unixfrom)
>   File "c:\python24\lib\email\Generator.py", line 82, in flatten
>     self._write(msg)
>   File "c:\python24\lib\email\Generator.py", line 113, in _write
>     self._dispatch(msg)
>   File "c:\python24\lib\email\Generator.py", line 139, in _dispatch
>     meth(msg)
>   File "c:\python24\lib\email\Generator.py", line 182, in _handle_text
>     self._fp.write(payload)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in
> position 41: ordinal not in range(128)
>
> This seems to be as a result of this change:
>
> http://svn.python.org/view/python/br...37910&r2=42272
>
> ...which is referred to as part of a fix for this bug:
>
> http://sourceforge.net/tracker/?func...70&atid=105470
>
> Now, is this change to Generator.py in error or am I doing something
> wrong?


I'm not familiar enough with the email package to answer that.

> If the latter, how can I change my code such that it works as I'd expect?


email.Generator and email.Message use cStringIO.StringIO internally, which
can't cope with unicode. A quick fix might be to monkey-patch:

from StringIO import StringIO
from email import Generator, Message
Generator.StringIO = Message.StringIO = StringIO
# your code here

Peter
 
Chris Withers
      09-11-2006
Peter Otten wrote:
> http://sourceforge.net/tracker/?func...70&atid=105470
>> Now, is this change to Generator.py in error or am I doing something
>> wrong?

>
> I'm not familiar enough with the email package to answer that.


I'm hoping someone around here is

>> If the latter, how can I change my code such that it works as I'd expect?

>
> email.Generator and email.Message use cStringIO.StringIO internally, which
> can't cope with unicode. A quick fix might be to monkey-patch:


I'm not sure that's correct, but I'm happy to stand corrected.

My understanding is that the StringIOs don't mind as long as the type
is consistent, i.e. don't mix unicode and encoded strings, because that
forces Python's default ascii codec to kick in and spew unicode errors.
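
To make that concrete, here is a minimal sketch (not taken from the email package) that applies the ascii codec to the test string by hand and hits the same error as the traceback at the top of the thread, then encodes to UTF-8 up front instead. The variable names are only illustrative.

```python
text = u'Some text with chars that need encoding: \xa3'

failed_at = None
try:
    # The ascii codec cannot represent u'\xa3', so this raises.
    text.encode('ascii')
except UnicodeEncodeError as exc:
    failed_at = exc.start  # index of the offending character
    print(exc)

# Encoding explicitly to the target charset avoids the ascii codec.
encoded = text.encode('utf-8')
print(repr(encoded))
```

The failing index matches the "position 41" reported in the traceback, which suggests the payload was still a unicode object when it reached the write call.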

Now, I want to know what I'm supposed to do when I have unicode source
and want it to end up as either a text/plain or text/html mime part.

Is there a how-to for this anywhere? The email package's docs are short
on examples involving charsets, unicode and the like
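
For readers on a current interpreter, here is a hedged sketch of the same goal. It assumes Python 3.5 or later, where the modules are named email.charset and email.mime.text (not the 2.4 names used above) and where MIMEText accepts a Charset instance as _charset. It does not address the 2.4.3 behaviour discussed in this thread.

```python
from email.charset import Charset, QP
from email.mime.text import MIMEText

text = 'Some text with chars that need encoding: \xa3'

# Ask for quoted-printable bodies instead of the utf-8 default (base64).
charset = Charset('utf-8')
charset.body_encoding = QP

# Passing the Charset to the constructor lets the message encode the
# payload and set Content-Transfer-Encoding in one step.
msg = MIMEText(text, 'plain', charset)
print(msg.as_string())
```

The same call with 'html' as the subtype should give a text/html part.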

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
 
Steve Holden
      09-11-2006
Chris Withers wrote:
> Peter Otten wrote:
>
>> http://sourceforge.net/tracker/?func...70&atid=105470
>>
>>> Now, is this change to Generator.py in error or am I doing something
>>> wrong?
>>
>> I'm not familiar enough with the email package to answer that.
>
> I'm hoping someone around here is
>
>>> If the latter, how can I change my code such that it works as I'd expect?
>>
>> email.Generator and email.Message use cStringIO.StringIO internally, which
>> can't cope with unicode. A quick fix might be to monkey-patch:
>
> I'm not sure that's correct, but I'm happy to stand corrected.
>
> My understanding is that the StringIOs don't mind as long as the type
> is consistent, i.e. don't mix unicode and encoded strings, because that
> forces Python's default ascii codec to kick in and spew unicode errors.
>
> Now, I want to know what I'm supposed to do when I have unicode source
> and want it to end up as either a text/plain or text/html mime part.
>
> Is there a how-to for this anywhere? The email package's docs are short
> on examples involving charsets, unicode and the like
>

Well, it would seem like the easiest approach is to monkey-patch the use
of cStringIO to StringIO as recommended and see if that fixes your
problem. Wouldn't it?

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden

 
Gerard Flanagan
      09-11-2006

Chris Withers wrote:
>
> Now, I want to know what I'm supposed to do when I have unicode source
> and want it to end up as either a text/plain or text/html mime part.
>
> Is there a how-to for this anywhere? The email package's docs are short
> on examples involving charsets, unicode and the like


no expert in this, but have you tried the codecs module?

http://docs.python.org/lib/codec-objects.html

( with 'xmlcharrefreplace' for the html )?
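
A minimal sketch of what that error handler does, using a made-up sample string: characters the target codec cannot represent are replaced with numeric character references, so it only makes sense for the HTML part (in text/plain the reference would show up literally).

```python
text = u'Price: \xa3 10'

# Non-ASCII characters become &#NNN; references instead of raising.
html_bytes = text.encode('ascii', 'xmlcharrefreplace')
print(html_bytes)
```

The result is pure ASCII, so it can go into a message body without any charset handling.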

Gerard

 
Manlio Perillo
      09-11-2006
Chris Withers wrote:
> [...]
>
> OK, but I fail to see how replacing one unicode error with another is
> any help... :-S
>



The problem is simple: the email package does not support Unicode strings well.

For now I'm using this:

# Imports assumed from the names used below (Python 2.4 module layout).
from email import Header
from email import MIMEText as _MIMEText
from email import MIMEMultipart as _MIMEMultipart

charset = "utf-8"  # the charset to be used for email


class HeadersMixin(object):
    """A custom mixin, for automatic internationalized headers
    support.
    """

    def __setitem__(self, name, val, **_params):
        if isinstance(val, str):
            try:
                # only 7 bit ascii
                val.decode("us-ascii")
            except UnicodeDecodeError:
                raise ValueError("8 bit strings not accepted")

            return self.add_header(name, val)
        else:
            try:
                # to avoid unnecessary trash
                val = val.encode('us-ascii')
            except UnicodeEncodeError:
                # non-ASCII unicode: encode as an RFC 2047 header
                val = Header.Header(val, charset).encode()

            return self.add_header(name, val)


class MIMEText(HeadersMixin, _MIMEText.MIMEText):
    """A MIME Text message that allows only Unicode strings, or plain
    ascii (7 bit) ones.
    """

    def __init__(self, _text, _subtype="plain"):
        _charset = charset

        if isinstance(_text, str):
            try:
                # only 7 bit ascii
                _text.decode("us-ascii")
                _charset = "us-ascii"
            except UnicodeDecodeError:
                raise ValueError("8 bit strings not accepted")
        else:
            # unicode: encode to the target charset before handing it over
            _text = _text.encode(charset)

        return _MIMEText.MIMEText.__init__(self, _text, _subtype, _charset)


class MIMEMultipart(HeadersMixin, _MIMEMultipart.MIMEMultipart):
    def __init__(self):
        _MIMEMultipart.MIMEMultipart.__init__(self)



This only accepts Unicode strings or plain ascii strings.
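
The header half of that idea can be sketched on its own. This version assumes Python 3, where the module is email.header; encode_header_value is a hypothetical helper name, not part of the email package.

```python
from email.header import Header, decode_header


def encode_header_value(value, charset='utf-8'):
    """Hypothetical helper: return an ASCII-only header value."""
    try:
        # Pure ASCII goes through untouched, to avoid unnecessary trash.
        value.encode('us-ascii')
        return value
    except UnicodeEncodeError:
        # Anything else becomes an RFC 2047 encoded word.
        return Header(value, charset).encode()


plain = encode_header_value(u'Hello')
encoded = encode_header_value(u'Caf\xe9')
print(plain)
print(encoded)

# decode_header reverses the encoding: a list of (bytes, charset) pairs.
raw, used_charset = decode_header(encoded)[0]
print(raw.decode(used_charset))
```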




Regards Manlio Perillo
 
Chris Withers
      09-11-2006
Steve Holden wrote:
>> Is there a how-to for this anywhere? The email package's docs are short
>> on examples involving charsets, unicode and the like
>>

> Well, it would seem like the easiest approach is to monkey-patch the use
> of cStringIO to StringIO as recommended and see if that fixes your
> problem. Wouldn't it?


No, not really, since at best that's a nasty (and I mean really nasty)
hack. I'm using the email package as part of a library that I'm building
which is to be used with various frameworks. Monkey patching modules is
about as bad as it gets in that situation...

At worst, and most likely based on my past experience of (c)StringIO
being used to accumulate output, it won't make a jot of difference...

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
 
Chris Withers
      09-11-2006
Manlio Perillo wrote:
>
> The problem is simple: the email package does not support Unicode strings well.


Really? All the character set support seems to indicate a fair bit of
thought went into this aspect, although it does appear that no-one
bothered to document it.

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk
 