Webpy and UnicodeDecodeError

 
 
Oscar Del Ben
      12-18-2009
So I'm trying to send a file through webpy and urllib2 but I can't get
around these UnicodeErrors. Here's the code:

# controller

x = web.input(video_original={})
params = {'foo': x['foo']}

files = (('video[original]', 'test', x['video_original'].file.read()),)
client.upload(upload_url, params, files, access_token())

# client library

    def __encodeMultipart(self, fields, files):
        """
        fields is a sequence of (name, value) elements for regular
        form fields.
        files is a sequence of (name, filename, value) elements for
        data to be uploaded as files
        Return (content_type, body) ready for httplib.HTTP instance
        """
        boundary = mimetools.choose_boundary()
        crlf = '\r\n'

        l = []
        for k, v in fields.iteritems():
            l.append('--' + boundary)
            l.append('Content-Disposition: form-data; name="%s"' % k)
            l.append('')
            l.append(v)
        for (k, f, v) in files:
            l.append('--' + boundary)
            l.append('Content-Disposition: form-data; name="%s"; filename="%s"' % (k, f))
            l.append('Content-Type: %s' % self.__getContentType(f))
            l.append('')
            l.append(v)
        l.append('--' + boundary + '--')
        l.append('')
        body = crlf.join(l)

        return boundary, body

    def __getContentType(self, filename):
        return mimetypes.guess_type(filename)[0] or 'application/octet-stream'

    def upload(self, path, post_params, files, token=None):

        if token:
            token = oauth.OAuthToken.from_string(token)

        url = "http://%s%s" % (self.authority, path)

        (boundary, body) = self.__encodeMultipart(post_params, files)

        headers = {'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
                   'Content-Length': str(len(body))
                   }

        request = oauth.OAuthRequest.from_consumer_and_token(
            self.consumer,
            token,
            http_method='POST',
            http_url=url,
            parameters=post_params
        )

        request.sign_request(oauth.OAuthSignatureMethod_HMAC_SHA1(),
                             self.consumer, token)

        request = urllib2.Request(request.http_url, postdata=body,
                                  headers=headers)
        request.get_method = lambda: 'POST'

        return urllib2.urlopen(request)

Unfortunately I get two kinds of unicode error, the first one in the
crlf.join(l):

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)


And here's another one:

Traceback (most recent call last):
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 242, in process
    return self.handle()
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 233, in handle
    return self._delegate(fn, self.fvars, args)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 412, in _delegate
    return handle_class(cls)
  File "/Users/oscar/projects/work/whitelabel/web/application.py", line 387, in handle_class
    return tocall(*args)
  File "/Users/oscar/projects/work/whitelabel/code.py", line 328, in POST
    return simplejson.load(client.upload(upload_url, params, files, access_token()))
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 131, in upload
    (boundary, body) = self.__encodeMultipart(post_params, files)
  File "/Users/oscar/projects/work/whitelabel/oauth_client.py", line 111, in __encodeMultipart
    body = crlf.join(l)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xb7 in position 42: ordinal not in range(128)

Does anyone know why these errors happen and what I should do to
prevent them? Many thanks.

Oscar
 
Dave Angel
      12-18-2009
Oscar Del Ben wrote:
> <snip>

I did a short test to demonstrate the likely problem, without all the
other libraries and complexity.

lst = ["abc"]
lst.append("def")
lst.append(u"abc")
lst.append("g\x48\x82\x94i")
print lst
print "**".join(lst)


That fragment of code generates (in Python 2.6) the following output and
traceback:

['abc', 'def', u'abc', 'gH\x82\x94i']
Traceback (most recent call last):
  File "M:\Programming\Python\sources\dummy\stuff2.py", line 10, in <module>
    print "**".join(lst)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 2: ordinal not in range(128)


You'll notice that one of the strings is a unicode one, and another one
has the byte 0x82 in it. Once join() discovers a unicode element, it needs
to produce a unicode result, so it implicitly decodes the byte strings
with the default ASCII codec, and that fails on any byte above 0x7f.
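The implicit conversion described above can be reproduced explicitly. This is only a sketch, assuming the default encoding is 'ascii' (the Python 2 default): it decodes the same byte string by hand and keeps the resulting exception. The names raw and failure are made up for illustration.

```python
# The same bytes as the last element of the test list above.
raw = b"g\x48\x82\x94i"

try:
    # Roughly what join() does to each byte string once it has seen
    # a unicode element (assuming the default codec is 'ascii').
    raw.decode("ascii")
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

# The exception records the codec and the offset of the offending byte.
print(failure.encoding, failure.start)
print(failure)
```

The offset reported here is relative to the single element being decoded, not to the joined result, which is why it helps to look at the list elements one at a time.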

If you print your 'l' list (bad name, by the way, looks too much like a
'1'), you can see which element is Unicode, and which one has the \xb7
in position 42. You'll have to decide which is the problem, and solve
it accordingly. Was the fact that one of the strings is unicode an
oversight? Or did you think that all characters would be 0x7f or less?
Or do you want to handle all possible characters, and if so, with what
encoding?
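One way to carry out that inspection, as a rough sketch rather than anything from web.py or the original client library: the hypothetical helper describe_parts() below reports, for each element, whether it is text or bytes and where the first byte above 0x7f sits. The type(u"") trick is an assumption made so the snippet behaves the same on Python 2.6+ and Python 3.

```python
def describe_parts(parts):
    """Return (index, kind, first_non_ascii_offset) for each element."""
    text_type = type(u"")  # unicode on Python 2, str on Python 3
    report = []
    for index, part in enumerate(parts):
        if isinstance(part, text_type):
            report.append((index, "text", None))
        else:
            # bytearray() yields integers on both Python 2 and 3.
            offset = next(
                (i for i, b in enumerate(bytearray(part)) if b > 0x7F),
                None,
            )
            report.append((index, "bytes", offset))
    return report


parts = [b"abc", b"def", u"abc", b"g\x48\x82\x94i"]
for row in describe_parts(parts):
    print(row)
```

Run against the list that is about to be joined, this shows both the unicode element and the byte string that cannot be decoded as ASCII.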

DaveA


 
Oscar Del Ben
      12-18-2009
On Dec 18, 4:43 pm, Dave Angel <da...@ieee.org> wrote:
> <snip>


Thanks for your reply DaveA.

Since I'm dealing with file uploads, I guess I should only care about
those. I understand the fact that I'm trying to concatenate a unicode
string with a binary, but I don't know how to deal with this. Perhaps
the uploaded file should be encoded in some way? I don't think this is
the case though.
 
 
Dave Angel
      12-19-2009
Oscar Del Ben wrote:
> <snip>
> Thanks for your reply DaveA.
>
> Since I'm dealing with file uploads, I guess I should only care about
> those. I understand the fact that I'm trying to concatenate a unicode
> string with a binary, but I don't know how to deal with this. Perhaps
> the uploaded file should be encoded in some way? I don't think this is
> the case though.
>
>

You have to decide what the format of the file is to be. If you have
some in bytes, and some in Unicode, you have to be explicit about how
you merge them. And that depends who's going to use the file, and for
what purpose.

Before you try to do a join(), you have to do a conversion of the
Unicode string(s) to bytes. Try the encode() method of each unicode
object, where you get to specify what encoding to use.
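A minimal sketch of that conversion, assuming UTF-8 is the encoding you settle on (the thread does not say which one is right for this server): every text element is encoded, byte strings are left alone, and only then is join() called on a bytes separator.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3

parts = [u"abc", b"def", b"g\x48\x82\x94i"]

# Encode the text elements explicitly; leave byte strings untouched.
encoded = [p.encode("utf-8") if isinstance(p, text_type) else p for p in parts]

# Everything is bytes now, so no implicit ASCII decoding can happen.
joined = b"**".join(encoded)
print(repr(joined))
```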

In general, you want to use the same encoding for all the bytes in a
given file. But as I just said, that's entirely up to you.
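Applied to the multipart body from the original post, the idea might look like the sketch below. It is not the poster's client library: encode_multipart() and its encoding parameter are hypothetical, UTF-8 is an assumed choice, and uuid4().hex stands in for mimetools.choose_boundary() so the snippet also runs on Python 3.

```python
import mimetypes
import uuid


def encode_multipart(fields, files, encoding="utf-8"):
    """Build a multipart/form-data body as one byte string.

    fields: dict of name -> text value
    files:  sequence of (name, filename, raw_bytes)
    Returns (boundary, body); body is always bytes.
    """
    text_type = type(u"")  # unicode on Python 2, str on Python 3

    def to_bytes(value):
        # Encode text with one agreed encoding; pass file data through.
        return value.encode(encoding) if isinstance(value, text_type) else value

    boundary = uuid.uuid4().hex  # stand-in for mimetools.choose_boundary()
    lines = []
    for name, value in fields.items():
        lines += [
            u"--" + boundary,
            u'Content-Disposition: form-data; name="%s"' % name,
            u"",
            value,
        ]
    for name, filename, data in files:
        content_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        lines += [
            u"--" + boundary,
            u'Content-Disposition: form-data; name="%s"; filename="%s"' % (name, filename),
            u"Content-Type: %s" % content_type,
            u"",
            data,
        ]
    lines += [u"--" + boundary + u"--", u""]
    body = b"\r\n".join(to_bytes(line) for line in lines)
    return boundary, body


boundary, body = encode_multipart(
    {u"foo": u"caf\xe9"},
    [(u"video[original]", u"test", b"\x00\xb7\xff")],
)
print(len(body))  # a byte count, which is what Content-Length needs
```

Because the body is bytes from the start, len(body) counts bytes rather than characters, and the uploaded file content is never decoded or re-encoded.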

DaveA

 