Velocity Reviews - Computer Hardware Reviews



encoding hell - any chance of salvation ?

southof40
      03-07-2011
Hi - I've got some code which uses array (http://docs.python.org/library/array.html)
to store characters read from a file. (It's not my code; it comes from
here: http://sourceforge.net/projects/pygold/)

The read is done, in GrammarReader.py, like this ...

def readString(self, maxsize = -1):
    result = array('u')
    char = None
    while True:
        if (maxsize >= 0) and (len(result) >= maxsize):
            break
        char = self.reader.read(2)
        if (char == '') or (char == '\x00\x00'):
            break
        result.append(char)
    return result.tounicode()

... and it raises the error "TypeError: array item must be unicode
character" (full stack trace at bottom).
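A minimal reproduction of the error, assuming Python 3 (the post does not say which version is in use): an array('u') accepts exactly one character per append(), so both a two-character str and a bytes object are rejected with a TypeError. The sample values are made up for illustration.

```python
import warnings
from array import array

# The 'u' typecode is deprecated and newer Pythons warn when it is used;
# silence that warning so only the TypeError under discussion shows up.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", DeprecationWarning)
    result = array('u')

result.append('G')  # exactly one character: accepted

try:
    result.append('G\x00')  # a 2-character str, like read(2) on a text-mode file
except TypeError as exc:
    print("2-character str rejected:", exc)

try:
    result.append(b'G\x00')  # a bytes object, like read(2) on a binary-mode file
except TypeError as exc:
    print("bytes rejected:", exc)

print(result.tounicode())  # G
```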

The whole unicode thing is a bit strange, because the input file is a
compiled grammar and so not a text file at all (the file can be
downloaded from here: http://kubadev.com/share/VBScript.cgt).

Can anyone make a suggestion as to the best way to allow the array
object to accept what is in essence a binary file ?

Here's the full stack trace ...

>>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pygold\Parser.py", line 100, in __init__
    self.loadTables(filename)
  File "pygold\Parser.py", line 365, in loadTables
    reader = GrammarReader(filename, self.encoding)
  File "pygold\GrammarReader.py", line 14, in __init__
    if not self.hasValidHeader():
  File "pygold\GrammarReader.py", line 43, in hasValidHeader
    header = self.readString(64) ## read max 64 chars
  File "pygold\GrammarReader.py", line 68, in readString
    result.append(char)
TypeError: array item must be unicode character
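One way out, sketched under two assumptions: Python 3, and that the strings in the .cgt file are NUL-terminated UTF-16LE (which is what reading two bytes at a time and stopping at '\x00\x00' suggests, though the post does not confirm it). The file is read in binary mode, raw bytes are collected in a bytearray, and decoding happens once at the end. read_string is a hypothetical stand-in for readString, not part of pygold.

```python
import io


def read_string(stream, maxsize=-1):
    """Read a NUL-terminated string of 2-byte UTF-16LE code units.

    Hypothetical stand-in for GrammarReader.readString. `stream` must be
    opened in binary mode ('rb'), so read() returns bytes, not str.
    """
    buf = bytearray()
    count = 0  # 2-byte units read so far, like len(result) in the original
    while maxsize < 0 or count < maxsize:
        unit = stream.read(2)
        # Stop at end of file (or a dangling odd byte) or at the terminator.
        if len(unit) < 2 or unit == b'\x00\x00':
            break
        buf += unit
        count += 1
    # Decode once at the end instead of storing each chunk in array('u').
    return buf.decode('utf-16-le')


# Hypothetical sample: "GOLD", a terminator, then two unrelated bytes.
sample = io.BytesIO('GOLD'.encode('utf-16-le') + b'\x00\x00' + b'\x01\x02')
print(read_string(sample))  # GOLD
print(sample.read())        # b'\x01\x02' (the terminator was consumed)
```

With this approach the grammar file would be opened with open(filename, 'rb'), so no text encoding is applied until the final decode.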

Tom Zych
      03-07-2011
southof40 wrote:
> ...
> result = array('u')
> ...
> ... and results in the error"TypeError: array item must be unicode
> character" is raised (full stack trace at bottom) .
> ...
> Can anyone make a suggestion as to the best way to allow the array
> object to accept what is in essence a binary file ?


Glancing at the docs, it appears you want to use 'c', 'b', or 'B'
instead of 'u' when creating array.
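A sketch of Tom's suggestion, assuming Python 3, where 'c' no longer exists and 'B' is the natural fit. A 'B' array holds integers, so whole chunks go in through frombytes() rather than append(); the UTF-16LE decode at the end is an assumption based on the 2-byte reads in readString.

```python
from array import array

# 'B' stores unsigned bytes (ints 0-255). The 'c' typecode existed only
# in Python 2.
raw = array('B')

chunk = b'G\x00O\x00'  # hypothetical bytes, as read() on a file opened 'rb' returns
raw.frombytes(chunk)   # append() would want a single int, so add whole chunks this way
print(len(raw))        # 4 (one item per byte)

# Turn the bytes back into text; UTF-16LE is an assumption, not confirmed by the post.
print(raw.tobytes().decode('utf-16-le'))  # GO
```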

--
Tom Zych
"Would you like a lovely fluffy little white rabbit, little girl,
or a cutesy wootesly little brown rabbit?"
"Actually, I don't think my python would notice."
Terry Reedy
      03-07-2011
On 3/7/2011 6:24 AM, southof40 wrote:
> The read is done, in GrammarReader.py, like this ...
>
> def readString(self, maxsize = -1):
>     result = array('u')
>     char = None
>     while True:
>         if (maxsize >= 0) and (len(result) >= maxsize):
>             break
>         char = self.reader.read(2)
>         if (char == '') or (char == '\x00\x00'):
>             break


          print(type(char), char)  # to see what is going on

>         result.append(char)
>     return result.tounicode()



--
Terry Jan Reedy
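What Terry's print(type(char), char) line would reveal can be sketched with in-memory streams, assuming Python 3. io.BytesIO and io.TextIOWrapper stand in for the real .cgt file, and the UTF-8 text stream is only a guess prompted by the 'utf-8' argument in the original Parser call, not a claim about pygold's internals.

```python
import io

data = 'GO'.encode('utf-16-le')  # hypothetical file contents: b'G\x00O\x00'

binary = io.BytesIO(data)                                    # behaves like open(path, 'rb')
text = io.TextIOWrapper(io.BytesIO(data), encoding='utf-8')  # like open(path, encoding='utf-8')

char_b = binary.read(2)
char_t = text.read(2)
print(type(char_b), repr(char_b))  # <class 'bytes'> b'G\x00'
print(type(char_t), repr(char_t))  # <class 'str'> 'G\x00'
```

Neither value is a single character, which is the only thing array('u').append() accepts.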

southof40
      03-08-2011
Thanks for both the suggestions. I haven't yet had time to try them
out but will do so and report back.