Velocity Reviews


southof40 03-07-2011 11:24 AM

encoding hell - any chance of salvation ?
 
Hi - I've got some code which uses array
(http://docs.python.org/library/array.html) to store characters read
from a file (it's not my code; it comes from here:
http://sourceforge.net/projects/pygold/)

The read is done, in GrammarReader.py, like this ...

    def readString(self, maxsize = -1):
        result = array('u')
        char = None
        while True:
            if (maxsize >= 0) and (len(result) >= maxsize):
                break
            char = self.reader.read(2)
            if (char == '') or (char == '\x00\x00'):
                break
            result.append(char)
        return result.tounicode()

... and raises the error "TypeError: array item must be unicode
character" (full stack trace at bottom).

The whole Unicode thing is a bit strange because the input file is a
compiled grammar and so not a text file at all (the file can be
downloaded from here http:///kubadev.com/share/VBScript.cgt)

Can anyone make a suggestion as to the best way to allow the array
object to accept what is in essence a binary file ?

Here's the full stack trace ...

>>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pygold\Parser.py", line 100, in __init__
    self.loadTables(filename)
  File "pygold\Parser.py", line 365, in loadTables
    reader = GrammarReader(filename, self.encoding)
  File "pygold\GrammarReader.py", line 14, in __init__
    if not self.hasValidHeader():
  File "pygold\GrammarReader.py", line 43, in hasValidHeader
    header = self.readString(64) ## read max 64 chars
  File "pygold\GrammarReader.py", line 68, in readString
    result.append(char)
TypeError: array item must be unicode character


Tom Zych 03-07-2011 11:38 AM

Re: encoding hell - any chance of salvation ?
 
southof40 wrote:
> ...
> result = array('u')
> ...
> ... and raises the error "TypeError: array item must be unicode
> character" (full stack trace at bottom).
> ...
> Can anyone make a suggestion as to the best way to allow the array
> object to accept what is in essence a binary file ?


Glancing at the docs, it appears you want to use 'c', 'b', or 'B'
instead of 'u' when creating the array.
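
A minimal sketch of that suggestion, assuming Python 3 (where the 'c'
typecode from Python 2 is no longer available, so 'B' is used). The
sample bytes are made up for illustration and are not taken from the
actual .cgt file:

```python
from array import array

# Hypothetical sample standing in for the first bytes of the file.
raw = b"G\x00O\x00L\x00D\x00"

byte_array = array('B')      # 'B' = unsigned char, values 0..255
byte_array.frombytes(raw)    # accepts raw bytes without any decoding

print(byte_array.tolist())           # [71, 0, 79, 0, 76, 0, 68, 0]
print(byte_array.tobytes() == raw)   # True
```

Note that tounicode() is only available on 'u' arrays, so with a byte
typecode the final conversion to text would have to be done some other
way, for example by decoding the result of tobytes().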

--
Tom Zych / freethinker@pobox.com
"Would you like a lovely fluffy little white rabbit, little girl,
or a cutesy wootesly little brown rabbit?"
"Actually, I don't think my python would notice."

Terry Reedy 03-07-2011 08:12 PM

Re: encoding hell - any chance of salvation ?
 
On 3/7/2011 6:24 AM, southof40 wrote:
> Hi - I've got some code which uses array
> (http://docs.python.org/library/array.html) to store characters read
> from a file (it's not my code; it comes from here:
> http://sourceforge.net/projects/pygold/)
>
> The read is done, in GrammarReader.py, like this ...
>
> def readString(self, maxsize = -1):
>     result = array('u')
>     char = None
>     while True:
>         if (maxsize >= 0) and (len(result) >= maxsize):
>             break
>         char = self.reader.read(2)
>         if (char == '') or (char == '\x00\x00'):
>             break

            print(type(char), char)  # to see what is going on

>         result.append(char)
>     return result.tounicode()
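
If that print shows a two-byte byte string rather than a single text
character, one possible way forward is to collect the raw chunks and
decode them in one go. The sketch below is only an illustration under
several assumptions not confirmed in this thread: Python 3, a reader
opened in binary mode, and strings stored as null-terminated UTF-16
little-endian. The name read_string and the in-memory sample are
hypothetical and are not part of pygold:

```python
import io

def read_string(reader, maxsize=-1):
    """Read 2-byte units until a null terminator, EOF, or maxsize units."""
    chunks = []
    while maxsize < 0 or len(chunks) < maxsize:
        chunk = reader.read(2)
        # A short read means end of file; b'\x00\x00' is the assumed terminator.
        if len(chunk) < 2 or chunk == b'\x00\x00':
            break
        chunks.append(chunk)
    # Decode once at the end so a surrogate pair split over two units survives.
    return b''.join(chunks).decode('utf-16-le')

# Hypothetical in-memory stand-in for the grammar file.
reader = io.BytesIO('GOLD'.encode('utf-16-le') + b'\x00\x00' + b'rest')
print(repr(read_string(reader)))   # 'GOLD'
```

This avoids array('u') altogether, so nothing ever has to be appended
to an array that insists on text characters.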



--
Terry Jan Reedy


southof40 03-08-2011 09:57 AM

Re: encoding hell - any chance of salvation ?
 
Thanks for both the suggestions. I haven't yet had time to try them
out but will do so and report back.


