encoding hell - any chance of salvation ?
Hi - I've got some code which uses array (http://docs.python.org/library/array.html) to store characters read from a file (it's not my code, it comes from here: http://sourceforge.net/projects/pygold/).

The read is done, in GrammarReader.py, like this ...

    def readString(self, maxsize = -1):
        result = array('u')
        char = None
        while True:
            if (maxsize >= 0) and (len(result) >= maxsize):
                break
            char = self.reader.read(2)
            if (char == '') or (char == '\x00\x00'):
                break
            result.append(char)
        return result.tounicode()

... and results in the error "TypeError: array item must be unicode character" (full stack trace at bottom).

The whole unicode thing is a bit strange because the input file is a compiled grammar and so not a text file at all (the file can be downloaded from here: http:///kubadev.com/share/VBScript.cgt).

Can anyone make a suggestion as to the best way to allow the array object to accept what is in essence a binary file?

Here's the full stack trace ...

    >>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "pygold\Parser.py", line 100, in __init__
        self.loadTables(filename)
      File "pygold\Parser.py", line 365, in loadTables
        reader = GrammarReader(filename, self.encoding)
      File "pygold\GrammarReader.py", line 14, in __init__
        if not self.hasValidHeader():
      File "pygold\GrammarReader.py", line 43, in hasValidHeader
        header = self.readString(64) ## read max 64 chars
      File "pygold\GrammarReader.py", line 68, in readString
        result.append(char)
    TypeError: array item must be unicode character
Re: encoding hell - any chance of salvation ?
southof40 wrote:
> ...
> result = array('u')
> ...
> ... and results in the error"TypeError: array item must be unicode
> character" is raised (full stack trace at bottom) .
> ...
> Can anyone make a suggestion as to the best way to allow the array
> object to accept what is in essence a binary file ?

Glancing at the docs, it appears you want to use 'c', 'b', or 'B' instead of 'u' when creating array.

--
Tom Zych / freethinker@pobox.com
"Would you like a lovely fluffy little white rabbit, little girl, or a cutesy wootesly little brown rabbit?"
"Actually, I don't think my python would notice."
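
A minimal sketch of the type-code suggestion above, written for Python 3, where 'c' no longer exists and 'B' (unsigned byte) is the closest equivalent. The sample bytes are made up for illustration and are not taken from the .cgt file; treating them as UTF-16-LE text is likewise an assumption.

```python
from array import array

# Hypothetical sample: the bytes a binary read might return.
raw = b"G\x00O\x00L\x00D\x00"

# 'B' stores unsigned bytes, so it accepts arbitrary binary data.
buf = array("B")
buf.frombytes(raw)  # Python 3.2+; Python 2 spells this fromstring()

# Decode only at the point where text is actually needed.
text = buf.tobytes().decode("utf-16-le")
```

With a byte-typed array the append never complains about unicode, and the decision about which encoding the bytes are in is deferred to a single decode call.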
Re: encoding hell - any chance of salvation ?
On 3/7/2011 6:24 AM, southof40 wrote:
> Hi - I've got some code which uses array (http://docs.python.org/
> library/array.html) to store characters read from a file (it's not my
> code it comes from here http://sourceforge.net/projects/pygold/)
>
> The read is done, in GrammarReader.py, like this ...
>
> def readString(self, maxsize = -1):
>     result = array('u')
>     char = None
>     while True:
>         if (maxsize >= 0) and (len(result) >= maxsize):
>             break
>         char = self.reader.read(2)
>         if (char == '') or (char == '\x00\x00'):
>             break

            print(type(char),char) # to see what is going on

>         result.append(char)
>     return result.tounicode()
>
> ... and results in the error"TypeError: array item must be unicode
> character" is raised (full stack trace at bottom) .
>
> The whole unicode thing is a bit strange because the input file is a
> compiled grammar and so not a text file at all (the file able to be
> downloaded from here http:///kubadev.com/share/VBScript.cgt)
>
> Can anyone make a suggestion as to the best way to allow the array
> object to accept what is in essence a binary file ?
>
> Here's the full stack trace ...
>
> >>> p=pygold.Parser('C:/data/Gold-Parser-VBScript-Grammar/VBScript-Test0-UTF8.cgt','utf-8')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pygold\Parser.py", line 100, in __init__
>     self.loadTables(filename)
>   File "pygold\Parser.py", line 365, in loadTables
>     reader = GrammarReader(filename, self.encoding)
>   File "pygold\GrammarReader.py", line 14, in __init__
>     if not self.hasValidHeader():
>   File "pygold\GrammarReader.py", line 43, in hasValidHeader
>     header = self.readString(64) ## read max 64 chars
>   File "pygold\GrammarReader.py", line 68, in readString
>     result.append(char)
> TypeError: array item must be unicode character
>

--
Terry Jan Reedy
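
If the print shows that read(2) hands back two raw bytes rather than one unicode character, one possible fix is to collect the bytes and decode them in one go. The sketch below is not pygold's code: read_string is a hypothetical stand-alone helper, and it assumes the stream is opened in binary mode and that strings in the file are NUL-terminated UTF-16-LE, which this thread does not confirm.

```python
import io


def read_string(reader, maxsize=-1):
    """Read a NUL-terminated string of 2-byte units from a binary stream.

    Assumes UTF-16-LE; maxsize counts 2-byte units, not bytes.
    """
    chunks = []
    while maxsize < 0 or len(chunks) < maxsize:
        raw = reader.read(2)
        # Stop at end of file (short read) or at the two-byte terminator.
        if len(raw) < 2 or raw == b"\x00\x00":
            break
        chunks.append(raw)
    # Decode all units together so surrogate pairs stay intact.
    return b"".join(chunks).decode("utf-16-le")


# Made-up sample data standing in for the start of a grammar file.
sample = io.BytesIO("GOLD".encode("utf-16-le") + b"\x00\x00" + b"rest")
header = read_string(sample, 64)
```

Joining the chunks before decoding avoids the array('u') type check altogether, since no item ever has to be a single unicode character.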
Re: encoding hell - any chance of salvation ?
Thanks for both the suggestions. I haven't yet had time to try them
out but will do so and report back.