Re: help I'm getting delimited

 
 
aka
      12-17-2008
John, this is the actual code I ran in TurboGears which is a Python
framework.
I should have left out the import statements. Trust me, the problem
isn't in there because the UnicodeWriter is functioning perfectly.
I already reduced the csv file to these four lines in Notepad, so
there isn't anything more than this:

id;company;department
12;Cadillac;Research
11;Ford;Accounting
10;Chrysler;Sales

The only possible problematic lines are marked ##### here:

    def import_roles(self, input=None, *args, **kwargs):
        inp = 'C:/temp/test.csv'
        roles = []
        msg = ''
        ## try:
        fp = open(inp, 'rb') #####
        reader = csv.reader(fp, dialect='excel', delimiter=';') #####
        ## reader = UnicodeReader(fp, dialect='excel', delimiter=';') #####
        for r in reader:
            roles.append(r[0]) #####
        fp.close()
        ## except:
            ## msg = "Something's wrong with the csv.reader"
        return dict(filepath=inp,
                    roles=str(roles),
                    msg=msg)


Yeah rdmur, I'll have a look at the Python command line.
 
John Machin
      12-17-2008
On Dec 18, 3:15 am, aka wrote:
> John, this is the actual code I ran in TurboGears which is a Python
> framework.


It's not complete -- the change in indentation would have caused a
SyntaxError.

If (as you appear to assert) the problem is in the csv module, then
create a small stand-alone no-TurboGears Python script and a test file
which together demonstrate the problem reproducibly so that the
problem can be investigated by anyone with a standard TurboGears-free
Python installation.

If you can't reproduce the problem in that manner, then you may need
to seek assistance in a TurboGears-specific forum.
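
A stand-alone reproduction along those lines might look like the sketch below. It assumes Python 3 (the code in this thread is Python 2), and `write_test_file`, `read_first_column` and `demo` are hypothetical helper names, not anything from the original code. It writes the four-line sample in a chosen encoding and reads it back with the plain csv module.

```python
import csv
import os
import tempfile

SAMPLE = (
    "id;company;department\n"
    "12;Cadillac;Research\n"
    "11;Ford;Accounting\n"
    "10;Chrysler;Sales\n"
)


def write_test_file(path, encoding):
    """Write the four-line sample to path in the given encoding."""
    with open(path, "w", encoding=encoding, newline="") as fp:
        fp.write(SAMPLE)


def read_first_column(path, encoding):
    """Return the first field of every row, decoding the file as `encoding`."""
    with open(path, "r", encoding=encoding, newline="") as fp:
        reader = csv.reader(fp, dialect="excel", delimiter=";")
        return [row[0] for row in reader]


def demo():
    """Return (ids from a UTF-8 file, error name for a UTF-16 file read as UTF-8)."""
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "utf8.csv")
        bad = os.path.join(tmp, "utf16.csv")
        write_test_file(good, "utf-8")
        write_test_file(bad, "utf-16")
        ids = read_first_column(good, "utf-8")
        try:
            read_first_column(bad, "utf-8")
            error = None
        except UnicodeDecodeError as exc:
            error = type(exc).__name__
        return ids, error


if __name__ == "__main__":
    print(demo())
```

Because the script depends on nothing but the standard library, anyone can run it, and changing the encoding passed to `write_test_file` is enough to switch between the working and the failing case.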

> I should have left out the import statements. Trust me, the problem
> isn't in there because the UnicodeWriter is functioning perfectly.


Do you mean that this file was created by whatever.UnicodeWriter? If
so, did you just now discover this information?

How do you know that "the UnicodeWriter is functioning perfectly"?
What does "functioning perfectly" mean to you? In particular, what
encoding is it using?

> I already reduced the csv file to these four lines in Notepad, so
> there isn't anything more than this:
>
> id;company;department
> 12;Cadillac;Research
> 11;Ford;Accounting
> 10;Chrysler;Sales


Which do you mean:
(a) you typed those lines into Notepad yourself
(b) you took a copy of a file created by whatever.UnicodeWriter,
opened it with Notepad, trimmed off some rows and columns, and saved
it again
?

You said earlier
"""
csv.reader results in: for r in reader: Error: line contains NULL
byte

Use of UnicodeReader results in: UnicodeDecodeError: 'utf8' codec
can't decode byte 0xff in position 0: unexpected code byte
"""

Those results are consistent with your file being encoded in utf16_le,
with the utf16_le BOM ('\xff\xfe') at the start of the file.
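
To see why both error messages point at UTF-16, here is a small sketch (written for Python 3, whereas the thread uses Python 2) that encodes the header line as little-endian UTF-16 with a BOM and looks at the raw bytes. The variable names are only illustrative.

```python
import codecs

line = "id;company;department\r\n"

# Bytes a UTF-16-LE writer would emit: the BOM first, then two bytes per character.
raw = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

starts_with_bom = raw.startswith(b"\xff\xfe")  # the '\xff\xfe' mentioned above
has_nul_bytes = b"\x00" in raw                 # every ASCII character gains a \x00 byte

# Decoding the same bytes as UTF-8 fails on the very first byte (0xff).
try:
    raw.decode("utf-8")
    failure = None
except UnicodeDecodeError as exc:
    failure = (exc.start, raw[exc.start])

print(starts_with_bom, has_nul_bytes, failure)
```

The NUL bytes account for the csv.reader complaint, and the leading 0xff accounts for the UnicodeDecodeError at position 0.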

Have you, as I asked, looked at the file with some better-than-Notepad
diagnostic apparatus?
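
One way to take that closer look without extra tools is to dump the first few raw bytes from Python. The sketch below assumes Python 3; `peek` and `guess_bom` are hypothetical helpers, and `guess_bom` only recognises a BOM, so a file without one yields None.

```python
import codecs

# Only these three BOMs are checked. UTF-32 is left out to keep the sketch short,
# so a UTF-32-LE file (BOM ff fe 00 00) would be reported as utf-16-le.
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def peek(path, count=32):
    """Return the first `count` raw bytes of the file, unmodified."""
    with open(path, "rb") as fp:
        return fp.read(count)


def guess_bom(data):
    """Return the codec name suggested by a leading BOM in data, or None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None
```

Calling `print(repr(peek('C:/temp/test.csv')))` shows any `\xff\xfe` prefix and `\x00` bytes directly, which Notepad hides.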

Here's a likely hypothesis: the file was written in utf16. In that
case:
either (i) you really want utf16 (why?), so:

    (1) the csv module will not cope with it, and is not expected to cope
        with it

    (2) the whatever.UnicodeReader should (in order of preference):
        (a) be allowed to find out for itself that 'utf16' is the go
        (b) be told explicitly that 'utf16' is the go
        (c) be served with a bug report

OR (ii) you really want utf8, so:

    (1) the csv module should be happy
    (2) the whatever.UnicodeWriter should be told to use 'utf8'
    (3) the whatever.UnicodeReader should (in order of preference):
        [as above but s/16/8/]
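
If option (ii) is the goal and a file already exists in UTF-16, one way out is to transcode it once and keep everything in UTF-8 from then on. This is a minimal Python 3 sketch, not the UnicodeWriter/UnicodeReader code from this thread; `transcode` and `demo` are hypothetical helpers.

```python
import csv
import os
import tempfile


def transcode(src, dst, src_encoding="utf-16", dst_encoding="utf-8"):
    """Copy src to dst, decoding with src_encoding and re-encoding with dst_encoding."""
    # newline="" leaves the original line endings untouched on both sides.
    with open(src, "r", encoding=src_encoding, newline="") as fin, \
         open(dst, "w", encoding=dst_encoding, newline="") as fout:
        for chunk in iter(lambda: fin.read(64 * 1024), ""):
            fout.write(chunk)


def demo():
    """Write a UTF-16 sample, transcode it, and parse the UTF-8 copy with csv."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "utf16.csv")
        dst = os.path.join(tmp, "utf8.csv")
        with open(src, "w", encoding="utf-16", newline="") as fp:
            fp.write("id;company;department\r\n12;Cadillac;Research\r\n")
        transcode(src, dst)
        with open(dst, "rb") as fp:
            raw = fp.read()
        with open(dst, "r", encoding="utf-8", newline="") as fp:
            rows = list(csv.reader(fp, delimiter=";"))
        return raw, rows


if __name__ == "__main__":
    print(demo())
```

The "utf-16" codec consumes the BOM while decoding, so the UTF-8 copy contains neither the BOM nor any NUL bytes.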

HTH,
John
 
aka
      12-18-2008
On 18 Dec, 00:06, John Machin wrote:
> On Dec 18, 3:15 am, aka wrote:
>
> Do you mean that this file was created by whatever.UnicodeWriter? If
> so, did you just now discover this information?
>
> How do you know that "the UnicodeWriter is functioning perfectly"?
> What does "functioning perfectly" mean to you? In particular, what
> encoding is it using?
>
> Which do you mean:
> (a) you typed those lines into Notepad yourself
> (b) you took a copy of a file created by whatever.UnicodeWriter,
> opened it with Notepad, trimmed off some rows and columns, and saved
> it again
> ?
> Here's a likely hypothesis: the file was written in utf16. In that
> case:
> either (i) you really want utf16 (why?), so:
>
> (1) the csv module will not cope with it, and is not expected to cope
> with it
>
> (2) the whatever.UnicodeReader should (in order of preference):
> (a) be allowed to find out for itself that 'utf16' is the go
> (b) be told explicitly that 'utf16' is the go
> (c) be served with a bug report
>
> OR (ii) you really want utf8, so:
>
> (1) the csv module should be happy
> (2) the whatever.UnicodeWriter should be told to use 'utf8'
> (3) the whatever.UnicodeReader should (in order of preference):
> [as above but s/16/8/]
>

The csv file was originally created by the UnicodeWriter class and was
used for a mail merge with Microsoft Word, all of which worked
perfectly.
The reverse did not: reading the output file back in failed. So in the
end I edited it in Notepad, cutting off columns, but I didn't know that
the encoding would survive that, which is why it still caused
problems.
Now, after testing from the Python command line with a csv file
generated from Excel, I could get it working, so it had to be the
encoding.
Because the write side of my code, which uses the UnicodeWriter, was
OK, I didn't pay attention to the fact that I had changed the UW class
from UTF-8 to UTF-16 because of difficulties with some Dutch special
characters.
Then at last I tried changing back to UTF-8 and noticed that both
output and input were working, including those special characters. So
it was my unjustified conclusion that I couldn't handle these special
characters on the write side without UTF-16 that ultimately got me
into trouble on the read side.
With your help I got it straight. Once again, reducing the problem to
its bare basics and avoiding big steps is the key.
Thanks a lot for your help, John.
BTW, the TurboGears code is not very different from Python; it just
uses some extra identifiers.
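
That UTF-8 copes with non-ASCII characters on both the write and the read side can be checked in a few lines. The sketch below is Python 3 and uses 'é' and 'ë' purely as assumed examples of such characters; it is not the UnicodeWriter code discussed in this thread.

```python
import csv
import io

# The second row is made-up sample data containing non-ASCII characters.
rows = [["id", "company"], ["1", "Café Zoë"]]

# Write side: produce CSV text, then encode that text as UTF-8 bytes.
text_out = io.StringIO(newline="")
csv.writer(text_out, delimiter=";").writerows(rows)
encoded = text_out.getvalue().encode("utf-8")

# Read side: decode the bytes as UTF-8 and parse them again.
text_in = io.StringIO(encoded.decode("utf-8"), newline="")
round_tripped = list(csv.reader(text_in, delimiter=";"))

print(round_tripped == rows, b"\x00" in encoded)
```

The rows come back unchanged and the encoded bytes contain no NUL bytes, so nothing on the write side forces a move to UTF-16.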
 
 
 
 