csv bugs

 
 
Magnus Lie Hetland
      03-02-2004
It seems that when a line termination is escaped (using the current
escape character), csv.reader treats it as a line continuation, which
is well and good -- but it doesn't discard the escape character;
instead, it escapes it implicitly. This seems like a bug to me. E.g.

foo:bar:baz\
frozz:bozz

with separator ':' and escape character '\\' is parsed into

['foo', 'bar', 'baz\\\nfrozz', 'bozz']

In my opinion, it *ought* to be parsed into

['foo', 'bar', 'baz\nfrozz', 'bozz']

As far as I know, this is the UNIX convention, as used in (e.g.)
/etc/passwd.
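
To check the behaviour described above on a given interpreter, a minimal sketch along these lines should do. The two-line input mirrors the example in this post; the variable names are only illustrative, and the exact contents of the third field may differ between Python versions, so the printed result is not asserted here.

```python
import csv

# Two physical lines; the first one ends with an escaped line terminator.
lines = ["foo:bar:baz\\\n", "frozz:bozz\n"]

# Same settings as in the post: ':' as separator, backslash as escape
# character, and no quoting at all.
reader = csv.reader(
    lines,
    delimiter=":",
    escapechar="\\",
    quoting=csv.QUOTE_NONE,
)

rows = list(reader)

# Show each logical record with repr-style escapes, so a retained
# backslash in the third field is easy to spot.
for row in rows:
    print(row)
```

If the escape character is retained, the third field prints with a doubled backslash before the `\n`; if it is discarded, only the newline remains.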

Am I off target here? If the current behaviour is desirable (although
I can't see why it should be) then at least I think there should be a
way of implementing "normal" line continuations (as in my example),
which is the standard UNIX behavior, and the behavior of Python
source, for that matter. Otherwise, csv can't be used to parse (e.g.)
/etc/passwd...

And another thing: Perhaps a 'passwd' dialect could be added alongside
'excel'? Something like:

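from csv import Dialect, QUOTE_NONE, register_dialect

class passwd(Dialect):
    # colon-separated fields, backslash as escape character, no quoting
    delimiter = ':'
    doublequote = False
    escapechar = '\\'
    lineterminator = '\n'
    quotechar = '?'  # dummy value; a quotechar had to be supplied
    quoting = QUOTE_NONE
    skipinitialspace = False

register_dialect("passwd", passwd)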

For some reason you *have* to supply a quotechar, even if you set
QUOTE_NONE... I guess that's a bug too, in my book.
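
As a rough illustration of how such a dialect might be used, the sketch below registers it and parses one passwd-style line. The sample line is made up, and `quotechar = None` is an assumption: it relies on the interpreter accepting a missing quote character together with QUOTE_NONE, which was not the case for the version discussed in this thread.

```python
import csv

class passwd(csv.Dialect):
    """Colon-separated, backslash-escaped, unquoted fields (a sketch)."""
    delimiter = ":"
    doublequote = False
    escapechar = "\\"
    lineterminator = "\n"
    quotechar = None  # assumption: allowed when quoting is QUOTE_NONE
    quoting = csv.QUOTE_NONE
    skipinitialspace = False

# Make the dialect available by name, alongside the built-in 'excel'.
csv.register_dialect("passwd", passwd)

# A made-up passwd-style record, for illustration only.
sample = ["root:x:0:0:root:/root:/bin/bash\n"]

fields = next(csv.reader(sample, dialect="passwd"))
print(fields)
```

On an interpreter that rejects `quotechar = None`, substituting a character that never occurs in the data (such as the '?' used above) should work instead.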

If there are no objections, I might submit some of this as a bug
report or two (or even a patch).

--
Magnus Lie Hetland      "The mind is not a vessel to be filled,
http://hetland.org       but a fire to be lighted." [Plutarch]
 
 
Skip Montanaro
      03-02-2004

(A better place for this discussion would probably be .
I'm adding it to the cc list.)

Magnus> It seems that when a line termination is escaped (using the
Magnus> current escape character), csv.reader treats it as a line
Magnus> continuation, which is well and good -- but it doesn't discard
Magnus> the escape character; instead, it escapes it implicitly. This
Magnus> seems like a bug to me. E.g.

Magnus> foo:bar:baz\
Magnus> frozz:bozz

Magnus> with separator ':' and escape character '\\' is parsed into

Magnus> ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

Magnus> In my opinion, it *ought* to be parsed into

Magnus> ['foo', 'bar', 'baz\nfrozz', 'bozz']

Magnus> As far as I know, this is the UNIX convention, as used in (e.g.)
Magnus> /etc/passwd.

That may be; however, development of the csv module's parser was driven by
how Microsoft Excel behaves. The assumption was (rightly, I think) that
Excel reads or writes more CSV files than anything else. I don't believe it
does anything with backslashes.

Magnus> Am I off target here? If the current behaviour is desirable
Magnus> (although I can't see why it should be) then at least I think
Magnus> there should be a way of implementing "normal" line
Magnus> continuations (as in my example), which is the standard UNIX
Magnus> behavior, and the behavior of Python source, for that
Magnus> matter. Otherwise, csv can't be used to parse (e.g.)
Magnus> /etc/passwd...

You're welcome to submit a patch. I don't have time for it.

Magnus> And another thing: Perhaps a 'passwd' dialect could be added
Magnus> alongside 'excel'? Something like:

Magnus> class passwd(Dialect):
Magnus>     delimiter = ':'
Magnus>     doublequote = False
Magnus>     escapechar = '\\'
Magnus>     lineterminator = '\n'
Magnus>     quotechar = '?'
Magnus>     quoting = QUOTE_NONE
Magnus>     skipinitialspace = False
Magnus> register_dialect("passwd", passwd)

I'll take a look at that.

Magnus> For some reason you *have* to supply a quotechar, even if you
Magnus> set QUOTE_NONE... I guess that's a bug too, in my book.

Maybe. Maybe just a feature.

Magnus> If there are no objections, I might submit some of this as a bug
Magnus> report or two (or even a patch).

Please do.

Skip

 
 
Francis Avila
      03-03-2004
Magnus Lie Hetland wrote:
> And another thing: Perhaps a 'passwd' dialect could be added alongside
> 'excel'? Something like:


I wanted this, and started to write it in Nov-2003, but because of bugs
in csv, outlined in

http://groups.google.com/groups?selm....supernews.com

it is not possible to implement a passwd dialect, at least as of Python
2.3.2. Unless I missed something obvious.

--
Francis Avila
 