csv bugs
It seems that when a line termination is escaped (using the current
escape character), csv.reader treats it as a line continuation, which is
well and good -- but it doesn't discard the escape character; instead, it
escapes it implicitly. This seems like a bug to me. E.g.

    foo:bar:baz\
    frozz:bozz

with separator ':' and escape character '\\' is parsed into

    ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

In my opinion, it *ought* to be parsed into

    ['foo', 'bar', 'baz\nfrozz', 'bozz']

As far as I know, this is the UNIX convention, as used in (e.g.)
/etc/passwd.

Am I off target here? If the current behaviour is desirable (although I
can't see why it should be), then at least I think there should be a way
of implementing "normal" line continuations (as in my example), which is
the standard UNIX behavior, and the behavior of Python source, for that
matter. Otherwise, csv can't be used to parse (e.g.) /etc/passwd...

And another thing: Perhaps a 'passwd' dialect could be added alongside
'excel'? Something like:

    class passwd(Dialect):
        delimiter = ':'
        doublequote = False
        escapechar = '\\'
        lineterminator = '\n'
        quotechar = '?'
        quoting = QUOTE_NONE
        skipinitialspace = False

    register_dialect("passwd", passwd)

For some reason you *have* to supply a quotechar, even if you set
QUOTE_NONE... I guess that's a bug too, in my book.

If there are no objections, I might submit some of this as a bug report
or two (or even a patch).

-- 
Magnus Lie Hetland               "The mind is not a vessel to be filled,
http://hetland.org                but a fire to be lighted." [Plutarch]
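The convention Magnus argues for (the escape character makes the next character literal and is itself dropped, so an escaped newline continues the field) can be sketched by hand, independently of the csv module. `split_escaped` is a hypothetical helper written for illustration; it assumes `'\n'` line endings and supports no quoting at all.

```python
def split_escaped(text, delimiter=":", escapechar="\\"):
    """Split text into records of fields, UNIX style.

    The escape character is discarded and the character after it is kept
    literally, so an escaped newline joins two physical lines into one field.
    """
    records = []
    fields = []
    field = []
    chars = iter(text)
    for ch in chars:
        if ch == escapechar:
            # Keep the following character (which may be "\n"), drop the escape.
            nxt = next(chars, None)
            if nxt is not None:
                field.append(nxt)
        elif ch == delimiter:
            fields.append("".join(field))
            field = []
        elif ch == "\n":
            # An unescaped newline ends the current record.
            fields.append("".join(field))
            records.append(fields)
            fields, field = [], []
        else:
            field.append(ch)
    if field or fields:
        # Flush a final record that has no trailing newline.
        fields.append("".join(field))
        records.append(fields)
    return records


print(split_escaped("foo:bar:baz\\\nfrozz:bozz\n"))
```

On the example from the post this yields `[['foo', 'bar', 'baz\nfrozz', 'bozz']]`, i.e. the backslash is gone and only the newline survives in the third field.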
Re: csv bugs
(A better place for this discussion would probably be csv@mail.mojam.com.
I'm adding it to the cc list.)

Magnus> It seems that when a line termination is escaped (using the
Magnus> current escape character), csv.reader treats it as a line
Magnus> continuation, which is well and good -- but it doesn't discard
Magnus> the escape character; instead, it escapes it implicitly. This
Magnus> seems like a bug to me. E.g.

Magnus>     foo:bar:baz\
Magnus>     frozz:bozz

Magnus> with separator ':' and escape character '\\' is parsed into

Magnus>     ['foo', 'bar', 'baz\\\nfrozz', 'bozz']

Magnus> In my opinion, it *ought* to be parsed into

Magnus>     ['foo', 'bar', 'baz\nfrozz', 'bozz']

Magnus> As far as I know, this is the UNIX convention, as used in (e.g.)
Magnus> /etc/passwd.

That may be; however, development of the csv module's parser was driven by
how Microsoft Excel behaves. The assumption was (rightly, I think) that
Excel reads or writes more CSV files than anything else. I don't believe it
does anything with backslashes.

Magnus> Am I off target here? If the current behaviour is desirable
Magnus> (although I can't see why it should be), then at least I think
Magnus> there should be a way of implementing "normal" line
Magnus> continuations (as in my example), which is the standard UNIX
Magnus> behavior, and the behavior of Python source, for that
Magnus> matter. Otherwise, csv can't be used to parse (e.g.)
Magnus> /etc/passwd...

You're welcome to submit a patch. I don't have time for it.

Magnus> And another thing: Perhaps a 'passwd' dialect could be added
Magnus> alongside 'excel'? Something like:

Magnus>     class passwd(Dialect):
Magnus>         delimiter = ':'
Magnus>         doublequote = False
Magnus>         escapechar = '\\'
Magnus>         lineterminator = '\n'
Magnus>         quotechar = '?'
Magnus>         quoting = QUOTE_NONE
Magnus>         skipinitialspace = False

Magnus>     register_dialect("passwd", passwd)

I'll take a look at that.

Magnus> For some reason you *have* to supply a quotechar, even if you
Magnus> set QUOTE_NONE... I guess that's a bug too, in my book.

Maybe. Maybe just a feature.

Magnus> If there are no objections, I might submit some of this as a bug
Magnus> report or two (or even a patch).

Please do.

Skip
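Skip's point about Excel can be checked against the current Python 3 csv module (the thread itself concerns Python 2.3). The default `excel` dialect defines no escape character, so a backslash is ordinary data, and a quote inside a quoted field is represented by doubling it. The input row below is made up for illustration.

```python
import csv

# The "excel" dialect: no escapechar, doublequote=True.
print(csv.excel.escapechar, csv.excel.doublequote)

# The backslash in the first field is kept as-is; '""' inside the quoted
# second field collapses to a single '"'.
rows = list(csv.reader(['a\\b,"say ""hi"""']))
print(rows)
```

The first field comes back as `a\b` (printed in repr as `'a\\b'`) and the second as `say "hi"`: the reader treats backslashes as data and uses quote doubling instead of escaping.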
Re: csv bugs
In <slrnc48oph.8ob.mlh@furu.idi.ntnu.no> Magnus Lie Hetland wrote:
> And another thing: Perhaps a 'passwd' dialect could be added alongside
> 'excel'? Something like:

I wanted this, and started to write it in Nov-2003, but because of bugs in
csv, outlined in
http://groups.google.com/groups?selm....supernews.com
it is not possible to implement a passwd dialect, at least as of Python
2.3.2. Unless I missed something obvious.

-- 
Francis Avila
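For comparison, a sketch of Magnus's proposed dialect written against the Python 3 csv module rather than 2.3.2. In Python 3, `quotechar = None` is accepted when `quoting` is `QUOTE_NONE`, so the placeholder `'?'` is no longer needed. This only shows that ordinary passwd lines split into their seven fields; it makes no claim about how escaped newlines are handled. The two sample lines are typical made-up entries, not real data.

```python
import csv
import io


class passwd(csv.Dialect):
    # Lowercase class name mirrors the proposal and csv's own "excel" class.
    delimiter = ":"
    doublequote = False
    escapechar = "\\"
    lineterminator = "\n"      # only used when writing; the reader ignores it
    quotechar = None           # allowed because quoting is QUOTE_NONE
    quoting = csv.QUOTE_NONE
    skipinitialspace = False


csv.register_dialect("passwd", passwd)

sample = (
    "root:x:0:0:root:/root:/bin/bash\n"
    "nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin\n"
)
rows = list(csv.reader(io.StringIO(sample), dialect="passwd"))
for row in rows:
    print(row)
```

Registering the class itself (rather than an instance) follows the same pattern the csv module uses for `excel`.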