Magnus Lie Hetland
It seems that when a line termination is escaped (using the current
escape character), csv.reader treats it as a line continuation, which
is well and good -- but it doesn't discard the escape character;
instead, it escapes it implicitly. This seems like a bug to me. E.g.
    foo:bar:baz\
    frozz:bozz
with separator ':' and escape character '\\' is parsed into
    ['foo', 'bar', 'baz\\\nfrozz', 'bozz']
In my opinion, it *ought* to be parsed into
    ['foo', 'bar', 'baz\nfrozz', 'bozz']
As far as I know, this is the UNIX convention, as used in (e.g.)
/etc/passwd.
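For anyone who wants to check this on their own installation, here is a
minimal sketch, assuming Python 3 and an in-memory file. The helper name
parse_continued is only mine, for illustration. It prints whatever the
installed csv module produces, so the third field can be compared with
the two lists above; the result may well differ between Python versions.

```python
import csv
import io


def parse_continued(text, delimiter=":", escapechar="\\"):
    """Parse text with csv.reader, quoting disabled, and return all rows."""
    # newline="" leaves line endings untouched, so csv.reader sees them as-is.
    stream = io.StringIO(text, newline="")
    reader = csv.reader(
        stream,
        delimiter=delimiter,
        escapechar=escapechar,
        quoting=csv.QUOTE_NONE,
    )
    return list(reader)


# The two-line example from above: a backslash right before the newline.
sample = "foo:bar:baz\\\nfrozz:bozz\n"
rows = parse_continued(sample)

print(rows)
# Show the third field unambiguously, including any retained backslash.
print(repr(rows[0][2]))
```

If the escape character is discarded, the repr shows a single newline
between 'baz' and 'frozz'; if it is kept, a backslash precedes it.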
Am I off target here? If the current behaviour is desirable (although
I can't see why it should be) then at least I think there should be a
way of implementing "normal" line continuations (as in my example),
which is the standard UNIX behavior, and the behavior of Python
source, for that matter. Otherwise, csv can't be used to parse (e.g.)
/etc/passwd...
And another thing: Perhaps a 'passwd' dialect could be added alongside
'excel'? Something like:
    class passwd(Dialect):
        delimiter = ':'
        doublequote = False
        escapechar = '\\'
        lineterminator = '\n'
        quotechar = '?'
        quoting = QUOTE_NONE
        skipinitialspace = False
    register_dialect("passwd", passwd)
For some reason you *have* to supply a quotechar, even if you set
QUOTE_NONE... I guess that's a bug too, in my book.
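To make the proposal a bit more concrete, here is a rough sketch of how
such a dialect could be registered and used from user code, rather than
inside the csv module itself. The sample line is made up, and the '?'
quotechar is just the placeholder from above, not something passwd
files define.

```python
import csv


class passwd(csv.Dialect):
    """Colon-separated, unquoted records in the style of /etc/passwd."""
    delimiter = ':'
    doublequote = False
    escapechar = '\\'
    lineterminator = '\n'
    quotechar = '?'  # placeholder only; not used for parsing with QUOTE_NONE
    quoting = csv.QUOTE_NONE
    skipinitialspace = False


# Make the dialect available under the name "passwd".
csv.register_dialect("passwd", passwd)

# A made-up record; csv.reader accepts any iterable of lines.
lines = ["root:x:0:0:root:/root:/bin/bash\n"]
fields = next(csv.reader(lines, dialect="passwd"))
print(fields)
# -> ['root', 'x', '0', '0', 'root', '/root', '/bin/bash']
```

Registering by name keeps the call sites short (dialect="passwd"), in
the same way 'excel' is used.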
If there are no objections, I might submit some of this as a bug
report or two (or even a patch).