quote causing DBD::CSV failure

Mark

(I don't seem to be able to prohibit the data portion below from
line-wrapping -- Sorry!)

Below is a Perl script followed by the data it is reading (to be in a
file called "./data/location"). The data portion has a line used as
column headings, followed by 10 lines of data. The 9th line of data has
some double quotes in one of the fields. This stops the reading of the
data at line 8. I have no idea why. I tried adding csv_escape => '"'
and csv_quote => '"' to the connect function, but these seem to have no
effect. If I remove the quotes from the file, all 10 lines are read and
printed. In fact, all 11,000 lines of the full-length version of
location will be read in and printed, but the quotes halt it wherever
they are encountered. I'm stupefied! Please offer helpful suggestions
in this regard. (DBD::CSV 0.22, SQL::Statement 1.14, and
Text::CSV_XS 0.23)

Mark

______________________________________________________________________
#!/usr/bin/perl -w

use strict;
use DBI;

my $dbstr = 'DBI:CSV:f_dir=data';
my $dbh = DBI->connect($dbstr, undef, undef,
    { ChopBlanks => 1, PrintWarn    => 1,
      PrintError => 0, RaiseError   => 1,
      csv_eol    => "\n", csv_sep_char => '|' }
);

my $sql = "SELECT * FROM location ORDER BY lid";
my $sth = $dbh->prepare($sql);

if (defined $sth) {
    $sth->execute();
}

my $data = $sth->fetchall_arrayref;
$sth->finish();
$dbh->disconnect();

if (defined $data) {
    print "\$data is defined. Number of records = " .
          scalar @$data . "\n";
}
else {
    print "\$data is undefined\n";
}

my $i = 0;
foreach (@$data) {
    print "$i: " . (join ', ', @$_) . "\n";
    $i++;
}

exit;
__END__

lid|county|coe|cpm|detail|elev|hdatum|hsa|hu|lat|lon|lremark|lrevise|name|network|rb|rfc|sbd|sn|state|waro|wfo|wsfo|type|des|det|post|stntype|tzone
06C|Unk|||at|0||LOT||41.9844444444444|88.0980555555556|||Chicago/Schaumburg|Unk||Unk|||IL||LOT|||||1||CST6CDT
GNCK2|Gallatin|||at|0||ILN||38.705|84.8238888888889|Eagle Creek|2005-03-02|Glencoe|Unk|Ohio River (OHL)|OHRFC|||KY||ILN|||GLENCOE|Eagle Creek|1|3000|EST5EDT
WNSI2|Shelby|||7 SE|0||ILX||39.3594444444444|88.5058333333333||2005-05-11|Windsor|Unk|Little Wabash (LWA)|OHRFC|||IL||ILX|||WINDSOR 7SE||1|9000|CST6CDT
LNBO1|Washington|||at|0||RLX||39.5630555555556|81.2038888888889|Little Muskingum River at Bloomfield, OH.||Bloomfield|Unk|Huntington Segment - Ohio|OHRFC|||OH||RLX|||Bloomfield||1||EST5EDT
07S|Unk|||at|||Unk||47.9667|117.433||||Unk||Unk|||XX||Unk|||||0|ASOS|EST5EDT
CLE1412|Ashtabula||||||CLE||41.53|80.53|PAUL MECHLING, 440-577-1111||PIERPONT 2SE|Unk||OHRFC|||OH||CLE|||||1||EST5EDT
CLE1414|Ashtabula||||||CLE||41.85|80.8|COOP, n/a||ASHTABULA|Unk||OHRFC|||OH||CLE|||||1||EST5EDT
CLE1415|Ashtabula||||||CLE||41.68|80.67|COOP, n/a||DORSET|Unk||OHRFC|||OH||CLE|||||1||EST5EDT
OPKW2|Monongalia|||1 N|860||PBZ||39.5644444444444|80.0491666666667|Monongahela River at Opekiska Lock|2005-02-10|Opekiska|Unk|Monongahela River|OHRFC|||WV||PBZ|||OPEKISKA 1N|39d33'52" -80d02'57"|1|3000|EST5EDT
22G|Unk|||at|0||CLE||41.3441666666667|82.1777777777778|||Lorain/Elyria|Unk||OHRFC|||OH||CLE|||||1|ASOS|EST5EDT

DJ Stunks

Mark said:
The 9th line of data has some double quotes in one of the fields. This
stops the reading of the data at line 8. I have no idea why. I tried
adding csv_escape => '"' and csv_quote => '"' to the connect function,
but these seem to have no effect.
River|OHRFC|||WV||PBZ|||OPEKISKA 1N|39d33'52" -80d02'57"|1|3000|EST5EDT

these aren't really quotes in that they aren't quoting a field (with
spaces in it for instance), which is why csv_quote didn't work. they
aren't escaping anything either :)

the problem, as I see it, is that Text::CSV has already decided that
that field should be a double field from your previous lines in which
your lat/long was expressed as a decimal number.

you may be able to coerce DBD::CSV into making that column of type
string (via Text::CSV_XS::pV() ) or you could set RaiseError => 0 to
ignore this line and continue with the rest of your file.

the only other suggestions I would have are to either preparse the file
to convert HMS lat/longs to dec, or have whichever system gave you the
HMS lat/long send it to you in decimal instead.
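For what that preparsing step might look like, here is a minimal sketch in
plain Perl. It assumes a DMS string shaped like the 39d33'52" value seen in
the sample data's remark field; the exact format of the real source data is
an assumption, and dms_to_dec is a name made up for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a DMS string like 39d33'52" to decimal degrees.
# The NNdMM'SS" format is borrowed from the sample data's remark
# field; real inputs may vary.
sub dms_to_dec {
    my ($dms) = @_;
    my ($sign, $d, $m, $s) =
        $dms =~ /^(-?)(\d+)d(\d+)'(\d+(?:\.\d+)?)"$/
        or return undef;
    my $dec = $d + $m / 60 + $s / 3600;
    return $sign eq '-' ? -$dec : $dec;
}

printf "%.6f\n", dms_to_dec(q{39d33'52"});   # prints 39.564444
```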

HTH,
-jp

Mark

you may be able to coerce DBD::CSV into making that column of type
string (via Text::CSV_XS::pV() ) or you could set RaiseError => 0 to
ignore this line and continue with the rest of your file.

the only other suggestions I would have is to either preparse the file
to convert HMS lat/longs to dec, or have whichever system gave you the
HMS lat/long send it to you in decimal instead.

I don't want to skip the record because this would yield only a subset
of the records. I need to deal with EVERY record.

This issue isn't DMS vs DD. I didn't mean to mislead you or anyone
else. The field containing the quotes is really just a remarks (text)
column. I put in the degree-minutes-seconds remark just to show a valid
use of double-quotes without it really quoting anything. Something like
George "Babe" Ruth would have also killed the reading of the data, even
though CSV already thinks the field is of type string -- I presume.

This raises an interesting point, though... Is there a way to explicitly
set what each field type is so CSV doesn't have to "decide" this?
Perhaps digits in a text field or NULLs in a numeric field on the first
line of data have led CSV to the wrong conclusions.

Thoughts?

Mark

Mark

Thoughts?
Never mind. I found the answer within the Text::CSV_XS documentation:
"A field within CSV must be surrounded by double-quotes to contain an
embedded double-quote, represented by a pair of consecutive double-quotes."

Once I replaced each " with "" and put double quotes around the whole
field, it worked!
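That replacement can be expressed as a tiny helper, run over each field
before the file is handed to DBD::CSV (quote_field is a name invented here,
not part of any module):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape one field per the CSV rule quoted above: double each embedded
# double-quote, then wrap the whole field in double-quotes.
sub quote_field {
    my ($field) = @_;
    return $field unless $field =~ /"/;   # leave quote-free fields alone
    (my $escaped = $field) =~ s/"/""/g;
    return qq{"$escaped"};
}

print quote_field(q{George "Babe" Ruth}), "\n";
# prints: "George ""Babe"" Ruth"
```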

Mark

xhoster

Mark said:
I tried adding csv_escape => '"'
and csv_quote => '"' to the connect function, but these seem to have no
effect.

Those are already the default values, so of course that has no effect.
What you need to do is *unset* them, as the " character is, in your data,
not used to escape things and not used to quote things.
my $dbstr = 'DBI:CSV:f_dir=data';

my $dbstr = 'DBI:CSV:f_dir=data;csv_quote_char=;csv_escape_char=';
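At the Text::CSV_XS level, the effect of unsetting both characters can be
sketched like this; it is a standalone illustration (the four-field line is
abbreviated from the OPKW2 record), not part of the original script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV_XS;

# With quote_char and escape_char set to undef, " is an ordinary
# character: the parser only splits on |, so the remark field with
# its embedded quotes comes through untouched.
my $csv = Text::CSV_XS->new({
    sep_char    => '|',
    quote_char  => undef,
    escape_char => undef,
});

my $line = q{OPKW2|Monongalia|OPEKISKA 1N|39d33'52" -80d02'57"};
$csv->parse($line) or die "parse failed";
my @fields = $csv->fields;
print "$fields[3]\n";   # prints: 39d33'52" -80d02'57"
```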

Xho

Mark

my $dbstr = 'DBI:CSV:f_dir=data;csv_quote_char=;csv_escape_char=';

Actually, I like your solution better because I do not have to
preprocess my file to change " into "".

Mark