Maqo
Is there any way to prevent HTML::TableExtract from mangling punctuation
in parsed text? For example, the code below parses “don’t come”
at the target URL as “don’t comeâ€. Is it something about the
document encoding, or a limitation of the module?
Many thanks!
------------------------------------------------------------
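use LWP::Simple;
use HTML::TableExtract;

# Page containing the table to extract
my $URL =
    "http://www.pimco.com/LeftNav/Late+Breaking+Commentary/IO/2005/IO+May-June+2005.htm";
my $content = get($URL);

# Select the table at depth 1, count 4; keep the HTML inside each cell
my $te = HTML::TableExtract->new( depth => 1, count => 4, gridmap => 0,
                                  keep_html => 1 );
$te->parse($content);

# Print the first cell of every row in each matched table
foreach my $ts ($te->table_states)
{
    foreach my $row ($ts->rows)
    {
        print $row->[0];
    }
}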