Parsing HTML with HTML::TableExtract

Ninja Li

Hi,

I am trying to create a comma-delimited file by parsing HTML from the
website "http://www.earnings.com/conferencecall.asp?client=cb"
using the HTML::TableExtract module (thanks to Tad McClellan for the
introduction). However, I get the following warnings when running
the script at the end of this post:
----------------------
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
Use of uninitialized value in join or string at conference.pl line 25.
HOGGF.PK
 ,HOGG ROBINSON GROUP PLC,Half- Year HOGG ROBINSON GROUP PLC
Earnings Conference Call,,,4:00 AM
................
----------------------

Also notice the large gap between the first value, "HOGGF.PK", and the
second, "HOGG ROBINSON GROUP PLC". There are only a few spaces after
the first field in the original HTML. From what I can see so far, it
seems the empty values in the fields are not handled correctly. The
source code is at the end of this post.

Please advise on the root cause and the fix.

Thanks in advance.

Nick

----------------------------------------------
Source code:

use warnings;
use strict;
use LWP::Simple;
use HTML::TableExtract;

my $html = get 'http://www.earnings.com/conferencecall.asp?client=cb';

my @headers =
(
    'SYMBOL',
    'COMPANY',
    'EVENT TITLE',
    'WEBCAST',
    'TRANSCRIPT',
    'TIME'
);

my $te = HTML::TableExtract->new( headers => \@headers );
$te->parse($html);

foreach my $ts ( $te->tables )
{
    foreach my $row ( $ts->rows )
    {
        my $csv = join ',', @$row;
        print "$csv\n";
    }
}
 
sln

What have you done to find out what caused this ridiculous
number of warnings? Nothing, judging by your code.
Something is off, WAY off! Something is wrong with your content or
headers. You have to learn the module, which means you actually have to
read the docs for it. Then plan ahead. Look at the source of the HTML.
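
As a starting point for that kind of investigation, here is a minimal sketch that reports which cells in each extracted row are undefined. It assumes each row is an array reference like the ones $ts->rows yields in the script above; the helper name describe_rows and the sample row are made up for illustration and are not part of HTML::TableExtract.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): for each row,
# report how many cells it has and which column indexes are undefined.
sub describe_rows {
    my (@rows) = @_;
    my @report;
    for my $r ( 0 .. $#rows ) {
        my @undef_cols = grep { !defined $rows[$r][$_] } 0 .. $#{ $rows[$r] };
        push @report, sprintf 'row %d: %d cells, undefined at columns [%s]',
            $r, scalar @{ $rows[$r] }, join( ',', @undef_cols );
    }
    return @report;
}

# Sample row loosely modeled on the output in the post; undef stands in
# for table cells that had no text content.
my @rows = (
    [ 'HOGGF.PK', 'HOGG ROBINSON GROUP PLC', 'Earnings Conference Call',
      undef, undef, '4:00 AM' ],
);

print "$_\n" for describe_rows(@rows);
```

In the real script the call would presumably be describe_rows( $ts->rows ) inside the loop over tables, which shows at a glance whether the warnings line up with empty WEBCAST and TRANSCRIPT cells.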

This is not rocket science.

-sln
 
Martien Verbruggen

> Hi,
>
> I am trying to create a comma-delimited file by parsing HTML from the
> website "http://www.earnings.com/conferencecall.asp?client=cb"
> using the HTML::TableExtract module (thanks to Tad McClellan for the
> introduction). However, I get the following warnings when running
> the script at the end of this post:
> ----------------------
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> Use of uninitialized value in join or string at conference.pl line 25.
> HOGGF.PK
>  ,HOGG ROBINSON GROUP PLC,Half- Year HOGG ROBINSON GROUP PLC
> Earnings Conference Call,,,4:00 AM
> ...............

That is not the only output. I get more.

> Also notice the large gap between the first value, "HOGGF.PK", and the
> second, "HOGG ROBINSON GROUP PLC". There are only a few spaces after
> the first field in the original HTML. From what I can see so far, it

Check the 'original' HTML again. What's currently at that URL has the
spaces that you see. I guess they must have changed it since you last
looked at it.

> seems the empty values in the fields are not handled correctly. The
> source code is at the end of this post.

Define 'correctly'. Or rather, find out what HTML::TableExtract defines
as correct, and adjust your expectations to that. Cells without text
content seem to be returned as undefined values. It's your job to deal
with that in whichever way you think it should be dealt with.
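
One possible way of dealing with it, as a sketch only: map undefined cells to empty strings and tidy up the whitespace before joining. This assumes that an empty CSV field is what is wanted for a cell without text; row_to_csv is a hypothetical helper name, and the sample row is made up to resemble the output quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one row (an array reference, as yielded by
# $ts->rows) into a comma-separated line.
sub row_to_csv {
    my ($row) = @_;
    my @cells = map {
        my $cell = defined $_ ? $_ : '';    # undefined cell -> empty field
        $cell =~ s/^\s+|\s+$//g;            # trim leading/trailing whitespace
        $cell =~ s/\s+/ /g;                 # collapse inner runs, incl. newlines
        $cell;
    } @$row;
    return join ',', @cells;
}

# Sample row resembling the posted output: trailing whitespace after the
# symbol, a newline inside the title, and two cells without text.
my @rows = (
    [ "HOGGF.PK\n   ", 'HOGG ROBINSON GROUP PLC',
      "Half- Year HOGG ROBINSON GROUP PLC\nEarnings Conference Call",
      undef, undef, '4:00 AM' ],
);

print row_to_csv($_), "\n" for @rows;
```

Note that a plain join does no quoting, so a cell that itself contains a comma or a double quote would still produce a broken line; a CSV module such as Text::CSV is the usual answer to that.
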
> Please advise on the root cause and the fix.

If you want, I can send you a contract and rate card.

Martien
 
