Also sprach one man army:
ok, so it looked like perhaps HTML::LinkExtractor was going to be easier
to use. So I run a simple loop, as per the examples. I get a Hash? Hash
ref? back in each iteration. DataDumper shows something like (below). By
Reading the Manual, I infer to choose only a $ThisLink with an 'href'
entry, I use
if ( exists $ThisLink{ href } ) { ... }
But I always get back an UNDEF for every iteration... I tried
@$ThisLink{ href} just for kicks, but that doesn't work either... Any
insights to get me past this sticking point?
Throwing guesswork at the Perl interpreter is not likely to work out in
the long run. What you need is to familiarize yourself with a few
fundamental concepts behind references. You must have gone through a
similar process in the past with structures and pointers to structures in
C. The good news is that references in Perl are easier to understand, so
you will need much less time for them than for the related C concepts.
thanks in advance
##--
my $LX = new HTML::LinkExtractor();
$LX->parse( $FileName);
for $ThisLink( @{ $LX->links } ) {
    print "ThisLink = " . Dumper( $ThisLink );
    if ( exists $ThisLink{ href } ) {
        print "LinkInner = " . $ThisLink{ href } . "\n";
    }
}
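The first thing to notice is that Dumper() is meant to be given
references to hashes or arrays, not the hashes or arrays themselves (a
plain hash would be flattened into a list of separate keys and values).
The C analogy: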
struct foo a;
Dumper(a); /* wrong */
Dumper(&a); /* right */
Consequently, if you have a plain hash (indicated by the sigil '%' in
front of the variable name instead of '$', that is: %ThisLink):
print Dumper(%ThisLink); # wrong
print Dumper(\%ThisLink); # right
So Perl's '\' is roughly equivalent to C's '&'. Only roughly because a
Perl reference is not a memory-address as returned by C's '&'.
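To make the difference visible, here is a minimal sketch. The hash %link and its single entry are made up for illustration. It captures what Dumper() returns in both cases so the two results can be compared:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical one-entry hash, only for illustration.
my %link = ( tag => 'meta' );

# Passing the hash itself flattens it into a list of keys and values,
# so Dumper() sees two unrelated scalars ($VAR1 and $VAR2).
my $flat = Dumper(%link);

# Passing a reference keeps the hash together as one structure ($VAR1 only).
my $whole = Dumper(\%link);

print "flattened:\n$flat";
print "as reference:\n$whole";
```

The first dump lists the key and the value as separate variables, the second shows one hash in curly braces, just like the output in your posting.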
ThisLink = $VAR1 = {
'content' => 'text/html; charset=iso-8859-1',
'tag' => 'meta',
'http-equiv' => 'Content-Type'
};
Again, there are several indications here that you are not dealing with
a plain hash. The curly braces '{' and '}' are a hash-reference
constructor. '[' and ']' are the constructor for array-references and
plain parens '(' and ')' can be regarded as a list constructor
(technically not quite correct but good enough):
my $hash_ref = { key => 'value' };
my %hash = ( key => 'value' );
my $ary_ref = [ 1, 2, 3 ];
my @ary = ( 1, 2, 3 );
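If you are ever unsure which of the two you are holding, the builtin ref() will tell you. A small sketch, reusing the four variables from above:

```perl
use strict;
use warnings;

my $hash_ref = { key => 'value' };
my %hash     = ( key => 'value' );
my $ary_ref  = [ 1, 2, 3 ];
my @ary      = ( 1, 2, 3 );

# ref() reports what kind of thing a reference points to ...
print ref($hash_ref), "\n";    # HASH
print ref($ary_ref),  "\n";    # ARRAY

# ... and '\' turns a plain hash or array into a reference of that kind.
print ref( \%hash ), "\n";     # HASH
print ref( \@ary ),  "\n";     # ARRAY
```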
The nice thing about reference types in Perl is that accessing single
elements is similar to how it is done in C, i.e. by using the '->'
operator:
typedef struct {
    int a;
} foo;
...
foo *f = return_foo_struct(...);
printf("%i\n", f->a);
In Perl:
my $f = { a => 1 };
print $f->{a};
If $f were an array-reference:
my $f = [ 1, 2, 3 ];
print $f->[0];
If you have non-reference arrays or hashes, the arrow is simply dropped:
my %f = ...;
print $f{a};
my @f = ...;
print $f[0];
The '$' sigil in the subscript expressions is not a mistake. It means
that the whole expression (both $f{a} and $f[0]) is a scalar.
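Applied to your loop, this means the test has to go through the arrow. The following is only a sketch: the list @links and its contents are made up and stand in for what @{ $LX->links } returns, so that it runs without HTML::LinkExtractor installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @{ $LX->links }: a list of hash references.
my @links = (
    { tag => 'meta', 'http-equiv' => 'Content-Type' },
    { tag => 'a',    href         => 'http://www.example.com/' },
);

my @found;
for my $ThisLink (@links) {
    # $ThisLink holds a reference, so its elements are reached with '->'.
    # $ThisLink{href} would look into a different variable, the hash
    # %ThisLink, which does not exist here.
    if ( exists $ThisLink->{href} ) {
        push @found, $ThisLink->{href};
        print "LinkInner = $ThisLink->{href}\n";
    }
}
```

With 'use strict' in place, the original $ThisLink{ href } does not even compile, because %ThisLink was never declared. That is one more reason to always switch strictures on.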
This is most likely all you need to know for your problem at hand. It
is, however, not all there is to references. There are ways to turn
hash- and array-references back into plain hashes and arrays. This is
all laid out nicely in the documentation, which you should feel less
reluctant to read. Aside from the already mentioned ones, 'perlreftut'
will give you a few simple rules that will help you in every situation
where a reference shows up.
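As a small taste of the "turning back" part, here is a minimal sketch with made-up variables: put the sigil of the thing you want in front of the reference.

```perl
use strict;
use warnings;

my $hash_ref = { key => 'value' };
my $ary_ref  = [ 1, 2, 3 ];

# '%' or '@' in front of the reference yields the whole hash or array.
# Assigning it like this makes a copy of the contents.
my %hash = %$hash_ref;
my @ary  = @{ $ary_ref };    # the braces are optional in this simple case

print join( ',', keys %hash ), "\n";    # key
print scalar(@ary), "\n";               # 3
```

The same forms work wherever Perl expects a hash or an array, for instance 'keys %$hash_ref' or 'for my $x (@$ary_ref)'.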
Tassilo