Simple Structure Question

one man army · Jun 26, 2005

(I am new to this Language, Perl)

I have a small script running. Can someone explain this line?

while ( $token = $p->get_tag("a")) {
$url = $token->[1]{href} || "-";
...

$token is now a struct, with a first element that has a string, but
{href} refers to a pattern that may occur, which returns a true or false?

Can someone write a 2 line synonym that is not so terse?

thanks in advance

Gunnar Hjalmarsson · Jun 26, 2005

one said:
Can someone explain this line?

while ( $token = $p->get_tag("a")) {
$url = $token->[1]{href} || "-";
...

$token is now a struct, with a first element that has a string, but
{href} refers to a pattern that may occur, which returns a true or false?

Can someone write a 2 line synonym that is not so terse?

Even if not a synonym, these are two (command) lines:

perldoc perlref
perldoc perldsc

In other words, you seem to have some reading to do. ( $token->[1]{href}
tells us that $token is a reference to an "array of hashes". )
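For readers following along, here is a minimal longhand sketch of that one-liner. The $token below is built by hand, and its layout (tag name, attribute hash, attribute order, source text) is an assumption chosen to match the HASH and ARRAY values printed later in the thread; it is not taken from the parser's source.

```perl
use strict;
use warnings;

# Hand-built stand-in for one value returned by $p->get_tag("a").
# The layout is an assumption made for illustration only.
my $token = [
    'a',
    { href => 'http://example.com/', name => '1' },
    [ 'href', 'name' ],
    '<a href="http://example.com/" name="1">',
];

my $attr = $token->[1];     # second element: a reference to a hash
my $url  = $attr->{href};   # look up the key 'href' in that hash
$url = '-' unless $url;     # same effect as  ... || "-"

print "$url\n";
```

The two middle statements are the "not so terse" form: first pick the element out of the array, then pick the value out of the hash.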

one man army · Jun 26, 2005

ok, I added this...

while ( $token = $p->get_tag("a")) {
print( "$token->[1] AND $token->[2]\n");

and I got

HASH(0x8dbd2c) AND ARRAY(0x8dbc84)

So now I am really confused. I don't even know how to look up the
structure of $token, and am confused by this syntax of apparently
extracting a hash entry with a key == "href".

Q. What file defines the structure of the return arg from get_tag()?
Parser.pm doesn't seem to have it at all...

Q. ok, so $token->[1] is a HASH. The ref I am looking at says

if ( exists $token("href") )
{ $url = "href"; }
else
{ $url = "-"; }

this is the synonym?

light shed is appreciated..

Gunnar Hjalmarsson · Jun 26, 2005

one said:
I don't even know how to look up the structure of $token,

use Data::Dumper;
print Dumper $token;

and am confused by this syntax of apparently
extracting a hash entry with a key == "href".

That's where the reading comes in, I guess. (See my reply to your first
message.)
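A small self-contained sketch of that suggestion, for anyone who wants to try it without the parser. The $token here is a hypothetical stand-in, and Sortkeys is set only so the output order is stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # print hash keys in a stable order

# Hypothetical stand-in for a parsed <a> tag; real contents come from the parser.
my $token = [
    'a',
    { href => 'http://example.com/' },
    [ 'href' ],
    '<a href="http://example.com/">',
];

my $dump = Dumper($token);     # returns the structure as a string
print $dump;
```

The square brackets in the output mark an array reference and the curly braces mark a hash reference, which is exactly what $token->[1]{href} walks through.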

Tassilo v. Parseval · Jun 26, 2005

Also sprach one man army:

ok, I added this...

while ( $token = $p->get_tag("a")) {
print( "$token->[1] AND $token->[2]\n");

and I got

HASH(0x8dbd2c) AND ARRAY(0x8dbc84)

So now I am really confused. I don't even know how to look up the
structure of $token, and am confused by this syntax of apparently
extracting a hash entry with a key == "href".

In addition to the documentation Gunnar already pointed you to, you can
also use Data::Dumper to peek inside a data-structure:

use Data::Dumper;

...
print Dumper $token;

This is useful for getting a feeling for nested data-structures and in
particular for the one you have to deal with.

Q. What file defines the structure of the return arg from get_tag()?
Parser.pm doesn't seem to have it at all...

It's either PullParser.pm or TokeParser.pm. However, I doubt that
looking inside these modules will provide you with that much input (for
now).

Q. ok, so $token->[1] is a HASH. The ref I am looking at says

Actually it's a hash reference.

if ( exists $token("href") )
{ $url = "href"; }
else
{ $url = "-"; }

this is the synonym?

I don't understand that question. References don't usually say
anything.

Tassilo

A. Sinan Unur · Jun 26, 2005

ok, I added this...

I am not sure what you added 'this' to ...

while ( $token = $p->get_tag("a")) {
print( "$token->[1] AND $token->[2]\n");

and I got

HASH(0x8dbd2c) AND ARRAY(0x8dbc84)

So now I am really confused. I don't even know how to look up the
structure of $token, and am confused by this syntax of apparently
extracting a hash entry with a key == "href".

use Data::Dumper;

print Dumper $token;

Q. What file defines the structure of the return arg from get_tag()?
Parser.pm doesn't seem to have it at all...

Well, you have the source code for the whole program in front of you,
there should be an appropriate use statement somewhere from which you
can glean this. On the other hand,

<URL:http://www.google.com/search?q=perl+html+parser+get_tag>

It is not so hard, is it?

Q. ok, so $token->[1] is a HASH.

It is a reference to a hash.

The ref I am looking at says

if ( exists $token("href") )

That, on the other hand, is a syntax error.

{ $url = "href"; }
else
{ $url = "-"; }

this is the synonym?

I don't know what you mean.

Sinan

Sherm Pendley · Jun 26, 2005

one man army said:
ok, I added this...

while ( $token = $p->get_tag("a")) {
print( "$token->[1] AND $token->[2]\n");

and I got

HASH(0x8dbd2c) AND ARRAY(0x8dbc84)

So now I am really confused. I don't even know how to look up the
structure of $token

Data::Dumper is your friend.

use Data::Dumper; # at the top of your script

# ... later ...

print Dumper($token);

sherm--

Sherm Pendley · Jun 26, 2005

Sherm Pendley said:
Data::Dumper is your friend.

Note to self - read replies *before* posting. That didn't really need to be
said a *fourth* time. :-(

sherm--

Tad McClellan · Jun 27, 2005

Sherm Pendley said:
Note to self - read replies *before* posting. That didn't really need to be
said a *fourth* time. :-(

That's the sort of thing that can happen when the OP asks a FAQ. :-(

perldoc -q struct

How do I print out or copy a recursive data structure?

The Data::Dumper module on CPAN (or the 5.005 release of Perl)
is great for printing out data structures.

Tad McClellan · Jun 27, 2005

one man army said:
(I am new to this Language, Perl)

(you should still check the Perl FAQ before posting)

while ( $token = $p->get_tag("a")) {
$url = $token->[1]{href} || "-";
...

$token is now a struct,

perldoc -q struct

How can I make the Perl equivalent of a C structure/C++ class/hash or
array of hashes or arrays?

Then try,

perldoc perlreftut
perldoc perlref
perldoc perllol
perldoc perldsc

one man army · Jun 27, 2005

You know why All Five of you men are so cool, because You Read The
Manual! All 800+ pages of gnarly, dense, encoded hard-core Perl. You
have worked so hard at reading the manual, that you cannot wait to leap
out, virtually in unison, with shouts of "Read the Manual!"

I admit it, after 2 straight 15 hour plus research days, on a completely
unrelated topic, I thought I could get a Perl program running to just
read the contents of 840 links on a web page. I read the manual, I went
to CPAN, I installed the module, and it worked! That's why you guys are
worth it, 'cause your language does work.

But in the Beginners Perl Book I have, there is no discussion of this
terse syntax I encountered, so yes, I asked for someone to help and
explain it. It was weak, there were _more manuals_ I could have read
first.

The post that is really the indicator of them all I think is Tassilo's
with the .sig that says "use bigint;
$n=71423350343770280161397026330337371139054411854220053437565440;
$m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);"

I have worked as a C programmer for 15 years. I have written and shipped
and rev'ed huge commercial applications. You will never, ever, ever find
code like that in any file, ever. It is considered Bad Practice, for
good reasons.

This Perl group is absolutely reflective of the traditional UNIX
culture, dense, terse, sup smart CRYPTO code. If you can't read it, what
are you doing here - you have manuals to read!!

I am going to have a look at arrays of hashes actually, because I really
do want to be able to read this web page. I got a good day/night of
rest. But I don't have months to just study Perl. So I thought if
someone wanted to try to explain the construct, they could. The post was
hasty, I admit it. Thank you for the DataDumper pointer. I am used to a
header file with a struct definition, but I am not there in Perl.

Gunnar Hjalmarsson · Jun 27, 2005

one said:
I am going to have a look at arrays of hashes actually, because I really
do want to be able to read this web page. I got a good day/night of
rest.

Good.

But I don't have months to just study Perl. So I thought if
someone wanted to try to explain the construct, they could.

To understand the construct, you need to understand the concept of
nested data structures in Perl, and nobody will explain that concept
better than perldoc in a reply to your post.

Tad McClellan · Jun 27, 2005

one man army said:
You know why All Five of you men are so cool, because You Read The
Manual!

Thanks, now that we have affirmation from you, our lives are complete.

I admit it, after 2 straight 15 hour plus research days, on a completely
unrelated topic,

If it was unrelated, then why do you bring it up?

That's why you guys are
worth it, 'cause your language does work.

Yeah, right.

But in the Beginners Perl Book I have, there is no discussion of this
terse syntax I encountered,

I fail to see how a deficiency in the book you've chosen is relevant.

The post that is really the indicator of them all I think is Tassilo's
with the .sig that says "use bigint;
$n=71423350343770280161397026330337371139054411854220053437565440;
$m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);"

I have worked as a C programmer for 15 years. I have written and shipped
and rev'ed huge commercial applications.

Wow, you da man!

You will never, ever, ever find
code like that in any file, ever. It is considered Bad Practice, for
good reasons.

You don't seem to be able to tell playing around from production code.

But I don't have months to just study Perl.

That's fortunate, since a couple of hours would be enough to
accomplish your task.

The post was
hasty, I admit it.

Then why all the whining?

So long!

one man army · Jun 27, 2005

ok, so it looked like perhaps HTML::LinkExtractor was going to be easier
to use. So I run a simple loop, as per the examples. I get a Hash? Hash
ref? back in each iteration. DataDumper shows something like (below). By
Reading the Manual, I infer to choose only a $ThisLink with an 'href'
entry, I use
if ( exists $ThisLink{ href } ) { ... }

But I always get back an UNDEF for every iteration... I tried
@$ThisLink{ href} just for kicks, but that doesn't work either... Any
insights to get me past this sticking point?

thanks in advance

##--
my $LX = new HTML::LinkExtractor();
$LX->parse( $FileName);

for $ThisLink( @{ $LX->links } ) {
print "ThisLink = " . Dumper( $ThisLink );
if ( exists $ThisLink{ href } ) {
print "LinkInner = " . $ThisLink{ href } . "\n";
}
}

ThisLink = $VAR1 = {
'content' => 'text/html; charset=iso-8859-1',
'tag' => 'meta',
'http-equiv' => 'Content-Type'
};
ThisLink = $VAR1 = {
'content' => 'Weeds Lawn Lawn Care',
'tag' => 'meta',
'name' => 'keywords'
};
ThisLink = $VAR1 = {
'alt' => '',
'src' => '/images/ERHeader.jpg',
'border' => '0',
'width' => '780',
'tag' => 'img',
'height' => '185',
'usemap' => '#header'
};
ThisLink = $VAR1 = {
'_TEXT' => '<a name="1" href="http://www.Ortho.com/Registration/OrgLocationList.aspx?id=20822">1A Plenty Grow</a>',
'href' =>
'http://www.Ortho.com/Registration/OrgLocationList.aspx?id=20822',
'tag' => 'a',
'name' => '1'
};

Fabian Pilkowski · Jun 28, 2005

* one man army said:
ok, so it looked like perhaps HTML::LinkExtractor was going to be easier
to use.

Please quote a bit of context like all others do.

So I run a simple loop, as per the examples. I get a Hash? Hash
ref? back in each iteration. DataDumper shows something like (below).

You get a hash ref.

By
Reading the Manual, I infer to choose only a $ThisLink with an 'href'
entry, I use
if ( exists $ThisLink{ href } ) { ... }

You're trying to access the value for 'href' in the hash %ThisLink. But
there is no hash like this. You have to try Perl's dereference operator.

if ( exists $ThisLink->{href} ) { ... }

Btw, if you put "use strict" in your scripts, Perl will tell you
whenever you try to use a non-existent variable. This is what everyone
should use. Nevertheless, you have to know how to access the data
structures you can print with Data::Dumper. And that's why others have
pointed you to the manual where this is actually described (in
`perldoc perlreftut` as well as in `perldoc perlref`).
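As an illustration of the arrow fix, here is a minimal sketch that runs without HTML::LinkExtractor. The @links list is hypothetical and only imitates the shape of the Dumper output quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical list of hash references, shaped like the dumped links.
my @links = (
    { tag => 'meta', name => 'keywords', content => 'Weeds Lawn Lawn Care' },
    { tag => 'img',  src  => '/images/ERHeader.jpg' },
    { tag => 'a',    href => 'http://www.example.com/list?id=1' },
);

my @hrefs;
for my $link (@links) {
    # $link is a reference, so the arrow is needed to reach into the hash.
    next unless exists $link->{href};
    push @hrefs, $link->{href};
}

print "$_\n" for @hrefs;
```

With "use strict" in place, writing $link{href} instead would stop the script at compile time, because no hash named %link has been declared.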

regards,
fabian

Tassilo v. Parseval · Jun 28, 2005

Also sprach one man army:

You know why All Five of you men are so cool, because You Read The
Manual! All 800+ pages of gnarly, dense, encoded hard-core Perl. You
have worked so hard at reading the manual, that you cannot wait to leap
out, virtually in unison, with shouts of "Read the Manual!"

I never quite understood why some people get so antagonized by pointers
to the reference documentation. Parts of it are actually quite well
written with beginners in mind. I learnt Perl by using two sources
of information: A very cursory reference found in "Linux in a Nutshell"
(which only lists all Perl language constructs in tables without any
explanations) and the standard Perl documentation. Obviously, it
makes little sense to read the latter from the start to the end
(especially since it has no marked entry-point). Rather read only those
parts that concern you and your problem. Those were the ones you were
pointed to here. Considering the vast number of Perl manpages (134 as of
5.8.4), this is quite valuable information, not easily deducible by a
beginner.

I admit it, after 2 straight 15 hour plus research days, on a completely
unrelated topic, I thought I could get a Perl program running to just
read the contents of 840 links on a web page. I read the manual, I went
to CPAN, I installed the module, and it worked! That's why you guys are
worth it, 'cause your language does work.

But in the Beginners Perl Book I have, there is no discussion of this
terse syntax I encountered, so yes, I asked for someone to help and
explain it. It was weak, there were _more manuals_ I could have read
first.

Not more manuals but rather the relevant ones.

The post that is really the indicator of them all I think is Tassilo's
with the .sig that says "use bigint;
$n=71423350343770280161397026330337371139054411854220053437565440;
$m=-8,;;$_=$n&(0xff)<<$m,,$_>>=$m,,print+chr,,while(($m+=8)<=200);"

I have worked as a C programmer for 15 years. I have written and shipped
and rev'ed huge commercial applications. You will never, ever, ever find
code like that in any file, ever. It is considered Bad Practice, for
good reasons.

As is the obfuscated C contest with which you should be familiar
considering your 15 years of C experience. These contests as well as
obfuscated signatures are nothing else than a self-ironic view on some
of the shortcomings certain languages have, namely the tendency towards
a very terse and at times even cryptic syntax. It should be out of the
question that anyone (me anyway) would ever allow lines like that to slip
into production code.

This Perl group is absolutely reflective of the traditional UNIX
culture, dense, terse, sup smart CRYPTO code. If you can't read it, what
are you doing here - you have manuals to read!!

Why, we already have read them. ;-)

I am going to have a look at arrays of hashes actually, because I really
do want to be able to read this web page. I got a good day/night of
rest. But I don't have months to just study Perl. So I thought if
someone wanted to try to explain the construct, they could. The post was
hasty, I admit it. Thank you for the DataDumper pointer. I am used to a
header file with a struct definition, but I am not there in Perl.

There might not even be such a definition in Perl. All Perl
data-structures grow as needed. New key/value pairs may always be added
to a hash so the way of working in Perl is quite different from the way
you may have learnt in C where you usually have a couple of other editor
sessions in order to see some headers.
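A minimal sketch of what "grow as needed" looks like in practice. The key names are made up for illustration; nothing declares them in advance.

```perl
use strict;
use warnings;

my %link = ( tag => 'a' );              # starts with a single key

$link{href} = 'http://example.com/';    # a new key appears on assignment
$link{text} = 'Example';

my @seen;                               # arrays grow the same way
push @seen, $link{href};

my @fields = sort keys %link;           # the only "definition" is the data itself
print "@fields\n";
print scalar(@seen), "\n";
```

This is why dumping the value you actually have is the usual substitute for looking up a struct definition in a header.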

Tassilo

Tassilo v. Parseval · Jun 28, 2005

Also sprach one man army:

ok, so it looked like perhaps HTML::LinkExtractor was going to be easier
to use. So I run a simple loop, as per the examples. I get a Hash? Hash
ref? back in each iteration. DataDumper shows something like (below). By
Reading the Manual, I infer to choose only a $ThisLink with an 'href'
entry, I use
if ( exists $ThisLink{ href } ) { ... }

But I always get back an UNDEF for every iteration... I tried
@$ThisLink{ href} just for kicks, but that doesn't work either... Any
insights to get me past this sticking point?

Throwing guess-work at the Perl interpreter is not likely to work out in
the long run. What you need is to familiarize yourself with a few
fundamental concepts behind references. You must have gone through a
similar process in the past with structures and pointers to structures in
C. The good news is that references in Perl are easier to understand, so
you will need much less time for those than for the related C concepts.

thanks in advance

##--
my $LX = new HTML::LinkExtractor();
$LX->parse( $FileName);

for $ThisLink( @{ $LX->links } ) {
print "ThisLink = " . Dumper( $ThisLink );
if ( exists $ThisLink{ href } ) {
print "LinkInner = " . $ThisLink{ href } . "\n";
}
}

The first thing to notice is that Dumper() doesn't work on hashes or
arrays but on references to them. The C analogy:

struct foo a;

Dumper(a); /* wrong */
Dumper(&a); /* right */

Subsequently, if you have a plain hash (indicated by the sigil '%' in
front of the variable name instead of '$', that is: %ThisLink):

print Dumper(%ThisLink); # wrong
print Dumper(\%ThisLink); # right

So Perl's '\' is roughly equivalent to C's '&'. Only roughly because a
Perl reference is not a memory-address as returned by C's '&'.
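A short sketch of the backslash in use, with a made-up %link hash. It also shows that the reference points at the original data rather than a copy.

```perl
use strict;
use warnings;
use Data::Dumper;

my %link = ( tag => 'a', href => 'http://example.com/' );

my $ref  = \%link;            # take a reference to the plain hash
my $kind = ref($ref);         # ref() reports what kind of thing it points to
print "$kind\n";
print Dumper($ref);           # dumped as one structure in curly braces

# The reference and the hash share the same data.
$ref->{name} = '1';
print "name is now $link{name}\n";
```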

ThisLink = $VAR1 = {
'content' => 'text/html; charset=iso-8859-1',
'tag' => 'meta',
'http-equiv' => 'Content-Type'
};

Again many indications that you are not dealing with a plain hash. The
curly braces '{' and '}' are a hash-reference constructor. '[' and ']'
are the constructor for array-references and plain parens '(' and ')'
can be regarded as list-constructor (technically not quite correct but
good enough):

my $hash_ref = { key => 'value' };
my %hash = ( key => 'value' );
my $ary_ref = [ 1, 2, 3 ];
my @ary = ( 1, 2, 3 );

The nice thing about reference types in Perl is that accessing single
elements is similar to how it is done in C, i.e. by using the '->'
operator:

typedef struct {
int a;
} foo;

...

foo *f = return_foo_struct(...);

printf("%i\n", f->a);

In Perl:

my $f = { a => 1 };
print $f->{a};

If $f was an array-reference:

my $f = [ 1, 2, 3 ];
print $f->[0];

If you have non-reference arrays or hashes, the arrow is simply dropped:

my %f = ...;
print $f{a};

my @f = ...;
print $f[0];

The '$' sigil in the subscript expressions is not a mistake. It means
that the whole expression (both $f{a} and $f[0]) is a scalar.

This is most likely all you need to know for your problem at hand. It
is, however, not all about references. There are ways to turn hash- and
array-references back into plain hashes and arrays. This is all laid
out nicely in the documentation, which you should feel less reluctant
to read. Aside from the ones already mentioned, 'perlreftut' will
give you three simple rules that will help you in every situation where
a reference shows up somehow.
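For completeness, a minimal sketch of turning references back into plain hashes and arrays by putting the matching sigil in front of the reference. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $hash_ref = { tag => 'a', href => 'http://example.com/' };
my $ary_ref  = [ 1, 2, 3 ];

my %copy  = %$hash_ref;             # plain hash holding a copy of the pairs
my @copy  = @$ary_ref;              # plain array holding a copy of the elements
my @keys  = sort keys %$hash_ref;   # keys() wants a hash, so dereference first
my $count = scalar @$ary_ref;       # number of elements behind the reference

print "$copy{href}\n";
print "@keys\n";
print "$count\n";
```

Note that %copy and @copy are copies, so changing them leaves the data behind the references untouched.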

Tassilo

Joe Smith · Jun 28, 2005

one said:
Can someone explain this line?

while ( $token = $p->get_tag("a")) {
$url = $token->[1]{href} || "-";

$token is now a struct,

You should use the proper vocabulary. $token is a reference
to an array of hashes.

with a first element that has a string,

The first element may or may not be a string, since you haven't
shown how $token->[0] is used. (First = 0, not 1.)

but {href} refers to a pattern that may occur, which returns a true or false?

No, 'href' is a key into the %{$token->[1]} hash. If the value
corresponding to the key is undefined or zero or "0", then the
value of "-" will be used.

Can someone write a 2 line synonym that is not so terse?

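Skipping the while() part:
$ref_to_array_of_hashes = $p->get_tag('a');      # method call on the parser object
@array_of_hashes = @{$ref_to_array_of_hashes};   # dereference the array ref
$ref_to_hash = $array_of_hashes[1];              # second element is a hash ref
%hash = %{$ref_to_hash};                         # dereference the hash ref
$url = $hash{'href'};
$url = '-' unless $url;                          # fall back when the value is false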

-Joe

Tad McClellan · Jun 28, 2005

Joe Smith said:
one said:

Can someone explain this line?

$url = $token->[1]{href} || "-";

If the value
corresponding to the key is undefined or zero or "0",

Or the empty string. (you missed one of the four false values)
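A small sketch that runs the || "-" fallback over each of those false values, plus a missing key. The url_or_dash helper is hypothetical and exists only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the href value, or "-" when that value is false.
sub url_or_dash {
    my ($attr) = @_;
    return $attr->{href} || '-';
}

my @cases = (
    { href => 'http://example.com/' },   # true: kept as-is
    { href => '' },                      # empty string is false
    { href => '0' },                     # the string "0" is false
    { href => 0 },                       # the number 0 is false
    { href => undef },                   # undef is false
    { },                                 # a missing key yields undef
);

my @results;
for my $attr (@cases) {
    push @results, url_or_dash($attr);
}

print "@results\n";
```

Only the first case keeps its value; every other case falls through to "-". If an href of "0" or "" must be preserved, test with exists or defined instead of relying on ||.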

one man army · Jun 29, 2005

ok, after stumbling badly on the syntax and usage, as well as time and
patience, I have written the first script..

thanks for (most of) the replies, sometimes it is useful to have more
sets of eyeballs.

I really don't think some of the posters remember their first Hundred
Hours of Perl programming; it was so long ago, and so many more have
passed by since.

There are a few idioms I am working on, but basically, I have a script
that parses, finds links, get()s and so on for three nested dereferences
in about 900 DB generated links. It kinda clunks along, and works!!

I am going to rewrite it in a cleaner fashion.

--
Lessons Learned:

Perl programmers use Data::Dumper's Dumper() to determine struct and
hash contents - program off of what is in front of you

References, like C pointers, come up a lot and matter

print() is your friend

the documentation is deficient in many places. Most notably
perldoc -q struct

There is a lot of terse idiom everywhere you look

Perl has a lot of modules on CPAN to do very useful stuff

all for now
