Confused about Schwartz idiom utilizing map & split

weston · Mar 3, 2006

In an article on Stonehenge.com on using libxml2 to strip html from a
document, I came across a part of the listing that I'm having trouble
understanding. Randall apparently creates a hash of approved tags and
their attributes with these lines:

=9= my %PERMITTED =
=10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
=11= split /\n/, <<'END';
=12= a href name target class title
=13= b
=14= big
=15= blockquote class
....
=49= END

(See http://www.stonehenge.com/merlyn/PerlJournal/col02.html )

I keep trying to parse line 10 in my head and am not getting a lot of
mental traction in really understanding how this works. Anybody want to
help?

Dr.Ruud · Mar 4, 2006

Maybe this helps:

#!/usr/bin/perl
use strict; use warnings;
use Data::Dumper;

my %PERMITTED =
map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
split /\n/, <<'END';
a href name target class title
b
big
blockquote class
....
END

print Data::Dumper->Dump( [\%PERMITTED]
                        , [qw(%PERMITTED)]
                        ), "\n";

Randal L. Schwartz · Mar 4, 2006

Heh.

The split on line 11 creates elements like:

"a href name target class title",
"b",
"big",
"blockquote class",

etc. The map on the beginning of line 10 sets $_ equal to each of those,
and looks for a list-valued return from the block.
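A minimal sketch of those first two steps, using a shortened, hypothetical copy of the tag list. `split /\n/` yields one string per line, and a `map` block may return more than one value per input element, so the output list can be longer than the input list. Here the block returns each line paired with its length, purely to show `$_` taking each value.

```perl
use strict;
use warnings;

# Shortened, hypothetical stand-in for the article's heredoc.
my $text = <<'END';
a href name target class title
b
blockquote class
END

# split /\n/ yields one element per line (the trailing empty field is dropped).
my @lines = split /\n/, $text;

# map aliases $_ to each line in turn. The block returns a two-element
# list (the line and its length), so three lines produce six values.
my @pairs = map { ($_, length) } @lines;

# A flat list of key/value pairs can be assigned straight to a hash.
my %len = @pairs;

print scalar(@lines), " lines, ", scalar(@pairs), " values\n";
```

The same "flat list of pairs into a hash" step is what lets line 10 feed `%PERMITTED` directly.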

The split in the middle of line 10 (called with no arguments, so it splits $_
on whitespace) breaks each of those elements into a list, assigns the first
word to $k, and assigns any remaining words to @v.
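A small sketch of what that inner `split` does for a line with attributes and for a bare tag. With no arguments, `split` behaves like `split ' ', $_`: it splits on runs of whitespace and ignores leading whitespace. The sample strings and the `%attrs_of` hash are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

my %attrs_of;

# foreach without a loop variable aliases $_ to each string in turn.
foreach ('a href name target class title', 'b', '  blockquote class') {
    # No arguments: split $_ on whitespace; leading whitespace is ignored.
    my ($k, @v) = split;
    # First word is the key; the rest (possibly none) end up in @v.
    $attrs_of{$k} = [@v];
}

for my $tag (sort keys %attrs_of) {
    print "$tag: @{ $attrs_of{$tag} }\n";
}
```

For the bare `b` line, `@v` is simply empty, which is why that tag ends up with an empty inner hash in `%PERMITTED`.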

The second map on line 10 converts @v into a list in which each element of @v
is followed by the value "1", and the surrounding braces turn that list into a
hashref, so that the elements of @v become keys, each with the value 1. That
hashref is then returned along with $k as a two-value list, which becomes one
key/value pair in %PERMITTED.
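The inner idiom in isolation, using the attribute list from the `a` line: the bare `map` produces the alternating list, and wrapping it in braces produces the hashref. Testing whether an attribute is allowed then becomes a hash lookup. The variable names here are illustrative only.

```perl
use strict;
use warnings;

my @v = qw(href name target class title);

# Each element of @v followed by 1: ('href', 1, 'name', 1, ...).
my @flat = map { $_, 1 } @v;

# The same list inside { } builds an anonymous hash and returns a reference to it.
my $set = { map { $_, 1 } @v };

print "flat list has ", scalar(@flat), " elements\n";
print "href allowed\n"    if $set->{href};
print "onclick allowed\n" if $set->{onclick};
```

Using 1 as the value means a plain truth test like `$set->{href}` works as a membership check.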

But didn't I say all this in the article?

print "Just another Perl hacker,"; # the original
--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<[email protected]> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!

Tad McClellan · Mar 4, 2006

weston said:
In an article on Stonehenge.com on using libxml2 to strip html from a
document, I came across a part of the listing that I'm having trouble
understanding. Randall apparently creates a hash of approved tags and
their attributes with these lines:

=10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }

I keep trying to parse line 10 in my head and am not getting a lot of
mental traction in really understanding how this works. Anybody want to
help?

Does this help?

------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data:

umper;

my %PERMITTED =
    map { my($k, @v) = split;    # 1st space-sep'd field is tag, rest are its attrs
          ($k, {map {$_, 1} @v}) # a 2-element list. 1st is tag,
                                 # 2nd is a hash-ref with keys as attr names,
                                 # and values set to one
        }
    split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

print Dumper \%PERMITTED;
------------------------------

Or maybe it would help to "unroll" the maps into foreachs:

------------------------------
#!/usr/bin/perl
use warnings;
use strict;
use Data:

umper;

my %PERMITTED;

foreach (split /\n/, <<'END')
a href name target class title
b
big
blockquote class
END
{
    my($k, @v) = split;
    my %h;
    foreach ( @v ) {    # "unroll" {map {$_, 1} @v}
        $h{$_} = 1;
    }
    $PERMITTED{$k} = \%h;
}

print Dumper \%PERMITTED;
------------------------------

Anno Siegel · Mar 4, 2006

weston said:
In an article on Stonehenge.com on using libxml2 to strip html from a
document, I came across a part of the listing that I'm having trouble
understanding. Randall apparently creates a hash of approved tags and

Who is this Randall you speak of?

their attributes with these lines:

Randal's code constructs a hash of hashes. The first word in each data
line is a primary key. The rest of the words in each line (if any)
become the keys of an inner hash, all with the value 1. Presumably
the inner hash represents a set of whatever, associated with the primary
key.
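To make the "hash of hashes" concrete, here is a sketch that builds the structure the same way line 10 does (with a shortened tag list) and then queries it. The `is_permitted` helper is hypothetical, written for this illustration; it is not part of Randal's article.

```perl
use strict;
use warnings;

# Built exactly as in line 10, with a shortened tag list.
my %PERMITTED =
    map { my ($k, @v) = split; ($k, { map { $_, 1 } @v }) }
    split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

# Hypothetical helper: is this tag (and, optionally, this attribute) allowed?
sub is_permitted {
    my ($tag, $attr) = @_;
    return 0 unless exists $PERMITTED{$tag};   # unknown tag
    return 1 unless defined $attr;             # tag alone is allowed
    # Tag is known, so this lookup does not autovivify a new outer key.
    return exists $PERMITTED{$tag}{$attr} ? 1 : 0;
}

for my $check (['a', 'href'], ['b'], ['b', 'href'], ['script']) {
    my $label = join ' ', @$check;
    print "$label: ", (is_permitted(@$check) ? "ok" : "stripped"), "\n";
}
```

Checking the outer key first matters: looking up `$PERMITTED{script}{x}` directly would autovivify an empty `script` entry.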

=9= my %PERMITTED =
=10= map { my($k, @v) = split; ($k, {map {$_, 1} @v}) }
=11= split /\n/, <<'END';
=12= a href name target class title
=13= b
=14= big
=15= blockquote class
....
=49= END

How does it do that? Rewriting the code with fewer maps and more
variable names may help. (untested)

my @lines = split /\n/, <<'END';
a href name target class title
b
big
blockquote class
END

my %PERMITTED;

for my $line ( @lines ) {
    my ($primary_key, @words) = split ' ', $line; # ($k, @v) in the original code
    # build wordlist
    my @wordlist;   # alternating one word and one 1 (for hash initialization)
    for my $word ( @words ) {
        push @wordlist, ( $word => 1 );
    }
    # build a hash out of @wordlist and assign it to its place
    $PERMITTED{ $primary_key } = { @wordlist };
}

I keep trying to parse line 10 in my head and am not getting a lot of
mental traction in really understanding how this works. Anybody want to
help?

Line 10 does basically what the (outer) for-loop does in my code. The
inner for-loop does the job of the nested map.

Randal's code is that of a fluent speaker of Perl. Its parts (the two maps)
are two well-known idioms for hash-building. Applied together, they may
look like a mess, but once you recognize the pattern of each, their
interaction becomes clear too.
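One way to convince yourself that the nested maps and the explicit loops are equivalent is to build both and compare them. This sketch uses a hypothetical three-line sample and compares `Data::Dumper` output with sorted keys; `Test::More`'s `is_deeply` would be another option.

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = "a href name target class title\nb\nblockquote class\n";

# Version 1: the nested-map idiom from line 10.
my %by_map =
    map { my ($k, @v) = split; ($k, { map { $_, 1 } @v }) }
    split /\n/, $data;

# Version 2: the same structure built with explicit loops.
my %by_loop;
for my $line (split /\n/, $data) {
    my ($k, @v) = split ' ', $line;
    my %set;
    $set{$_} = 1 for @v;
    $by_loop{$k} = \%set;
}

# Sortkeys makes both dumps list keys in the same order,
# so the two strings can be compared directly.
$Data::Dumper::Sortkeys = 1;
my $same = Dumper(\%by_map) eq Dumper(\%by_loop);

print $same ? "identical\n" : "different\n";
```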

Anno

Dr.Ruud · Mar 4, 2006

Tad McClellan wrote:

print Dumper \%PERMITTED;

Alternative:

print Data::Dumper->Dump( [\%var], ['%var'] );
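A sketch of the named form of `Dump`, where the second array reference supplies names for the dumped values. The Data::Dumper documentation describes a name beginning with `*` as the way to dump the dereferenced hash or array, so the output reads `%var = ( ... );` instead of `$VAR1 = { ... };`. `%var` is a hypothetical example, and `Sortkeys` is set only to keep the key order stable.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order in the output

my %var = (b => {}, blockquote => { class => 1 });

# Named dump: '*var' asks for the hash itself rather than a reference to it.
my $named = Data::Dumper->Dump([\%var], ['*var']);

# Default: the value is dumped as a reference assigned to $VAR1.
my $plain = Dumper(\%var);

print $named, $plain;
```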
