Geoff Cox
Thus spoke Geoff Cox:
Tassilo,
many thanks for the corrections - will sort it out!
Cheers
Geoff
Tassilo
have used your code and my version works now! You will see that I have
extended it to work for <p> and that too works.
Glad to hear it.
I am not clear why the following line appears in the start, end and
text subs. I would have thought it would appear just once... I am not
following the logic?
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
I don't think the above should appear like that in all three callbacks.
Cheers
Geoff
my $in_heading;
my $p;
sub start {
    my ($self, $tagname, $attr, undef, undef, $origtext) = @_;
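There is one undef too many. start() receives ($self, $tagname, $attr,
$attrseq, $origtext), so the second undef swallows $origtext and the
variable you named $origtext is always undefined. Put 'use warnings;'
in your code and perl will tell you that you are printing an undefined
value further below.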
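To see the effect in isolation, here is a minimal sketch. The @args
list is a made-up stand-in for what the parser hands to start(), not a
real parser call, and the variable names are only for illustration:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the arguments passed to start():
# ($self, $tagname, $attr, $attrseq, $origtext)
my @args = ('parser', 'h2', {}, [], '<h2>');

# Two undef placeholders skip $attrseq AND $origtext,
# so $too_far reads one element past the end of the list.
my ($self1, $tag1, $attr1, undef, undef, $too_far) = @args;

# One undef placeholder skips only $attrseq, so $origtext lines up.
my ($self2, $tag2, $attr2, undef, $origtext) = @args;

print defined $too_far ? "too_far: $too_far\n" : "too_far: undefined\n";
print "origtext: $origtext\n";
```

This should print "too_far: undefined" followed by "origtext: <h2>".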
    if ($tagname eq 'option') {
        &getintro($attr->{ value });
    }
    if ($tagname eq 'h2') {
        $in_heading = 1;
        return;
    }
    print OUT ("<h2>$origtext</h2> \n") if $in_heading;
That's indeed not quite right. You create too many <h2>...</h2> pairs
with that. For this HTML snippet:
<h2><i>Heading</i></h2>
your parser spits out (assuming that you remove one of the above two
undefs):
<h2><i></h2><h2>Heading</h2><h2></i></h2>
This is because you wrap _everything_ inside <h2></h2> when $in_heading
is true.
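The over-wrapping can be reproduced without a parser by calling the
callbacks by hand. This is only a sketch: the text and end bodies are
my guess at what your code does (they are not shown in your post), the
$self argument is dropped, the trailing " \n" is left out, and the
output is collected in a hypothetical $out string instead of the OUT
filehandle:

```perl
use strict;
use warnings;

my $in_heading = 0;
my $out        = '';

# Each callback wraps whatever it sees in its own <h2>...</h2> pair.
sub start {
    my ($tagname, $origtext) = @_;
    if ($tagname eq 'h2') { $in_heading = 1; return; }
    $out .= "<h2>$origtext</h2>" if $in_heading;
}

sub text {
    my ($origtext) = @_;
    $out .= "<h2>$origtext</h2>" if $in_heading;
}

sub end {
    my ($tagname, $origtext) = @_;
    if ($tagname eq 'h2') { $in_heading = 0; return; }
    $out .= "<h2>$origtext</h2>" if $in_heading;
}

# Hand-written event sequence for <h2><i>Heading</i></h2>
start('h2', '<h2>');
start('i',  '<i>');
text('Heading');
end('i',  '</i>');
end('h2', '</h2>');

print "$out\n";   # <h2><i></h2><h2>Heading</h2><h2></i></h2>
```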
If you want to include the heading tags in your output, then you have to do the
following:
    sub start {
        # $attr and $attrseq are skipped with one undef each
        my ($self, $tagname, undef, undef, $origtext) = @_;
        $in_heading = 1 if $tagname eq 'h2';   # set first, so <h2> itself is printed
        print $origtext if $in_heading;
    }
    sub text {
        my ($self, $origtext) = @_;
        print $origtext if $in_heading;
    }
    sub end {
        my ($self, $tagname, $origtext) = @_;
        print $origtext if $in_heading;
        $in_heading = 0 if $tagname eq 'h2';   # cleared last, so </h2> is still printed
    }
If you don't want to include them, you have to return from the
start/end functions without writing anything when a <h2> tag is
encountered.
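A rough sketch of that second variant, again driven by hand rather than
by a real parser. The $out string and the dropped $self argument are
simplifications of mine, not part of your code:

```perl
use strict;
use warnings;

my $in_heading = 0;
my $out        = '';

sub start {
    my ($tagname, $origtext) = @_;
    # Remember that we are inside <h2>, but write nothing for the tag itself.
    if ($tagname eq 'h2') { $in_heading = 1; return; }
    $out .= $origtext if $in_heading;
}

sub text {
    my ($origtext) = @_;
    $out .= $origtext if $in_heading;
}

sub end {
    my ($tagname, $origtext) = @_;
    # Leave the heading without writing </h2>.
    if ($tagname eq 'h2') { $in_heading = 0; return; }
    $out .= $origtext if $in_heading;
}

# Hand-written event sequence for <p>skipped</p><h2><i>Heading</i></h2>
start('p', '<p>');  text('skipped');  end('p', '</p>');
start('h2', '<h2>');
start('i',  '<i>');
text('Heading');
end('i',  '</i>');
end('h2', '</h2>');

print "$out\n";   # <i>Heading</i>
```

Everything between the heading tags is copied through once, and the
<h2> and </h2> tags themselves only flip the flag.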
As I wrote before: It takes a little time to get used to the way
HTML::Parser does the job. You have to be clear about how HTML::Parser
triggers the callbacks and what arguments are passed. Use a fixed font
for the following:
<tag attr="val"><tag1>some text</tag1></tag>
`--------------'`----'`-------'`-----'`----'
      (1)        (2)     (3)     (4)   (5)

(1) start ($self,
           'tag',                # $tagname
           { attr => 'val' },    # $attr
           [ 'attr' ],           # $attrseq
           '<tag attr="val">'    # $origtext
          );
(2) start ($self,
           'tag1',               # $tagname
           { },                  # $attr
           [ ],                  # $attrseq
           '<tag1>'              # $origtext
          );
(3) text  ($self,
           'some text',          # $origtext
           0                     # $is_cdata
          );
(4) end   ($self,
           'tag1',               # $tagname
           '</tag1>'             # $origtext
          );
(5) end   ($self,
           'tag',                # $tagname
           '</tag>'              # $origtext
          );
    if ($tagname eq 'p') {
        $p = 1;
        return;
    }
Unlike with the <h2> tags, here you do not print any <p> tags. So you
essentially just record when you are inside <p>, but you don't include
the <p> tags in your output.
Tassilo