slurp not working? ideas please!

G

Geoff Cox

Could also use "perltidy" which does a pretty good job.
thanks

$next3 is 'undef'ined.

What are the 4 lines of allphp2.php following the line with the
$pattern? Compare them to the values of $curr, $next1, $next2, and
$next3. Do they match?

Possibly, $pattern isn't found, and the "last" is never performed,
probably want a flag there.

my $found;
while (<INNN>) {
if (/$pattern/) { $found=1; last; }
}

return if !$found; # Or print some error message..
my ( $curr, $next1, $next2, $next3 ) = <INNN>;

will try this out. you will see from another of my posts that the code
in sub classroomnotes works when in a simplified script, on its own,
but not when in the full script. I have given both in that post. So,
it seems that the lines are there to be found! Very odd and no doubt a
simple explanation is out there!

Cheers

Geoff
 
G

Geoff Cox

Also sprach Anno Siegel:


Actually, the code doesn't define functions inside others. The indenting
merely suggests it does. :) The code is probably a bit better than it
looks on first sight (after all, it was in major parts written by me in
a previous thread;-).

Tassilo

I very much lijke the way you put that! I am reading perldoc perlstyle
and will make some effort there. Do you use any particular editor?

You can see from one of my posts here that the same code works in the
simplified script but not in the full script. Odd?!

Cheers

Geoff
 
G

Geoff Cox

On Tue, 20 Apr 2004 08:19:32 -0500, Tad McClellan

Tad,

I have used perltody programme so hoep this looks OK?

The code in the first section below works OK in that it does find the
links to the full docs. The same code in the second section does not
work - no value is found for $next3 in the sub classroomnotes....does
this improved layout make it easier for you to see why?

Cheers

Geoff


-------------------------
my $pattern = "docs/aslevel/classroom-notes/finance/finance";

open (INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php");
open (OUT, ">>d:/a-keep9/short-nondb/short/members2/test.htm");


while (<INNN>){
last if /$pattern/;
}
my ($curr, $next1, $next2, $next3) = <INNN>;
print ("$curr - $next1 - $next2 - $next3 \n");
close (INNN);

if ($next3 =~ /\$i\<(\d+);/) {

my $nn = $1;

print ("\$nn = $nn \n");

print OUT ("<td valign='top'> \n");
for ($c=1;$c<$nn;$c++) {
print OUT ('<a href="'. $pattern . "-doc" . $c . ".zip" . '">' .
"Document$c" . "</a><br>" . "\n");
}
print OUT ("</td></tr>\n");
}

------------------------


package MyParser;
use base qw(HTML::parser);
use strict;

my $in_heading;
my $p;

my $name = "as-left.htm";
#open (IN, "d:/a-keep9/short-nondb/oldshort2/$name") ||
#die "cannot open d:/a-keep9/short-nondb/oldshort2/$name \n";
open (OUT, ">>d:/a-keep9/short-nondb/short/members2/$name") || die
"cannot open >>d:/a-keep9/short-nondb/short/members2/$name: $! \n";


print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

sub start {

my ($self, $tagname, $attr, undef, $origtext) = @_;

if ($tagname eq 'h2') {
$in_heading = 1;
return;
}

if ($tagname eq 'p') {
$p = 1;
return;
}

if ($tagname eq 'option') {

choice($attr->{ value });

print ("\$attr etc = $attr->{ value } \n");

}

}

sub end {
my ($self, $tagname, $origtext) = @_;
if ($tagname eq 'h2') {
$in_heading = 0;
return;
}


if ($tagname eq 'p') {
$p = 0;
return;
}
}

sub text {
my ($self, $origtext) = @_;
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
print OUT ("<p>$origtext</p> \n") if $p;

}

sub choice {
my ($path) = @_;

if ($path =~ /docs\/aslevel\/classroom-notes/) {
intro($path);
aslevelclassroomnotes($path);
}

}

sub intro {

my ($pathhere) = @_;

open (INN, "d:/a-keep9/short-nondb/db/total-160404.txt") || die
"cannot open d:/a-keep9/short-nondb/db/total-160404.txt: $! \n";

my $lineintro;

while (defined ($lineintro = <INN>)) {
if ($lineintro =~ /$pathhere','(.*?)'\)\;/) {
print OUT ("<tr><td>$1 <p> </td>\n");
}
}
}



sub aslevelclassroomnotes {

my ($pattern) = @_;
my $c;
my $line;

#print ("\$pattern has value $pattern \n");

open (PHP, "d:/a-keep9/short-nondb/allphp/allphp2.php") || die "cannot
open d:/a-keep9/short-nondb/allphp/allphp2.php \n";

while (<PHP>){
# print " eof()=", eof() ? "true\n" : "false\n";
print ("\$pattern has value $pattern \n");
# $line = $_;
# print ("\$line = $_ \n");
last if /$pattern/;
}
my ($curr, $next1, $next2, $next3) = <PHP>;
print ("curr is $curr next1 is $next1 next2 is $next2 next3 is $next3
\n");
close (PHP);

if ($next3 =~ /\$i\<(\d+);/) {
my $nn = $1;
print OUT ("<td valign='top'> \n");
for ($c=1;$c<$nn;$c++) {
print OUT ('<a href="'. $pattern . "-doc" . $c . ".zip" . '">'
.. "Document$c" . "</a><br>" . "\n");
}
print OUT ("</td></tr>\n");
}
}


package main;
open (IN, "d:/a-keep9/short-nondb/oldshort2/$name") || die "cannot
open package main d:/a-keep9/short-nondb/oldshort2/$name: $! \n";
undef $/;
my $html = <IN>;
my $parser = MyParser->new;
$parser->parse($html);

open (OUT, ">>d:/a-keep9/short-nondb/short/members2/$name");
print OUT ("</tr></table> \n");
print OUT ("</body></html> \n");
 
G

Geoff Cox

The code in the first section below works OK in that it does find the
links to the full docs. The same code in the second section does not
work - no value is found for $next3 in the sub classroomnotes....does
this improved layout make it easier for you to see why?

Good grief ! I posted the wrong code layout - the following was
produced by perltidy ....

Geoff

-------------------------
my $pattern = "docs/aslevel/classroom-notes/finance/finance";

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/test.htm" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
print("$curr - $next1 - $next2 - $next3 \n");
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print("\$nn = $nn \n");

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern . "-doc" . $c . ".zip" .
'">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}

-----------------------------------------------------------

package MyParser;
use base qw(HTML::parser);

use strict;

my $in_heading;
my $p;

my $name = "as-left.htm";

#open (IN, "d:/a-keep9/short-nondb/oldshort2/$name") ||
#die "cannot open d:/a-keep9/short-nondb/oldshort2/$name \n";
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
|| die "cannot open >>d:/a-keep9/short-nondb/short/members2/$name:
$! \n";

print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$p = 1;
return;
}

if ( $tagname eq 'option' ) {

choice( $attr->{value} );

print("\$attr etc = $attr->{ value } \n");

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
print OUT ("<p>$origtext</p> \n") if $p;

}

sub choice {
my ($path) = @_;

if ( $path =~ /docs\/aslevel\/classroom-notes/ ) {
intro($path);
aslevelclassroomnotes($path);
}

}

sub intro {

my ($pathhere) = @_;

open( INN, "d:/a-keep9/short-nondb/db/total-160404.txt" )
|| die "cannot open d:/a-keep9/short-nondb/db/total-160404.txt:
$! \n";

my $lineintro;

while ( defined( $lineintro = <INN> ) ) {
if ( $lineintro =~ /$pathhere','(.*?)'\)\;/ ) {
print OUT ("<tr><td>$1 <p> </td>\n");
}
}
}

sub aslevelclassroomnotes {

my ($pattern) = @_;
my $c;
my $line;

#print ("\$pattern has value $pattern \n");

open( PHP, "d:/a-keep9/short-nondb/allphp/allphp2.php" )
|| die "cannot open d:/a-keep9/short-nondb/allphp/allphp2.php
\n";

while (<PHP>) {

# print " eof()=", eof() ? "true\n" : "false\n";
print("\$pattern has value $pattern \n");

# $line = $_;
# print ("\$line = $_ \n");
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <PHP>;
print("curr is $curr next1 is $next1 next2 is $next2 next3 is
$next3 \n");
close(PHP);

if ( $next3 =~ /\$i\<(\d+);/ ) {
my $nn = $1;
print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern . "-doc" . $c .
".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

package main;
open( IN, "d:/a-keep9/short-nondb/oldshort2/$name" )
|| die "cannot open package main
d:/a-keep9/short-nondb/oldshort2/$name: $! \n";
undef $/;
my $html = <IN>;
my $parser = MyParser->new;
$parser->parse($html);

open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" );
print OUT ("</tr></table> \n");
print OUT ("</body></html> \n");
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
Tassilo

I very much lijke the way you put that! I am reading perldoc perlstyle
and will make some effort there. Do you use any particular editor?

Well, yes, I do. But the editor used is a weak excuse for formatting
code poorly. If you cannot be bothered to do the indenting yourself, get
an editor that does it for you automatically. I use vim that does most
of the indenting for me. Others (like emacs) can do it as well.

Having said that, it took me one key-stroke to re-indent your code and
immediately realize that Anno was in fact right. There are nested
subroutine definitions. I didn't see them in your raw posting because
you managed to hide them by not indenting the relevant part.

I now see things that I didn't see before and honestly I find them
frightening:

package MyParser;
use base qw(HTML::parser);
use File::Find;

my $in_heading;
my $p;

my $dir = ("d:/a-keep9/short-nondb/oldshort2");

find sub {
my $name = $_;
open (OUT, ">>d:/a-keep9/short-nondb/short/members2/$name");
print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

sub start { ... }
sub end { ... }
...

package main;
my $parser = MyParser->new;
$parser->parse($html);
}, $dir;

Think about this structure for a while. If you are able to explain why
you define the HTML::parser callbacks inside the File::Find::find()
subroutine (which is, after all, triggered for each found file) then,
and only then, you will get my blessing for the above.

There are other dreadful things to see there, notably the switching to
package main inside the function reference.

The whole thing got a bit out of your hands, I am afraid. You have an
iterative task (namely parsing a bunch of files). Furthermore you have a
class (your HTML parser). A class is an abstract description...you only
define it once. Later you may create as many objects as you want that
all conform to the description given by the class.

But the class itself only exists once. By its nature, it's the opposite
of iterative. That means the first thing to do is move the whole
File::Find stuff out of MyParser. It's just wrong there.

So the outline of your script should look like this:

package MyParser;
use base qw/HTML::parser/;

# global variables
my ($in_heading, $in_p);
...

# handlers
sub reset { ($in_heading, $in_p) = (0, 0) }
sub start { ... }
sub end { ... }
...

package main;

use File::Find;

my $dir = "....";
my $parser = MyParser->new;

find sub {
$parser->parse_file($_);
$parser->reset;
} => $dir;

Ideally, your parser class doesn't even know what kind of task it is
used for. All it does is providing the infrastructure and facilities for
parsing HTML. That's it. Whether you pass one file or thousands of
files...that's not your parser class' business at all.

The iteration of the files is done with the actual object. You create
one parser. HTML::parser is one of those cases where the object can be
reused; in other scenarios and classes you would create one object per
"problem instance" (in your case the "problem instance" is the file you
want to parse).

So you have this one parser and iteratively have it parse one file after
the other. That's what the above skeleton does. In MyParser the abstract
concept of parsing is defined once. In the main package you apply this
abstract concept to many files. This happens in these four lines:

find sub {
$parser->parse_file($_);
$parser->reset;
} => $dir;

I am quite sure your program will magically start working once you
change it accordingly to what I've written.

Tassilo
 
T

Tad McClellan

Please don't say I wrote stuff and then supply nothing that I wrote!


[ snip 180 lines that Tad *did not* write ]
 
T

Tad McClellan

I have used perltody programme so hoep this looks OK?


Is there anybody in there?

Just nod if you can hear me.

open (OUT, ">>d:/a-keep9/short-nondb/short/members2/test.htm");


<broken-record>
You should always, yes *always*, check the return value from open().

This has been pointed out to you before.

Do you actually read the followups to your posts?
</broken-record>



Once you know that you should check the return value from open()
we kinda expect that you _will_ check the return value from open().

I get the feeling I am "talking to the hand".

I'm giving up on you.

my ($curr, $next1, $next2, $next3) = <INNN>;
print ("$curr - $next1 - $next2 - $next3 \n");


I doubt that perltidy indents like that...

for ($c=1;$c<$nn;$c++) {


foreach my $c ( 1 .. $nn ) {
 
G

Geoff Cox

I am quite sure your program will magically start working once you
change it accordingly to what I've written.

Tassilo,

Not quite there yet - although a million thanks for the care you have
taken to explain how wrong I was. The light is beginning to dawn!

The code below brings up 2 warnings

1. use of inherited AUTOLOAD for non-method MyParser::choice() is
deprecated at line 26 - ie

choice( $attr->{ value } );

2. can't locate auto /yParser/choice.al in @INC (@INC contains:
D:/perl/lib D:/perl/site/lib .) at line 24 - ie

if ( $tagname eq 'option' ) {

As warning 1 says "deprecated" I am guessing that this is not stopping
the script? How about warning 2?

Also, how do I deal with the print OUT in sub text?

sub text {
my ( $self, $origtext ) = @_;
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
print OUT ("<p>$origtext</p> \n") if $in_p;

At this point the file has not been opened ...?


Cheers

Geoff


package MyParser;
use base qw(HTML::parser);
use strict;
use diagnostics;

my ($in_heading,$in_p);

sub reset { ($in_heading,$in_p)=(0,0)}

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

choice( $attr->{ value } );

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
print OUT ("<p>$origtext</p> \n") if $in_p;

}

package main;

use File::Find;

my $dir = "d:/a-keep9/short-nondb/oldshort2";
my $parser = MyParser->new;

find sub {

my $name = $_;

open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" );
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

undef $/;

$parser->parse_file($_);
$parser->reset;


}, $dir;

sub choice {
my ($path) = @_;
if ( $path =~ /docs\/btec-first/ ) {
intro($path);
btecfirst($path);
}
elsif ( $path =~ /docs\/aslevel\/classroom-notes/ ) {
intro($path);
aslevelclassroomnotes($path);
}
elsif ( $path =~ /docs\/avce\/assignments/ ) {
intro($path);
avceassignments($path);
}
elsif ( $path =~
/docs\/aslevel\/simulations\/second-severn-bridge/ ) {
intro($path);
aslevelsimulationssevern($path);
}
elsif ( $path =~ /docs\/aslevel\/debates\/wind-farm-debate/ ) {
intro($path);
asleveldebateswindfarm($path);
}
elsif ( $path =~ /docs\/economics\/section1/ ) {
intro($path);
economicssection1($path);
}
elsif ( $path =~ /docs\/economics\/section2/ ) {
intro($path);
economicssection2($path);
}
elsif ( $path =~ /docs\/economics\/section3/ ) {
intro($path);
economicssection3($path);
}
elsif ( $path =~ /docs\/gcse\/classroom-notes/ ) {
intro($path);
gcseclassroomnotes($path);
}
elsif ( $path =~
/docs\/gcse\/student-activities\/games\/ice-lolly/ ) {
intro($path);
gcsegamesicelolly($path);
}
elsif ( ( $path =~ /docs\/gnvq-int\/assignments/ )
&& ( $path !~
/gnvq-int\/assignments\/gnvq-int-write-assignment/ ) )
{
intro($path);
gnvqintassignments($path);
}
elsif ( $path =~ /docs\/vgcse\/course-units/ ) {
intro($path);
vgcsecourseunits($path);
}
elsif ( $path =~
/docs\/gcse\/student-activities\/business-location/ ) {
intro($path);
gcsestudentactivitiesbusinesslocation($path);
}
elsif ( $path =~

/docs\/gcse\/student-activities\/business-structure-decisions/ )
{
intro($path);
gcsestudentactivitiesbusinessstructuredecisions($path);
}
elsif ( $path =~ /docs\/gcse\/student-activities\/finance/ ) {
intro($path);
gcsestudentactivitiesfinance($path);
}
elsif ( $path =~ /docs\/gcse\/student-activities\/marketing/ ) {
intro($path);
gcsestudentactivitiesmarketing($path);
}
elsif ( $path =~ /docs\/gcse\/student-activities\/people-at-work/
) {
intro($path);
gcsestudentactivitiespeopleatwork($path);
}
elsif ( $path =~ /docs\/gcse\/student-activities\/production/ ) {
intro($path);
gcsestudentactivitiesproduction($path);
}
else {
intro($path);
other($path);
}

}

sub intro {

my ($pathhere) = @_;
open( INN, "d:/a-keep9/short-nondb/db/total-160404.txt" );
my $lineintro;

while ( defined( $lineintro = <INN> ) ) {
if ( $lineintro =~ /$pathhere','(.*?)'\)\;/ ) {
print OUT ("<tr><td>$1 <p> </td>\n");
}
}
}

sub btecfirst {

my ($pattern) = @_;
my $linee = $pattern;
my $c = 0;
$linee =~ /.*unit(\d).*?chap(\d)/;
my $u = $1;
my $chap = $2;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d);/ ) {
my $nn = $1;
print OUT ("<td valign='top'>\n");
for ( my $c = 1 ; $c < $nn ; $c++ ) {
print OUT (
'<a href="' . $pattern . "/unit" . $u . "-chap" .
$chap . "-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub aslevelclassroomnotes {

my ($pattern) = @_;
my $c;

print("\$pattern = $pattern \n");

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

my $line = <INNN>;

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
print("$curr - $next1 - $next2 - $next3 \n");
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print("\$nn = $nn \n");

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern . "-doc" . $c .
".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub other {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Document"
. "</a><br>"
. "\n" );
print OUT ("</td></tr>\n");

}

sub avceassignments {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Document"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern . "-grid" . ".zip" . '">' .
"Grid"
. "</a><br>"
. "\n" );
print OUT ("</td></tr>\n");

}

sub aslevelsimulationssevern {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Teacher's pack"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern . "-sp" . ".zip" . '">'
. "Student's Pack"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern . "-swb" . ".zip" . '">'
. "Student's Workbook"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern . "-dbp" . ".zip" . '">'
. "Debriefing Pack"
. "</a><br>"
. "\n" );

print OUT ("</td></tr>\n");

}

sub asleveldebateswindfarm {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Teacher's pack"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-a-company" . ".zip" . '">'
. "Adviser's Pack - Company"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-a-council" . ".zip" . '">'
. "Adviser's Pack - Council"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-s-company" . ".zip" . '">'
. "Student's Pack - Company"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-s-council" . ".zip" . '">'
. "Student's Pack - Council"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-s-workbook" . ".zip" . '">'
. "Student's Workbook"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-arbitrator" . ".zip" . '">'
. "Arbitrator's Pack"
. "</a><br>"
. "\n" );

print OUT ("</td></tr>\n");

}

sub economicssection1 {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/section1-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub economicssection2 {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/section2-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub economicssection3 {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/section3-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcseclassroomnotes {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern . "-doc" . $c .
".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsegamesicelolly {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Teacher's Pack"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-runners" . ".zip" . '">'
. "Runner's Pack"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern
. "-adjudicators" . ".zip" . '">'
. "Adjudicator's Pack"
. "</a><br>"
. "\n" );

print OUT ("</td></tr>\n");

}

sub gnvqintassignments {

my ($pattern) = @_;
print OUT ("<td valign='top'> \n");
print OUT ( '<a href="' . $pattern . ".zip" . '">'
. "Document"
. "</a><br>"
. "\n" );
print OUT ( '<a href="' . $pattern . "-grid" . ".zip" . '">' .
"Grid"
. "</a><br>"
. "\n" );
print OUT ("</td></tr>\n");

}

sub vgcsecourseunits {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern . "-chap" . $c .
".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub para {
my ($name) = @_;
my $line;
open( INPARA, "d:/a-keep9/short-nondb/progs/para2/$name" );

undef $/;
$line = <INPARA>;

print OUT ("<tr><td colspan='2'> \n");
print OUT $line;
print OUT ("</td></tr> \n");

$/ = "\n";
}

sub getpara {
local $/ = "\0d\0a";

my ($name) = @_;
my $line;

open( GETPARA, "d:/a-keep9/short-nondb/old-short/$name" );
open( OUTPARA, ">>d:/a-keep9/short-nondb/progs/para2/$name" );

while ( defined( $line = <GETPARA> ) ) {
if ( $line =~ /<p>(.*?)<\/p>/s ) {
print OUTPARA ("$1 \n");
}
}

#close(GETPARA);
#close(OUTPARA);
}

sub gcsestudentactivitiesbusinesslocation {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/location-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsestudentactivitiesbusinessstructuredecisions {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/bsd-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsestudentactivitiesfinance {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/finance-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsestudentactivitiesmarketing {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/marketing-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsestudentactivitiespeopleatwork {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/people-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

sub gcsestudentactivitiesproduction {

my ($pattern) = @_;

my $c = 0;

open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d+);/ ) {

my $nn = $1;

print OUT ("<td valign='top'> \n");
for ( $c = 1 ; $c < $nn ; $c++ ) {
print OUT ( '<a href="' . $pattern
. "/production-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}
 
G

Geoff Cox

I doubt that perltidy indents like that...

Tad,

Correct - I did post the wrong layout - apologies. I did correct this
with a second post.

My code was badly wrong re the OOP. Tassilo has really helped by
making clear how this works. My reply to his posting shows my "best"
so far.

Do hope you can hang on just a little longer as I am starting to get
the hang of this !

Re the checking with die etc. I am using this but it seems that my
code is wrong in that it does not get a file to parse the first time
round but does after that...so to be able to continue I either # the
line or remove it....bad I know but only able to attempt to solve one
problem at a time..

Thanks

Geoff
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
Tassilo,

Not quite there yet - although a million thanks for the care you have
taken to explain how wrong I was. The light is beginning to dawn!

The code below brings up 2 warnings

1. use of inherited AUTOLOAD for non-method MyParser::choice() is
deprecated at line 26 - ie

choice( $attr->{ value } );

You call this inside the package MyParser, but choice() is defined in
package main. You have two options: Either calling it package-qualified:

main::choice( $attr->{ value } );

or you move the definition of choice() into MyParser. If you do the
latter but also intend to call choice() somewhere in main (I didn't
check), then you'll again have to package-qualify this call:

MyParser::choice( ... );
2. can't locate auto /yParser/choice.al in @INC (@INC contains:
D:/perl/lib D:/perl/site/lib .) at line 24 - ie

if ( $tagname eq 'option' ) {

As warning 1 says "deprecated" I am guessing that this is not stopping
the script? How about warning 2?

Warning 2 is in fact a fatal error and so it does terminate your script.
However, it's a follow-up to warning 1. If you fix the call to choice()
by either moving it into MyParser or package-qualifying calls to it,
this error will go away, too.
Also, how do I deal with the print OUT in sub text?

sub text {
my ( $self, $origtext ) = @_;
print OUT ("<h2>$origtext</h2> \n") if $in_heading;
print OUT ("<p>$origtext</p> \n") if $in_p;

At this point the file has not been opened ...?

Ah, ok. That means your class needs some additional static data
($in_heading and $in_p are already static information, btw).

Add a third variable and a method to introduce the filehandle to
MyPackage thusly:

my ($in_heading, $in_p, $fh);
...
sub register_fh {
$fh = shift;
}

# the callbacks now need to use $fh instead of OUT
sub text {
my ($self, $origtext) = @_;
print $fh "<h2>$origtext</h2>\n" if $in_heading;
print $fh "<p>$origtext</p>\n" if $in_p;
}

This creates a door to MyParser through which you can send the
filehandle to which you want the output to go. You use it inside the
File::Find::find() callback. Like this:

find sub {

my $name = $_;

open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" ) ||
die "can't open d:/a-keep9/short-nondb/short/members2/$name: $!";

print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

$parser->register_fh(\*OUT); # filehandles are best passed by reference
$parser->parse_file($_);
$parser->reset;

}, $dir;

A note on filehandles: A filehandle (a GLOB in Perl) is the only
datatype that can be replaced by a reference where you use the reference
in place of the ordinary GLOB:

open OUT, "file" or die $!;
print OUT "foobar\n";

is very much the same as

open OUT, "file" or die $!;
my $fh = \*OUT;
print $fh "foobar\n";

or even this (works with perl5.6.0 and above):

open my $fh, "file" or die $!;
print $fh "foobar\n";

The latter is preferable because a lexical filehandle is automatically
closed when it goes out of scope.

Finally a few notes regarding your code:
package MyParser;
use base qw(HTML::parser);
use strict;
use diagnostics;
[...]

package main;

use File::Find;

my $dir = "d:/a-keep9/short-nondb/oldshort2";
my $parser = MyParser->new;

find sub {

my $name = $_;

open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" );
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name: $!";

The semicolon terminating the open() statement should not be there. It's
even a syntax-error.
print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

undef $/;

Now that you use parse_file(), you no longer slurp anything and
therefore this line can be removed.
$parser->parse_file($_);
$parser->reset;


}, $dir;

sub choice {
my ($path) = @_;
if ( $path =~ /docs\/btec-first/ ) {
intro($path);
btecfirst($path);
}
elsif ( $path =~ /docs\/aslevel\/classroom-notes/ ) {
intro($path);
aslevelclassroomnotes($path);
}
elsif ( $path =~ /docs\/avce\/assignments/ ) {
intro($path);
avceassignments($path);
}
[...]

else {
intro($path);
other($path);
}

}

Repetitive code like that cries for shortcut. Don't do the matches
against $path (it's too much to type) but use Perl's default variable $_
here. Two ways to do that:

local $_ = $path;
if ( /docs\/btec-first/ ) {
...
}
elsif ( /docs\/aslevel\/classroom-notes/) {
...
}
...

Or wrap the whole stuff in a for-loop:

for ($path) {
if ( /docs\/btec-first/ ) {
...
}
...
else {
...
}
}

Furthermore, there are too many braces. You don't need any of them really:

local $_ = $path;

/docs\/btec-first/ and
intro($path), btecfirst($path), return;

/docs\/aslevel\/classroom-notes/ and
intro($path), ..., return;

...

The point is that for monotonous code like that, Perl offers many ways
to rewrite it. Use the most concise and most readable one. The above
suggested one has many advantages. The most important one is visual
distinctiveness: you have the condition in one line and the action to
take in case the condition was true in the line below indended. It's
easy to add new condition/action pairs and you don't need to distinguish
any longer between 'if', 'elsif' and 'else'. The order of the conditions
plus the 'return' in their action take care of that implicitely.

Apart from that, it's even a tiny bit more performant as you don't have
any blocks any longer. Perl has to pay a small price when entering a
block. Take that as an additional bonus that comes for free. Readability
should be your first concern when choosing how to write up a thing.
sub intro {

my ($pathhere) = @_;
open( INN, "d:/a-keep9/short-nondb/db/total-160404.txt" );
my $lineintro;

while ( defined( $lineintro = <INN> ) ) {
if ( $lineintro =~ /$pathhere','(.*?)'\)\;/ ) {
print OUT ("<tr><td>$1 <p> </td>\n");
}
}
}

sub btecfirst {

my ($pattern) = @_;
my $linee = $pattern;
my $c = 0;
$linee =~ /.*unit(\d).*?chap(\d)/;
my $u = $1;
my $chap = $2;

Can you be sure that the match against $linee will always be succesfull?
Only in this case you are allowed to use $1 and $2. Otherwise they will
contain garbage from a previous match. Best is to get rid of $1 and $2
altogether:

my ($u, $chap) = $line =~ /.*unit(\d).*?chap(\d)/;

A match in list context returns the captured submatches. It returns the
empty list when the match failed.
open( INNN, "d:/a-keep9/short-nondb/allphp/allphp2.php" );

while (<INNN>) {
last if /$pattern/;
}
my ( $curr, $next1, $next2, $next3 ) = <INNN>;
close(INNN);

if ( $next3 =~ /\$i\<(\d);/ ) {
my $nn = $1;
print OUT ("<td valign='top'>\n");
for ( my $c = 1 ; $c < $nn ; $c++ ) {
print OUT (
'<a href="' . $pattern . "/unit" . $u . "-chap" .
$chap . "-doc"
. $c . ".zip" . '">'
. "Document$c"
. "</a><br>"
. "\n" );
}
print OUT ("</td></tr>\n");
}
}

For-loops as in C are almost always frowned upon. Easier to read:

for my $c (1 .. $nn - 1) {

I am not sure about the string concatenation either. You could do it
implicitely (and also choose a different quoting mechanism because of
the literal quotes that shall appear in the string):

print OUT qq[<a href="$pattern/unit$u-chap$chap-doc$c.zip">];
print OUT qq[Document$c</a><br>\n];

This can be applied to all other functions below.

Also:
sub aslevelclassroomnotes {

sub aslevelsimulationssevern {

sub asleveldebateswindfarm {

sub gcsestudentactivitiesbusinesslocation {

sub gcsestudentactivitiesbusinessstructuredecisions {

sub gcsestudentactivitiesfinance {

sub gcsestudentactivitiesmarketing {

sub gcsestudentactivitiespeopleatwork {

sub gcsestudentactivitiesproduction {

Isupposeyoureallylikelongfunctionnameswithnounderscoresinthem, eh? ;-)

In short: Those function names are too long. If you have long names, at
least separate its components with underscores:

sub as_level_classroom_notes {

This is still a bit long, but it's more friendly to the eye.

My experience with names is that a certain level of verbosity is
alright, but making them too verbose and chatty is going to go on your
nerves sooner or later.

Tassilo
 
G

Geoff Cox

On 21 Apr 2004 06:11:42 GMT, "Tassilo v. Parseval"

Tassilo,

I am working through all your latest comments but have the error
message "not a GLOB reference at line 53", ie on following line,

print $fh "<h2>$origtext</h2> \n" if $in_heading;

in sub text.

Also, the first attempt to open $name is failing,

find sub {
my $name = $_;
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

so in order to move on I have to either # out the die line or use

unless ($name eq ".") {open(OUT etc ...

Just a bit of lateral thinking when I found that print $name gave ".".
I know this is ugly and ignores the cause .... which I am unable to
see at the moment..

Geoff


package MyParser;
use base qw(HTML::parser);
use strict;
use diagnostics;

my ($in_heading,$in_p, $fh);

sub register_fh {

$fh = shift;

}

sub reset { ($in_heading,$in_p)=(0,0)}

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

main::choice( $attr->{ value } );

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print $fh "<h2>$origtext</h2> \n" if $in_heading;
print $fh "<p>$origtext</p> \n" if $in_p;

}

package main;

use File::Find;

my $dir = "d:/a-keep9/short-nondb/oldshort2";
my $parser = MyParser->new;

find sub {

my $name = $_;
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

print OUT ("<html><head><title>test</title></head><body> \n");
print OUT ("<table width='100%' border='1'> \n");
}




# undef $/;

$parser->register_fh(\*OUT);
$parser->parse_file($_);
$parser->reset;


}, $dir;

etc etc
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
On 21 Apr 2004 06:11:42 GMT, "Tassilo v. Parseval"

Tassilo,

I am working through all your latest comments but have the error
message "not a GLOB reference at line 53", ie on following line,

print $fh "<h2>$origtext</h2> \n" if $in_heading;

in sub text.

Sorry, my mistake. register_fh() must read:

sub register_fh {
# $_[0] contains the parser object
$fh = $_[1];
}
Also, the first attempt to open $name is failing,

find sub {
my $name = $_;
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

so in order to move on I have to either # out the die line or use

unless ($name eq ".") {open(OUT etc ...

Just a bit of lateral thinking when I found that print $name gave ".".
I know this is ugly and ignores the cause .... which I am unable to
see at the moment..

The reason is that File::File::find() also returns directory names, most
notably "." and ".." as well. Have it return early when the current file
is not one that you want to process:

find sub {
return if /^\.\.?/; # catches "." and ".."
my $name = $_;
...
}, $dir;

Tassilo
 
G

Geoff Cox

The reason is that File::File::find() also returns directory names, most

Tassilo,

Ah! I should have thought of that ...

Anyway - one BIG HURRAY !!!! The code now runs with -w and not a
single error or warning! Many thanks for your help - would not have
got so far without it.

Will now sort out the choice sub etc.

Have a good day!

Cheers

Geoff
 
G

Geoff Cox

On 21 Apr 2004 09:26:17 GMT, "Tassilo v. Parseval"

Tassilo

just one more question!?

In one of the html files I have for example

<h2> jkaskdjkla </h2>

<option values etc

<p> hsajdj ka </p>

The code is giving the data in a different order, ie

<h2> jkaskdjkla </h2>

<p> hsajdj ka </p>

<option values etc

Am I able to change the order? Cheers Geoff

----------------------------------------

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

main::choice( $attr->{ value } );

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print $fh "<h2>$origtext</h2> \n" if $in_heading;
print $fh "<p>$origtext</p> \n" if $in_p;

}
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
On 21 Apr 2004 09:26:17 GMT, "Tassilo v. Parseval"

Tassilo

just one more question!?

In one of the html files I have for example

<h2> jkaskdjkla </h2>

<option values etc

<p> hsajdj ka </p>

The code is giving the data in a different order, ie

<h2> jkaskdjkla </h2>

<p> hsajdj ka </p>

<option values etc

Where is this last line produced? In the snippet you provided, you only
print text wrapped in <h2> and <p> tags so I can't see where the last
line would come from.

Tassilo
 
G

Geoff Cox

Also sprach Geoff Cox:


Where is this last line produced? In the snippet you provided, you only
print text wrapped in <h2> and <p> tags so I can't see where the last
line would come from.

Tassilo,

The option part comes from the

if ( $tagname eq 'option' ) {

main::choice( $attr->{ value } );

}

The choice sub selects other subs relevant to the path for each
<option etc. The <h2> etc is the heading for each section, followed by
<p> description and then a set of options with associated paths to
full documents...hope this is OK?

Cheers

Geoff

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

main::choice( $attr->{ value } );

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print $fh "<h2>$origtext</h2> \n" if $in_heading;
print $fh "<p>$origtext</p> \n" if $in_p;

}
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
Tassilo,

The option part comes from the

if ( $tagname eq 'option' ) {

main::choice( $attr->{ value } );

}

The choice sub selects other subs relevant to the path for each
<option etc. The <h2> etc is the heading for each section, followed by
<p> description and then a set of options with associated paths to
full documents...hope this is OK?

Hmmh. When I test it with the following modification:
sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

print "$origtext\n";
}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print $fh "<h2>$origtext</h2> \n" if $in_heading;
print $fh "<p>$origtext</p> \n" if $in_p;

}

and run it on

<h2> jkaskdjkla </h2>
<p> hsajdj ka </p>
<option values="bla">

the output is the same as the input. Likewise, when I swap <h2> and <p>.
I don't see where this parser would change anything about the order of
the tags. It should preserve it.

I remember that main::choice() was this huge 'if/elsif/else' orgy. Can
you take out all but one branch and also post a few lines of HTML input
that exhibits this behaviour? I can't say much more right now because I
cannot reproduce this problem.

Tassilo
 
G

Geoff Cox

and run it on

<h2> jkaskdjkla </h2>
<p> hsajdj ka </p>
<option values="bla">

the output is the same as the input. Likewise, when I swap <h2> and <p>.
I don't see where this parser would change anything about the order of
the tags. It should preserve it.

Tassilo,

The change of order appears if you have a second set of data.


<h2> first - jkaskdjkla </h2>
<p> first - hsajdj ka </p>
<option values="first">

<h2> second - jkaskdjkla </h2>
<p> second - hsajdj ka </p>
<option values="second">

If above results similar to mine, the the 2 <h2> are in the correct
order but are then followed by the 2 <p> sections, then the 2 <option
sections, rather than the order above...?

Cheers

Geoff

package MyParser;
use base qw(HTML::parser);
use strict;
use diagnostics;

my ($in_heading,$in_p, $fh);

sub register_fh {
# $_[0] contains the parser object
$fh = $_[1];
}

sub reset { ($in_heading,$in_p)=(0,0)}

sub start {

my ( $self, $tagname, $attr, undef, $origtext ) = @_;

if ( $tagname eq 'h2' ) {
$in_heading = 1;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 1;
return;
}

if ( $tagname eq 'option' ) {

print ("\$origtext has value $origtext \n");

main::choice( $attr->{ value } );

}

}

sub end {
my ( $self, $tagname, $origtext ) = @_;
if ( $tagname eq 'h2' ) {
$in_heading = 0;
return;
}

if ( $tagname eq 'p' ) {
$in_p = 0;
return;
}
}

sub text {
my ( $self, $origtext ) = @_;
print $fh "<h2>$origtext</h2> \n" if $in_heading;
print $fh "<p>$origtext</p> \n" if $in_p;

}

package main;

use File::Find;

my $dir = "d:/a-keep9/short-nondb/oldshort2";
my $parser = MyParser->new;

find sub {
return if /^\.\.?/; # catches "." and ".."
my $name = $_;
open( OUT, ">>d:/a-keep9/short-nondb/short/members2/$name" )
|| die "can't open d:/a-keep9/short-nondb/short/members2/$name:
$!";

print OUT ("<html><head><title>test</title>
<link rel=\"STYLESHEET\" type=\"text/css\"
href=\"assets/style/hala-1.css\">
</head><body> \n");
print OUT ("<table width='100%' border='1'> \n");

# undef $/;

$parser->register_fh(\*OUT);
$parser->parse_file($_);
$parser->reset;

print OUT ("</body></html> \n");

}, $dir;

sub choice {
my ($path) = @_;
if ( $path =~ /docs\/btec-first/ ) {
intro($path);
btecfirst($path);
}
 
T

Tassilo v. Parseval

Also sprach Geoff Cox:
Tassilo,

The change of order appears if you have a second set of data.


<h2> first - jkaskdjkla </h2>
<p> first - hsajdj ka </p>
<option values="first">

<h2> second - jkaskdjkla </h2>
<p> second - hsajdj ka </p>
<option values="second">

If above results similar to mine, the the 2 <h2> are in the correct
order but are then followed by the 2 <p> sections, then the 2 <option
sections, rather than the order above...?

Which order above? The order above is <h2>, <p> and finally <option>.

Even if I add a second or third set of data, the order remains intact
for me.

I still cannot reproduce this. :)

Tassilo
 
R

Richard Morse

Tassilo v. Parseval said:
The reason is that File::File::find() also returns directory names, most
notably "." and ".." as well. Have it return early when the current file
is not one that you want to process:

find sub {
return if /^\.\.?/; # catches "." and ".."
my $name = $_;
...
}, $dir;

I'm not sure exactly how you would get the directory name here (I've not
had the pleasure of using File::Find yet), but wouldn't it be better to
do something like:

find sub {
return if -d (some_function_to_get_cwd() . $_);
...
}

? This would handle any other miscellaneous directories that appear...

Ricky
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top