Kloudnyne
I'm a newcomer to Perl and am teaching myself the language by using
it, but I've come across an issue I can't see any way around.
I am trying to write a Perl script that will go through a series of
directories and their subdirectories, removing JavaScript, images,
bots, etc. from HTML files in order to provide a text-reader-friendly
version of each page. The actual conversion of any given file has been
taken care of, thanks to code heavily borrowed from an existing
script, but I can't work out how to get the script to recurse
through the various subdirectories.
The snippet of code I've thrown together for it so far is:
***
sub reading {
    do {
        opendir (CURRENTFOLDER, $htmdir) || die 'Ay Señor! Los bandidos have raided that directory!'
        while defined($filename = readdir(FOLDER)) = True {
            # nesting tells me how deep we are into subdirectories. a zero
            # value is at the root of the process. only really intended as
            # a flag for testing
            $nesting = directorycheck();
            # dircheck checks to see if our victim this cycle is a
            # subdirectory. If it is, (I hope) we'll launch in to a nested
            # subcycle.
        }
        # with a little luck, re-opening the previous folder will have the
        # pointer still at the last position checked, or else we have
        # uber-recursives
        closedir(CURRENTFOLDER);
        # prepping the string to ensure that the trailing char is NOT a /
        # (not that it should be anyway)
        chop ($htmdir);
        do {
        }
        until (chop($htmdir) ne '/');
        # now that we've gone back to (and removed) the / nearest to the end
        # of the handle, we've effectively gone back to the parent directory
        $nesting -- ;
    }
    until $htmdir = $htmroot;
}
sub directorycheck {
    if (-d $filename) {
        dircheck = $nesting + 1 ;
        $htmdir = $filename .=$htmdir;
        chdir ($htmdir);
    } else {
        # sets $txtdir to mirror $htmdir, but in the /txt/ directory, where
        # we want our output to be.
        $txtdir = $htmdir;
        # (hopefully) changes the file path for output to the /txt/
        # equivalent of the current /htdocs/ folder
        $txtdir =~s/htdocs/txt/;
        parsetxt(); # only parses if we've hit a file, rather than a subdir.
    }
}
***
where "parsetxt" is the subroutine that handles the actual conversion.
However, I can't even get this to compile, let alone run it to see if
it just dies or recurses away to infinity, or whatever.
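For reference, the compile errors come from a handful of lines: the `die` statement has no closing `;`, the `while` condition needs parentheses and cannot take `= True`, `readdir(FOLDER)` reads a handle that was never opened (the one opened is `CURRENTFOLDER`), and `dircheck = ...` has no `$` sigil. `until $htmdir = $htmroot` also assigns rather than compares (that would be `eq`). Beyond the syntax, one global bareword handle can't remember the parent's position while a subdirectory is being read, so each level needs its own lexical handle. A minimal sketch of the same idea as a subroutine that calls itself, assuming a `parsetxt()` that takes the file path as an argument (the stub below is hypothetical; the original uses globals) and assuming only `.htm`/`.html` files should be converted:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical stand-in for the real parsetxt(): it just records each path.
my @seen;
sub parsetxt { my ($path) = @_; push @seen, $path; }

# Recursively walk $dir. $depth plays the role of the original $nesting
# (passed down for debugging; not otherwise used here).
sub walk {
    my ($dir, $depth) = @_;
    # A lexical handle per call, so a nested call can't clobber this one.
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @entries = grep { $_ ne '.' && $_ ne '..' } readdir($dh);
    closedir($dh);
    for my $entry (sort @entries) {
        # readdir returns bare names, so join them onto $dir before testing.
        my $path = File::Spec->catfile($dir, $entry);
        if (-d $path) {
            walk($path, $depth + 1);    # descend into the subdirectory
        } elsif ($path =~ /\.html?$/i) {
            parsetxt($path);            # only files reach the converter
        }
    }
}
```

Called as `walk($htmroot, 0);`, it visits every subdirectory depth-first. Returning from the recursive call is what takes you back to the parent directory, so the `chop` loop and the `chdir` are no longer needed.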
The script is intended to run on a Linux box acting as a web server,
but for writing and testing I'm using ActivePerl 5.8 on a
Win2k machine.
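Since the script moves between Windows and Linux, building paths with the core `File::Spec` module avoids hand-splicing `/` characters, and `File::Basename`'s `dirname` gives the parent directory directly. A minimal sketch of mirroring an input path under the htdocs root to the matching path under the txt root. The subroutine names and the `$txtroot` argument are hypothetical, and the sketch keeps the original file name (renaming to `.txt`, if wanted, is left to the caller). It uses `mkpath` rather than the newer `make_path`, which ActivePerl 5.8's `File::Path` may not have:

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);
use File::Path qw(mkpath);

# Hypothetical helper: map a file under $htmroot to the same relative
# location under $txtroot, e.g. <htmroot>/a/b/page.html -> <txtroot>/a/b/page.html.
sub txt_path_for {
    my ($htmroot, $txtroot, $file) = @_;
    my $rel = File::Spec->abs2rel($file, $htmroot);
    return File::Spec->catfile($txtroot, $rel);
}

# Hypothetical helper: create the output file's directory (and any missing
# parents) before the converter writes to it. Returns that directory.
sub ensure_parent_dir {
    my ($path) = @_;
    my $dir = dirname($path);
    mkpath($dir) unless -d $dir;
    return $dir;
}
```

Unlike `s/htdocs/txt/`, which rewrites the first "htdocs" found anywhere in the string, `abs2rel` only strips the root prefix.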
My question, after all this explanation, is this: am I barking up the
wrong tree here, or am I just missing one little thing that will make
all this work? If anyone has a piece of code that will fulfil my
requirements and make my life easier, you will have my undying
gratitude, because at this point I'm seriously considering giving up on
scripting and just performing the conversions manually.
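The core `File::Find` module already does this kind of traversal, including returning to the parent directory, so there is no need to track nesting or `chop` paths by hand. A minimal sketch, assuming a `parsetxt()` that accepts a file path (the stub below is hypothetical) and that only `.htm`/`.html` files should be converted:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical stand-in for the real parsetxt(): it just records each path.
my @converted;
sub parsetxt { my ($path) = @_; push @converted, $path; }

# Walk every directory below $root and hand each .htm/.html file to parsetxt().
sub convert_tree {
    my ($root) = @_;
    find(
        {
            # With no_chdir => 1, $_ is the full path (same as $File::Find::name).
            wanted => sub {
                return unless -f $_ && /\.html?$/i;
                parsetxt($File::Find::name);
            },
            no_chdir => 1,
        },
        $root
    );
}
```

Usage would be `convert_tree($htmroot);`. Setting `no_chdir` keeps the working directory fixed, so relative paths elsewhere in the script keep their meaning.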
Thanks for your time.
PS: I apologise for the hideous formatting. It's actually quite
legible on a full-width screen, and I didn't want to disturb the text
for fear of accidentally altering the code.