How to recursively search through files (not dirs)?

David Combs

What I want to do is traverse files that recursively
"include" or (as in csh .cshrc-files) "source"
other files -- then come back and continue
to process the "sourcing" file.

Files like these:

375 ====> head -5000 test{1,2,3}.source
==> test1.source <==
foo
source test2.source
bar
bletch

==> test2.source <==
this is test2.source, line ONE
source test3.source
this is test2.source, line THREE
this is test2.source, line FOUR

==> test3.source <==
this is test3.source, line ONE.
this is test3.source, line TWO.
376 ====> !! >! foo.out
head -5000 test{1,2,3}.source > ! foo.out
377 ====>

I have tried lots of (probably wrong) things in trying
to have the file-handle being lexically-bound,
so it would reappear in the same state upon
(recursive) return, and the while would keep
reading the lines *after* the "source" line up to
the end of file.

No such luck. You'll see three triples of statements,
two of the triples commented out, and I can't
get any of them to work -- heck, not even to compile!

(I also have trouble getting cperl-mode (Emacs) to
bounce parens correctly. Equivalently, it complains
when I regionize the sub and try cperl-indent-region,
for which cperl-mode gives me:

Scan error: "Unbalanced parentheses", 3059, 5237

(the locations differ depending on which triple is uncommented),
and I have no luck finding anything wrong (unbalanced),
or missing trailing semicolons, or missing double quotes,
yada, yada, yada.)


FYI: I've been using perl-5.8.6.

(I originally had it at least opening the files in
the right order, but not restoring the prior
filehandle on returning -- but I didn't save that
version, and it has now transmuted into the mess below.)

(Crazy, but I can find no example in any of my many
Perl books, nor in any of the (huge number of) threads
I've downloaded from here. The only mention
of "recursive" in the camel-3 is about doing it through
*directories* -- and they (well, he) cheat by using
File::Find, instead of doing it "by hand" (to show
how lexical handles would do the right thing on the DFS).)

Thanks much for any help!

(syntax, semantics, as well as the cperl-mode?)


----------------- here it is (ugh!):


#!/me-FIRST-in-PATH-bin/perl-5.8.6/bin/perl5.8.6 -w traverse-via-SOURCE-stmts.pl
use strict;
use diagnostics;
use warnings;
use Carp;
use IO::Handle; # test of 11may05: (recursive opens)
use FileHandle; # ---- from pg 895 in camel 3rd:

# ------------------------------------------------------------------------
# ------------------------------------------------ processOneFile():::
# ------------------------------------------------------------------------

# cperl-indent-region on the entire sub, complains (unfortunately
# the err-msg doesn't make it to *Messages*):
# Scan error: "Unbalanced parentheses", 3002, 5180"


sub processOneFile {
my $fileName = shift;
my $level = shift;

print("\n========================= ENTERING processOneFile(\"$fileName\", $level\n");

# ---- from pg 895 in camel 3rd, (with "my" added):

my $fh = new FileHandle;
die "Cannot open \"$fileName\"!" if (! fh->open( $fileName ));
while (defined(my $line = <$fh>)) { # while:

# my MYFIN; # or is it "my *MYFIN" ... or what?
# open(MYFIN, "<" . $fileName) or die "Cannot open!";
# while (defined(my $line = <MYFIN>)) { # while:

# my $fh = new IO::Handle $fileName, "r";
# die "Cannot open \"$fileName\"!" if (! defined $fh);
# while (defined(my $line = <$fh>)) { # while:

chomp($line);
print "EARLY: $line\n"; # debug.

if ( ( $line =~ /^source / ) &&
( $line =~ /^source\s+([A-Za-z0-9._-]+)/ ) ) { # dive-in:
# CPERL-MODE won't bounce on either rparen ^ ^ -- WHY?
my $diveIntoFName = $1;

print("Now, we'll try to source-in file \"$diveIntoFName\"\n"); # debug.

processOneFile($diveIntoFName, $level + 1); # RECURSE

print("LATE: $line\n"); # debug.

} # dive-in.

} # end-while.

print("----- EXITING processOneFile(\"$fileName\", $level\n\n");
} # end-sub processOneFile.


processOneFile("test1.source", 1);





Thanks!

David
 
Theo van den Heuvel

David Combs said:
What I want to do is traverse files that recursively
"include" or (as in csh .cshrc-files) "source"
other files -- then come back and continue
to process the "sourcing" file.

[lots of frustration snipped]

----------------- here it is (ugh!):


#!/me-FIRST-in-PATH-bin/perl-5.8.6/bin/perl5.8.6 -w traverse-via-SOURCE-stmts.pl
use strict;
use diagnostics;
use warnings;

Good start.

use Carp;
use IO::Handle; # test of 11may05: (recursive opens)
use FileHandle; # ---- from pg 895 in camel 3rd:

# ------------------------------------------------------------------------
# ------------------------------------------------ processOneFile():::
# ------------------------------------------------------------------------

# cperl-indent-region on the entire sub, complains (unfortunately
# the err-msg doesn't make it to *Messages*):
# Scan error: "Unbalanced parentheses", 3002, 5180"


sub processOneFile {
my $fileName = shift;
my $level = shift;

print("\n========================= ENTERING processOneFile(\"$fileName\", $level\n");

# ---- from pg 895 in camel 3rd, (with "my" added):

my $fh = new FileHandle;
die "Cannot open \"$fileName\"!" if (! fh->open( $fileName ));
                                       ^

Change the bare fh here to $fh (the sigil is missing). Perl complained about it
when I ran the script; with the fix it seems to run.
BTW, I find it clearer to say
... unless $fh->open($fileName);


[rest of code snipped]
Thanks!

David

You're welcome.

Theo van den Heuvel
 
Joe Smith

David said:
I originally had it at least opening the files in
the right order, but not restoring the prior
filehandle on returning

That's not a problem if you use lexical file handles.
Not too many books show how to use lexical file handles.
They eliminate the need to "use FileHandle;".

David said:
my $fh = new FileHandle;
die "Cannot open \"$fileName\"!" if (! fh->open( $fileName ));

With a lexical handle and three-argument open, that becomes:

open my $fh,'<',$fileName or return warn "Cannot open $fileName: $!\n";

David said:
if ( ( $line =~ /^source / ) &&
( $line =~ /^source\s+([A-Za-z0-9._-]+)/ ) ) { # dive-in:
# CPERL-MODE won't bounce on either rparen ^ ^ -- WHY?
my $diveIntoFName = $1;

A single match is enough here:

if ($line =~ /^source\s+(\S+)/) {
my $diveIntoFName = $1;
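
Putting both changes together, a minimal sketch of the whole sub might look like the following. It keeps the names processOneFile, $fileName and $diveIntoFName from the original post; the helper demo_lines, the scratch directory, and the choice to return the lines rather than print debug output are assumptions made for this illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Return the lines of $fileName, each prefixed with its nesting level,
# with every "source FILE" line replaced by the lines of FILE.
sub processOneFile {
    my ($fileName, $level) = @_;

    # $fh is a lexical: each recursive call gets its own handle, so the
    # caller's handle and read position are untouched by the nested read.
    open my $fh, '<', $fileName
        or do { warn "Cannot open $fileName: $!\n"; return };

    my @lines;
    while (defined(my $line = <$fh>)) {
        chomp $line;
        if ($line =~ /^source\s+(\S+)/) {
            my $diveIntoFName = $1;
            push @lines, processOneFile($diveIntoFName, $level + 1);  # recurse
        }
        else {
            push @lines, "$level: $line";
        }
    }
    close $fh;
    return @lines;
}

# Hypothetical demo helper: recreate the three test files from the
# original post in a scratch directory and process test1.source.
sub demo_lines {
    my %content = (
        'test1.source' => "foo\nsource test2.source\nbar\nbletch\n",
        'test2.source' => "this is test2.source, line ONE\n"
                        . "source test3.source\n"
                        . "this is test2.source, line THREE\n"
                        . "this is test2.source, line FOUR\n",
        'test3.source' => "this is test3.source, line ONE.\n"
                        . "this is test3.source, line TWO.\n",
    );
    my $dir  = tempdir(CLEANUP => 1);
    my $orig = getcwd();
    chdir $dir or die "Cannot chdir to $dir: $!\n";
    for my $name (sort keys %content) {
        open my $out, '>', $name or die "Cannot write $name: $!\n";
        print {$out} $content{$name};
        close $out or die "Cannot close $name: $!\n";
    }
    my @lines = processOneFile('test1.source', 1);
    chdir $orig or die "Cannot chdir back to $orig: $!\n";
    return @lines;
}

print "$_\n" for demo_lines();
```

Because $fh lives in the sub's own scope, nothing has to be saved or restored by hand: when the nested call returns, the while loop simply carries on with the line after the "source" line.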

-Joe
 
John W. Krahn

David said:
What I want to do is traverse files that recursively
"include" or (as in csh .cshrc-files) "source"
other files -- then come back and continue
to process the "sourcing" file.

Files like these:

375 ====> head -5000 test{1,2,3}.source
==> test1.source <==
foo
source test2.source
bar
bletch

==> test2.source <==
this is test2.source, line ONE
source test3.source
this is test2.source, line THREE
this is test2.source, line FOUR

==> test3.source <==
this is test3.source, line ONE.
this is test3.source, line TWO.
376 ====> !! >! foo.out
head -5000 test{1,2,3}.source > ! foo.out
377 ====>

This works for me:

#!/usr/bin/perl
use warnings;
use strict;


source( 'test1.source' );

sub source {
local *ARGV;
@ARGV = shift;
my $level = 1 + shift;
while ( <> ) {
if ( /^source\s+(.+)/ ) {
source( $1, $level )
}
else {
print "$level: $_"
}
}
}

__END__



John
 
David Combs

....
This works for me:

#!/usr/bin/perl
use warnings;
use strict;


source( 'test1.source' );

sub source {
local *ARGV;
@ARGV = shift;
my $level = 1 + shift;
while ( <> ) {
if ( /^source\s+(.+)/ ) {
source( $1, $level )
}
else {
print "$level: $_"
}
}
}

__END__



John

Thank you for the fresh (and clever) code:

381 ====> /me-FIRST-in-PATH-bin/perl-5.8.6/bin/perl5.8.6 -w krahn--traverse-sources.pl
Use of uninitialized value in addition (+) at krahn--traverse-sources.pl line 12.
1: foo
2: this is test2.source, line ONE
3: this is test3.source, line ONE.
3: this is test3.source, line TWO.
2: this is test2.source, line THREE
2: this is test2.source, line FOUR
1: bar
1: bletch
382 ====>

(Again, here are the three "test.source" files:

==> test1.source <==
foo
source test2.source
bar
bletch

==> test2.source <==
this is test2.source, line ONE
source test3.source
this is test2.source, line THREE
this is test2.source, line FOUR

==> test3.source <==
this is test3.source, line ONE.
this is test3.source, line TWO.

)



ONE QUESTION: I see no explicit "open", so please
explain a bit of the magic involved. Like, how
*does* the opening work -- and what makes it
get recursed-back-to the prior value on return?

Yes, that's confusing to me -- how does your glob-thing
get the file opened on "its" level, but with some other
file getting opened on *that* (next-lower) level.

How does it work?

Well, I do note that you don't use "my" for *ARGV, so it isn't lexical ...

Instead, you use "local", thus it's "dynamically" bound
(like in emacs-lisp and in old long-gone maclisp?) -- but
surprisingly after an "inner" (ie during recursive call)
assignment from shift, it gets magically restored on
return back to prior frame to the value it last had
there.

(I suppose that caught exceptions also watch for
locals that need restoring, if the catching is done
across sub-call stack levels?

For example, at a deeper recursion level you encounter
a divide-by-zero, and what you want to do is
abort out of that level (and the file being read),
return one level up, and keep reading from *that*
level's file.

If so, how would you add it in -- is the *only* way to
catch one via a dynamically-outer eval?)


Also, suppose you wanted to use "my" instead of "local" -- could
you do that with the glob-scheme, or would you want to use
some other method for opening files? What would have to change
for it to work *that* way?



I think there's some neat stuff in what happens behind-the-scenes
in your solution -- I'd sure like to understand it better!


Thank you for the idea!

David
 
David Combs

....
....
^

Thanks -- added the needed "$", as suggested, and it works!


(Amazing -- a missing $, and error-msgs that I couldn't
make much sense out of -- such a simple fix!)



Further, cperl-mode gives the same old msg when I
say "M-x cperl-indent-region" (and the procedure
has been regionized):

Scan error: unbalance parentheses, 1398, 1939

where 1398, the char position (goto-char),
is the left paren just after the "if":


if ( ( $line =~ /^source / ) &&
( $line =~ /^source\s+([A-Za-z0-9._-]+)/ ) ) { # dive-in:
# CPERL-MODE won't bounce on either rparen ^ ^ -- WHY?
my $diveIntoFName = $1;

, with the 1939 char position being at EOF, ie here:


} # dive-in.

} # end-while.

print("----- EXITING processOneFile(\"$fileName\", $level\n\n");
} # end-sub processOneFile.


processOneFile("test1.source", 1);



HERE <<-------------- actual end of file (well, of emacs buffer).

Any idea why cperl seems to be confused?



Thanks for the help!

David


Thanks so much!
 
John W. Krahn

David said:
Thank you for the fresh (and clever) code:

381 ====> /me-FIRST-in-PATH-bin/perl-5.8.6/bin/perl5.8.6 -w krahn--traverse-sources.pl
Use of uninitialized value in addition (+) at krahn--traverse-sources.pl line 12.

You can remove that warning by changing the line:

my $level = 1 + shift;

To:

my $level = 1 + ( shift || 0 );

David said:
1: foo
2: this is test2.source, line ONE
3: this is test3.source, line ONE.
3: this is test3.source, line TWO.
2: this is test2.source, line THREE
2: this is test2.source, line FOUR
1: bar
1: bletch
382 ====>

(again, here's the three "test.source" files:
[snip]

ONE QUESTION: I see no explicit "open", so please
explain a bit of the magic involved. Like, how
*does* the opening work -- and what makes it
get recursed-back-to the prior value on return?

The magic works through the @ARGV array, the ARGV filehandle and the magical
<> readline operator. If there are file names in the @ARGV array and you read
from <>, then each file is opened in turn, and inside while ( <> ) each line of
each file is assigned to $_ in turn. The perlop.pod document has more
information in the "I/O Operators" section, as does perlsyn.pod in the "Loop
Control" section:

perldoc perlop
perldoc perlsyn
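
As a rough illustration of that mechanism, here is the same sub with a comment at each step where <> and local *ARGV do their work. This is a sketch under a few assumptions: it returns the lines instead of printing them, it matches the file name with (\S+), and the demo_lines helper and scratch directory exist only for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

sub source {
    my ($file, $level) = @_;
    $level = 1 + ($level || 0);      # no "uninitialized" warning at top level

    # local *ARGV saves the whole ARGV glob (the ARGV filehandle, @ARGV
    # and $ARGV) and installs a fresh one until this sub is left.
    local *ARGV;
    @ARGV = ($file);

    my @lines;
    # There is no explicit open: <> shifts each name off @ARGV, opens it
    # on the ARGV filehandle, and hands back its lines one at a time.
    while (<>) {
        chomp;
        if (/^source\s+(\S+)/) {
            push @lines, source($1, $level);   # nested call gets its own ARGV
        }
        else {
            push @lines, "$level: $_";
        }
    }
    return @lines;    # leaving the sub restores the caller's ARGV glob
}

# Hypothetical demo helper: recreate the three test files in a scratch
# directory and read test1.source.
sub demo_lines {
    my %content = (
        'test1.source' => "foo\nsource test2.source\nbar\nbletch\n",
        'test2.source' => "this is test2.source, line ONE\n"
                        . "source test3.source\n"
                        . "this is test2.source, line THREE\n"
                        . "this is test2.source, line FOUR\n",
        'test3.source' => "this is test3.source, line ONE.\n"
                        . "this is test3.source, line TWO.\n",
    );
    my $dir  = tempdir(CLEANUP => 1);
    my $orig = getcwd();
    chdir $dir or die "Cannot chdir to $dir: $!\n";
    for my $name (sort keys %content) {
        open my $out, '>', $name or die "Cannot write $name: $!\n";
        print {$out} $content{$name};
        close $out or die "Cannot close $name: $!\n";
    }
    my @lines = source('test1.source');
    chdir $orig or die "Cannot chdir back to $orig: $!\n";
    return @lines;
}

my @before = @ARGV;
print "$_\n" for demo_lines();
print "caller's \@ARGV is unchanged\n" if "@before" eq "@ARGV";
```

The restore on return is what lets the outer while ( <> ) pick up at the line after "source": the outer ARGV handle was only set aside, never closed or repositioned.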

Yes, that's confusing to me -- how does your glob-thing
get the file opened on "its" level, but with some other
file getting opened on *that* (next-lower) level.

How does it work?

Well, I do note that you don't use "my" for *ARGV, so it isn't lexical ...

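No, $ARGV, @ARGV and ARGV are special in perl. They are global variables that
cannot be declared with "my", which is why the sub uses local *ARGV: it saves
all three and restores them when the sub is left.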

perldoc perlvar
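
On the exceptions question: local is undone when a scope is left by die in the same way as when it is left by return, so an eval around the recursive call should be enough to give up on a nested file and carry on with the caller's file. A small sketch under stated assumptions: the "fail" line that triggers the die, the $out array reference, the file names a.source and b.source, and the demo_lines helper are all inventions for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Like source() in this thread, but it appends to the array referenced by
# $out and dies on any line that starts with "fail".
sub source {
    my ($file, $level, $out) = @_;

    local *ARGV;             # undone on return and also when die unwinds
    @ARGV = ($file);

    while (<>) {
        chomp;
        if (/^source\s+(\S+)/) {
            my $nested = $1;
            # Trap errors from the nested file only; on failure, note it
            # and keep reading the current file.
            eval { source($nested, $level + 1, $out); 1 } or do {
                chomp(my $err = $@);
                push @$out, "$level: abandoned $nested ($err)";
            };
        }
        elsif (/^fail\b/) {
            die "failure in $file\n";    # stand-in for a runtime error
        }
        else {
            push @$out, "$level: $_";
        }
    }
    return;
}

# Hypothetical demo helper: two scratch files, the second of which fails
# part-way through.
sub demo_lines {
    my %content = (
        'a.source' => "foo\nsource b.source\nbar\n",
        'b.source' => "b one\nfail here\nb three\n",
    );
    my $dir  = tempdir(CLEANUP => 1);
    my $orig = getcwd();
    chdir $dir or die "Cannot chdir to $dir: $!\n";
    for my $name (sort keys %content) {
        open my $fh, '>', $name or die "Cannot write $name: $!\n";
        print {$fh} $content{$name};
        close $fh or die "Cannot close $name: $!\n";
    }
    my @lines;
    source('a.source', 1, \@lines);
    chdir $orig or die "Cannot chdir back to $orig: $!\n";
    return @lines;
}

print "$_\n" for demo_lines();
```

In this sketch the line "b three" is never reached, while "bar" from the outer file still is, because the outer ARGV is back in place by the time the eval returns.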



John
 
Theo van den Heuvel

David Combs said:
Further, cperl-mode gives the same old msg when I
say "M-x cperl-indent-region" (and the procedure
has been regionized):

Scan error: unbalance parentheses, 1398, 1939

where 1398, the char position (goto-char),
is the left paren just after the "if":


if ( ( $line =~ /^source / ) &&
( $line =~ /^source\s+([A-Za-z0-9._-]+)/ ) ) { # dive-in:
# CPERL-MODE won't bounce on either rparen ^ ^ -- WHY?
my $diveIntoFName = $1;

, with the 1939 char position being at EOF, ie here:


} # dive-in.

} # end-while.

print("----- EXITING processOneFile(\"$fileName\", $level\n\n");
} # end-sub processOneFile.


processOneFile("test1.source", 1);



HERE <<-------------- actual end of file (well, of emacs buffer).

Any idea why cperl seems to be confused?

No, 'fraid not. I'm a vim-user myself.

Theo van den Heuvel
 
