Regex substitute w/ match variables

Gary Schenk

I am self-taught in Perl. I use Perl a few times a year, mostly to
process text files. I'm trying to rename files in a directory. My
skills are quite rudimentary.

The files are currently named like this: SR-01-234-5.jpg
I want to rename them like this: SR-01-234-0005.jpg

I have a couple of thousand of these. I've already written several
variations of the following script to get them to this stage,
but adding the extra zeros has me stumped. This is the script:
===============================================================================
#!perl -w

opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
while ( defined ( my $filename = readdir( DH ) ) ) {
    my $foo = $filename;
    if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
        if ( length( $2 ) == 1 ) {
            $foo =~ s/$1$2$3/$1000$2$3/;
            rename( $filename, $foo );
            #print "$1\n";
        }
    }
}
closedir( DH );

===============================================================================

The print statement is an attempt at debugging. When I comment out the
substitution and the call to rename and just print $1, the output is
what I expect. When I run this script as shown above, however, files
come up missing, or the zeros are added in the wrong place.

Is it possible to use match variables in substitutions? The llama book
shows match variables being used outside of regular expression
operations, but not in this fashion.

And why are the files being deleted? I'm really stumped, and would
appreciate any and all help.

All the best,
Gary Schenk
 
A. Sinan Unur

#!perl -w

use warnings;

is better because it allows you to selectively turn warnings on/off. See

perldoc warnings
opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
Good.

while ( defined ( my $filename = readdir( DH ) ) ) {
my $foo = $filename;

Completely unnecessary.
if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {

I think this is better written as:

if ($foo =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
if ( length( $2 ) == 1 ) {
$foo =~ s/$1$2$3/$1000$2$3/;

sprintf will work very nicely here:

my $new = sprintf "$1%4.4d$3", $2;
And why are the files being deleted?

From perldoc -f rename:

rename OLDNAME,NEWNAME
Changes the name of a file; an existing file NEWNAME will be
clobbered.

I would suggest skipping the rename if the new name is the same as the
old name.

Also, note perldoc -f readdir:

If you're planning to filetest the return values out of a
"readdir", you'd better prepend the directory in question.
Otherwise, because we didn't "chdir" there, it would have been
testing the wrong file.

So, you should either chdir to the working directory, or prepend the
directory name to each file name.

Putting all of this together, here is a revised version of your script:

#! /usr/bin/perl

use strict;
use warnings;

use File::Spec::Functions 'catfile';

my $dir = shift || $ENV{TMP};

opendir my $dh, $dir
    or die "Error opening directory $dir: $! ";

while( my $old = readdir $dh ) {
    if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
        my $new = sprintf "$1%4.4d$3", $2;

        if($new eq $old) {
            print "Skipping $old\n";
            next;
        }

        $old = catfile $dir, $old;
        $new = catfile $dir, $new;

        print "$old => $new\n";

        # rename $old, $new
        #     or warn "Error renaming $old to $new: $!";
    }
}

closedir $dh or die "Error closing directory $dir: $!";

Sinan
 
Anno Siegel

Gary Schenk said:
===============================================================================
#!perl -w

Why not strict? Your program seems to be written for it.
opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
while ( defined ( my $filename = readdir( DH ) ) ) {
my $foo = $filename;
if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {

Your regex is fine though slightly more general than your example. However,
substitution with s/// isn't always the best way to turn a string into
another. For formatting numbers, there is sprintf.
if ( length( $2 ) == 1 ) {
$foo =~ s/$1$2$3/$1000$2$3/;
rename( $filename, $foo );
#print "$1\n";
}
}
}
closedir( DH );

===============================================================================

The print statement is an attempt at debugging. When I comment out the
substitution and the call to rename and just print $1, the output is
what I expect. When I run this script as shown above, however, files
come up missing, or the zeros are added in the wrong place.

So why didn't you print out $foo for debugging? That way you'd have known
what you are trying to rename your files to. You are probably renaming
many files all to the same name. That's the same as deleting all but one
of them.
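
To make that failure mode visible before touching any files, a dry run can group the old names by the name each would receive. The following is only a sketch: the sample names and the %targets and @collisions variables are made up for illustration, and the regex is a slightly tightened form of the one in the original script.

```perl
use strict;
use warnings;

# Hypothetical sample names; two of them map to the same padded name.
my @names = qw(SR-01-234-5.jpg SR-01-234-0005.jpg SR-01-235-7a.jpg);

my %targets;    # new name => list of old names that would become it
for my $old (@names) {
    my ( $pre, $num, $suf ) =
        $old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]?\.jpg)$/
        or next;    # skip names that do not fit the pattern
    my $new = sprintf '%s%04d%s', $pre, $num, $suf;
    push @{ $targets{$new} }, $old;
}

# Any target claimed by more than one source would clobber a file.
my @collisions = grep { @{ $targets{$_} } > 1 } sort keys %targets;
print "$_ <= @{ $targets{$_} }\n" for @collisions;
```

If @collisions comes back non-empty, rename would overwrite at least one file, so it is worth resolving those names by hand first.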
Is it possible to use match variables in substitutions? The llama book
shows match variables being used outside of regular expression
operations, but not in this fashion.

It's using them inside *another* regex that's problematic. Every regex
evaluation resets them. You can assign the matches to named variables
that don't have that problem (see below).

Here's how I would do it (your regex is unchanged):

my $filename = 'SR-01-234-5.jpg';
my ( $pre, $num, $suf) =
$filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
print "$filename -> $foo\n";

Anno
 
Anno Siegel

A. Sinan Unur said:

[Good advice]
if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
my $new = sprintf "$1%4.4d$3", $2;

Just one note. It is generally a bad idea to put variable strings into
a sprintf format. They could decide to contain a "%" one day. I realize
the regex doesn't allow this in this case, but on principle I'd do

sprintf '%s%4.4d%s', $1, $2, $3;
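
A small illustration of why this matters, using a deliberately hostile prefix. The values of $pre, $num and $suf are invented for the demo; the file names in this thread could never produce them.

```perl
use strict;
use warnings;

# Hypothetical prefix that happens to contain a format directive.
my ( $pre, $num, $suf ) = ( '100%d-', 5, '.jpg' );

# Unsafe: $pre becomes part of the format, so its "%d" consumes $num
# and the real "%04d" is left without an argument.
my $unsafe = do {
    no warnings;    # silence the missing-argument warning for the demo
    sprintf "$pre%04d$suf", $num;
};

# Safe: the format is a constant; the strings are passed as data.
my $safe = sprintf '%s%04d%s', $pre, $num, $suf;

print "unsafe: $unsafe\n";
print "safe:   $safe\n";    # prints "safe:   100%d-0005.jpg"
```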

Anno
 
Damian James

#!perl -w

opendir( DH, "d:\\temp" ) or die "couldn't open d:\\temp: $! ";
while ( defined ( my $filename = readdir( DH ) ) ) {
my $foo = $filename;
if ($foo =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/ ) {
if ( length( $2 ) == 1 ) {
$foo =~ s/$1$2$3/$1000$2$3/;
rename( $filename, $foo );
#print "$1\n";
}
}
}
closedir( DH );
...
Is it possible to use match variables in substitutions? The llama book
shows match variables being used outside of regular expression
operations, but not in this fashion.

That substitution in the inner loop is doing something rather different from
what you appear to be expecting. Looking at it...

$foo =~ s/$1$2$3/$1000$2$3/;

First, the pattern you are matching will be the contents of the
matched strings from the previous pattern, not the pattern itself,
and NOT including the parentheses. So taking those strings, concatenated
together, as a pattern, you are not in fact assigning anything to $1, $2 and
$3 the second time. This does mean that they retain their previous values.
The string you are substituting however, starts with the variable $1000,
which is not populated. Doing "${1}000" instead should help, but I don't
understand why you are using a substitution here at all. Why not just
assign the result?

Have you tried printing $foo? Try replacing the substitution with:

$foo = "${1}000$2$3";
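
The difference between the two spellings can be checked in isolation. This is a minimal sketch on a single hard-coded name; $wrong and $right are names chosen for the demo.

```perl
use strict;
use warnings;

my $name = 'SR-01-234-5.jpg';
my ( $wrong, $right );

if ( $name =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]?\.jpg)$/ ) {
    {
        no warnings 'uninitialized';
        $wrong = "$1000$2$3";    # $1000 is capture group 1000: undefined
    }
    $right = "${1}000$2$3";      # braces end the variable name after "1"
}

print "wrong: $wrong\n";    # prints "wrong: 5.jpg"
print "right: $right\n";    # prints "right: SR-01-234-0005.jpg"
```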
And why are the files being deleted? I'm really stumped, and would
appreciate any and all help.

Well, $1000 is empty, thus "5a.jpg" or something like it
has been the resulting string several times, so you're renaming
multiple files to the same name? Couldn't say for sure without
seeing your directory listing.

NB, if I were doing this I'd probably have used glob() rather
than opendir(). Also, perl even on win32 can understand normal
slashes, so there's no need for the double-backwhacks. I'd still
only put the path in once, though:

my $path = 'd:/temp';
my @files = glob( "$path/SR*.jpg" );

....or somesuch
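
Fleshed out a little, the glob() approach might look like the sketch below. It is a dry run against a scratch directory rather than d:/temp: the tempdir, the sample file names and the @plan array are all assumptions made for the demo. Note that glob() returns names with the directory already prepended, and that it splits its pattern on whitespace, so a path containing spaces would need extra care.

```perl
use strict;
use warnings;
use File::Basename 'basename';
use File::Temp 'tempdir';

# Scratch directory standing in for 'd:/temp', with made-up files.
my $path = tempdir( CLEANUP => 1 );
for my $name (qw(SR-01-234-5.jpg SR-01-234-12.jpg notes.txt)) {
    open my $fh, '>', "$path/$name" or die "Cannot create $name: $!";
    close $fh or die "Cannot close $name: $!";
}

my @plan;    # pairs of [ old path, new path ]
for my $old ( glob("$path/SR*.jpg") ) {
    my ( $pre, $num, $suf ) =
        basename($old) =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]?\.jpg)$/
        or next;    # matched the glob but not the naming scheme
    my $new = sprintf '%s/%s%04d%s', $path, $pre, $num, $suf;
    push @plan, [ $old, $new ] unless $new eq $old;
}

# Dry run only: print what would be renamed.
print "$_->[0] => $_->[1]\n" for @plan;
```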

Hope this helps
--damian
 
A. Sinan Unur

Anno Siegel wrote:
A. Sinan Unur said:

[Good advice]
if ($old =~ /^(SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg)$/ ) {
my $new = sprintf "$1%4.4d$3", $2;

Just one note. It is generally a bad idea to put variable strings
into a sprintf format. They could decide to contain a "%" one day. I
realize the regex doesn't allow this in this case, but on principle
I'd do

sprintf '%s%4.4d%s', $1, $2, $3;

Definitely, that was on my list of things to add, but forgot. Thanks for
catching it.

Sinan.
 
A. Sinan Unur

Important correction:
while( my $old = readdir $dh ) {

I edited out the crucial test for defined when I was changing things. This
line should have been, as it was in the original post,

while( defined (my $old = readdir $dh) ) {

Sorry.

Sinan.
 
Damian James

...
It's using them inside *another* regex that's problematic. Every regex
evaluation resets them. You can assign the matches to named variables
that don't have that problem (see below).

Reset? My understanding was that previous matches are retained (which makes
what the OP was trying to do more confusing, because sometimes it may
have succeeded). From perlre:

NOTE: failed matches in Perl do not reset the match variables, which
makes it easier to write code that tests for a series of more specific
cases and remembers the best match.
Here's how I would do it (your regex is unchanged):

my $filename = 'SR-01-234-5.jpg';
my ( $pre, $num, $suf) =
$filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
my $foo = sprintf "%s%04d%s", $pre, $num, $suf;
print "$filename -> $foo\n";

Indeed.

--damian
 
Anno Siegel

Damian James said:
Reset? My understanding was that previous matches are retained (which makes
what the OP was trying to do more confusing, because sometimes it may
have succeeded). From perlre:

NOTE: failed matches in Perl do not reset the match variables, which
makes it easier to write code that tests for a series of more specific
cases and remembers the best match.

Yes, *failed* matches retain the values. A successful match resets
them (even if it doesn't capture anything itself). Since the pattern
/$1$2$3/ would match the original string ("." matching itself), at
the time of substitution $1, $2 and $3 would be undefined.

my $filename = 'SR-01-234-5.jpg';
$filename =~ /(^SR-\d{2}-\d{3}-)(\d+)([a-zA-Z]{0,1}\.jpg$)/;
{
no warnings 'uninitialized';
$filename =~ s/$1$2$3/$1$2$3/;
}
print "*$filename*\n";
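
The three cases (capturing match, failed match, successful match without groups) can also be watched side by side. A minimal sketch; the $after_* variable names are invented for the demo.

```perl
use strict;
use warnings;

my $name = 'SR-01-234-5.jpg';

$name =~ /^(SR-\d{2})/ or die "unexpected: no match";
my $after_capture = $1;                     # 'SR-01'

my $failed       = $name =~ /(no-such-text)/;   # this match fails
my $after_failed = $1;                      # still 'SR-01'

my $ok            = $name =~ /\.jpg$/;      # succeeds, has no groups
my $after_success = $1;                     # now undef

printf "%s / %s / %s\n", $after_capture, $after_failed,
    defined $after_success ? $after_success : 'undef';
# prints "SR-01 / SR-01 / undef"
```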

Anno
 
Tad McClellan

A. Sinan Unur said:
Important correction:


I edited out the crucial test for defined when I was changing things.


It actually isn't crucial at all.

This line should have been, as it was in the original post,

while( defined (my $old = readdir $dh) ) {


perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'

and

perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'

make the same output. :)


If you leave out the defined(), perl will put it in for you.
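
The case the defined() test protects against is a directory entry named "0", which is false as a plain boolean. The sketch below sets that up in a scratch directory (the tempdir and the two file names are assumptions for the demo) and reads it back without an explicit defined().

```perl
use strict;
use warnings;
use File::Temp 'tempdir';

# Scratch directory containing a file whose name is false as a boolean.
my $dir = tempdir( CLEANUP => 1 );
for my $name ( '0', 'a.txt' ) {
    open my $fh, '>', "$dir/$name" or die "Cannot create $name: $!";
    close $fh or die "Cannot close $name: $!";
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @seen;
while ( my $entry = readdir $dh ) {    # perl treats this as defined(...)
    push @seen, $entry;
}
closedir $dh or die "Cannot close $dir: $!";

print join( ' ', sort @seen ), "\n";
```

Both "0" and "a.txt" should show up in @seen, whatever order the directory returns them in.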
 
Damian James

...
Yes, *failed* matches retain the values. A successful match resets
them (even if it doesn't capture anything itself). Since the pattern
/$1$2$3/ would match the original string ("." matching itself), at
the time of substitution $1, $2 and $3 would be undefined.

Ah, nifty.

--damian
 
A. Sinan Unur

It actually isn't crucial at all.

Good to know.
perl -MO=Deparse -e 'while( my $old = readdir $dh ) { }'

and

perl -MO=Deparse -e 'while( defined (my $old = readdir $dh) ) { }'

make the same output. :)

Hmmm. I thought the magic only applied to readline. I stand corrected.

Thank you.

Sinan
 
Tad McClellan

A. Sinan Unur said:
Hmmm. I thought the magic only applied to readline. I stand corrected.


I used the perldiag description of the warning to figure
out where it applies:

=item Value of %s can be "0"; test with defined()

(W misc) In a conditional expression, you used <HANDLE>, <*> (glob),
C<each()>, or C<readdir()> as a boolean value.
 
