Help with RegEx

vector

I'm designing an application that watches directories, waiting for
files to appear. When the app sees some files, it checks the
filenames against a regular expression to determine whether or not the
files should be processed.

Not all directories are subject to the same validation rules, so I
cannot hard-code the filtering logic. The rules must be configurable
per directory; hence my choice of regular expressions, stored in a
config file, as the filter method.

Given a list of files in a directory, I want to return sets of file
pairs. First, the target directory should be scanned for files with
20-character names and extensions of either .TIF or .EA. The
20-character name begins with 16 numbers, and the last four characters
can be any hexadecimal character, 0-9 or A-F. I created this (ugly)
regex to satisfy that requirement:

^(([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])+[A-F0-9][A-F0-9][A-F0-9][A-F0-9])\.(EA|TIF)$
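One detail in the pattern above is worth noting: the `+` after the 16-digit group lets that group repeat, so a 36-character name (32 digits followed by four hex characters) would also match. A more compact form using counted quantifiers might look like the following Perl sketch. The names `$name_re` and `is_candidate` are purely illustrative, and the sample list is the one from this post.

```perl
use strict;
use warnings;

# Hypothetical compact form of the naming rule:
# exactly 16 ASCII digits, exactly 4 uppercase hex digits, then .EA or .TIF.
# [0-9] is used rather than \d so only ASCII digits match,
# and \z anchors at the true end of the string.
my $name_re = qr/^[0-9]{16}[0-9A-F]{4}\.(?:EA|TIF)\z/;

# Returns 1 when a filename satisfies the naming rule, 0 otherwise.
sub is_candidate {
    my ($filename) = @_;
    return $filename =~ $name_re ? 1 : 0;
}

my @files = qw(
    1234567890123456FFFF.EA
    1234567890123456FFFF.TIF
    99999999999999999999.EA
    99999999999999999999.JPG
    ZZZZZZZZZZZZZZZZZZZZ.TIF
);

# Print only the names that pass the rule.
print "$_\n" for grep { is_candidate($_) } @files;
```

Because the repetition counts are explicit, a name with 32 leading digits no longer passes.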

Beyond that, I'm uncertain of how to proceed. The next step should be
to take the resulting set of filenames, and extract file pairs of TIFs
and EAs that have the same name. For example, given this list of
files:

1234567890123456FFFF.EA
1234567890123456FFFF.TIF
99999999999999999999.EA
99999999999999999999.JPG
ZZZZZZZZZZZZZZZZZZZZ.TIF

my search will return, properly, the files "…FFFF.EA", "…FFFF.TIF",
and "…9999.EA". What I need is to extend the regex to refine those
results, so that the file named "…9999.EA" is excluded from the
results set, as it has no matching .TIF file.

I'd appreciate any help on this.
 
Tad McClellan

vector said:
Given a list of files in a directory, I want to return sets of file
pairs.


Using the word "pair" in the problem description often implies
"use a hash" somewhere.
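To illustrate the hash idea independently of the file system, here is a minimal sketch that groups an in-memory list of names by their 20-character base name and keeps only the bases seen with both extensions. The sub name `find_pairs` is hypothetical, and the sample list is the one from the question.

```perl
use strict;
use warnings;

# Group candidate filenames by base name, then keep only complete EA/TIF pairs.
# find_pairs is an illustrative name, not part of any library.
sub find_pairs {
    my @files = @_;
    my %seen;    # base name => { EA => 1, TIF => 1 }
    for my $file (@files) {
        # Capture the base name and extension of names that follow the rule.
        next unless $file =~ /^([0-9]{16}[0-9A-F]{4})\.(EA|TIF)\z/;
        $seen{$1}{$2} = 1;
    }
    # A base name counts as a pair only if both extensions were seen.
    return map  { [ $_ . '.EA', $_ . '.TIF' ] }
           grep { $seen{$_}{EA} && $seen{$_}{TIF} }
           sort keys %seen;
}

my @files = qw(
    1234567890123456FFFF.EA
    1234567890123456FFFF.TIF
    99999999999999999999.EA
    99999999999999999999.JPG
    ZZZZZZZZZZZZZZZZZZZZ.TIF
);

for my $pair ( find_pairs(@files) ) {
    print join( ' <=> ', @$pair ), "\n";
}
```

With the sample list, only the 1234567890123456FFFF pair is returned: the 99999999999999999999.EA file has no .TIF partner, and the other two names fail the naming rule.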

First, the target directory should be scanned for files with
20-character names and extensions of either .TIF or .EA. The
20-character name begins with 16 numbers, and the last four characters
can be any hexadecimal character, 0-9 or A-F.
The next step should be
to take the resulting set of filenames, and extract file pairs of TIFs
and EAs that have the same name.

so that the file named "…9999.EA" is excluded from the
results set, as it has no matching .TIF file.


--------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '.';
opendir my $dh, $dir or die "could not open directory '$dir': $!";

# Map each valid .EA file to its .TIF partner, keeping only complete pairs.
my %pairs;
foreach my $ea ( grep /^\d{16}[\dA-F]{4}\.EA$/, readdir $dh ) {
    (my $tif = $ea) =~ s/EA$/TIF/;          # same base name, .TIF extension
    $pairs{$ea} = $tif if -e "$dir/$tif";   # keep only if the .TIF exists
}
closedir $dh;

print "$_ ==> $pairs{$_}\n" for sort keys %pairs;
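
The script above hard-codes both the pattern and the two extensions. Since the original question asks for rules that are configurable per directory, one possible variation is sketched below. Everything about the rule layout is an assumption made for illustration: the `%rules` hash stands in for whatever the config file would provide, the keys `pattern`, `first`, and `second` are invented names, and `pairs_in_dir` is a hypothetical helper. Each pattern is assumed to capture the base name in group 1.

```perl
use strict;
use warnings;

# Hypothetical per-directory rules, as they might be loaded from a config file.
# Assumption: group 1 of each pattern captures the base name.
my %rules = (
    '.' => {
        pattern => '^([0-9]{16}[0-9A-F]{4})\.(?:EA|TIF)\z',
        first   => 'EA',
        second  => 'TIF',
    },
);

# Return [first, second] filename pairs found in $dir under the given rule.
sub pairs_in_dir {
    my ( $dir, $rule ) = @_;
    my $re = qr/$rule->{pattern}/;    # compile the configured pattern once

    opendir my $dh, $dir or die "could not open directory '$dir': $!";
    my %seen;    # base name => { extension => 1 }
    for my $file ( readdir $dh ) {
        next unless $file =~ $re;
        my $base = $1;                              # save before the next match
        my ($ext) = $file =~ /\.([^.]+)\z/;         # text after the last dot
        $seen{$base}{$ext} = 1;
    }
    closedir $dh;

    # Keep only base names seen with both configured extensions.
    return map  { [ $_ . '.' . $rule->{first}, $_ . '.' . $rule->{second} ] }
           grep { $seen{$_}{ $rule->{first} } && $seen{$_}{ $rule->{second} } }
           sort keys %seen;
}

for my $dir ( sort keys %rules ) {
    for my $pair ( pairs_in_dir( $dir, $rules{$dir} ) ) {
        print "$dir: $pair->[0] ==> $pair->[1]\n";
    }
}
```

Keeping the pairing step in code and only the naming rule in configuration reflects the point made earlier in this reply: a regex tests one filename at a time, so deciding whether a partner file exists needs a hash or a file test.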
 
