Regex assistance

disco

I am using the readdir command to read from a directory, but for this
presentation I have put the contents of my directory below as <DATA> so
you can run my example. Surely I am missing something simple. Can someone
help?

I want to create a listing meeting the following criteria:
1) The line must start with a letter or number (e.g. no ".", etc.)
2) Only include letters and numbers up until a "." or "_" and discard the rest
3) My list can contain duplicates

expected result
===================
goofup
listing
bobby
junk
junk
junk34
bdfds
juice
master1
master2
memory
login
aray
test

----------------- script --------------------

while ( defined ($filename = <DATA>) )
{
    # Add to array if line does not begin with "." or contain "#" or "@"
    if ($filename !~ /^\.|[#@]/)
    {
        $filename =~ /(.*?)(\.|\_)(.*?)$/;
        push @dirlist, $1;
    }

}
foreach (@dirlist)
{
    print $_ . "\n";
}


__DATA__
.
..
goofup.exe
listing.txt
bobby_switch_db60d
junk_none.log
junk_db4c5A
junk34_stwx_bine33A
bdfds_db4a4A
juice
master1
master2
memory_profile_db
login_profiles
aray_db76eA
test_master1
.profile
.cshrc
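A minimal sketch of one possible repair that keeps the script's approach (everything before the first "." or "_"), assuming the goal is exactly the expected result above. The original pattern requires a "." or "_", so it cannot match juice, master1 or master2, yet the script pushes $1 without checking whether the match succeeded. The helper name list_entry is hypothetical, and the sample names are the ones from __DATA__.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the part of a name before the first "." or "_",
# or the whole name when it contains neither. Returns nothing for names the
# original script skips (leading "." or any "#" or "@").
sub list_entry {
    my ($filename) = @_;
    chomp $filename;
    return if $filename =~ /^\.|[#@]/;
    # Use $1 only after checking that the match succeeded.
    if ($filename =~ /^(.*?)[._]/) {
        return $1;
    }
    return $filename;    # no "." or "_", e.g. juice, master1
}

# Sample names copied from the __DATA__ section above.
my @names = qw(
    . .. goofup.exe listing.txt bobby_switch_db60d junk_none.log
    junk_db4c5A junk34_stwx_bine33A bdfds_db4a4A juice master1 master2
    memory_profile_db login_profiles aray_db76eA test_master1 .profile .cshrc
);
my @dirlist = map { list_entry($_) } @names;
print "$_\n" for @dirlist;
```

The fallback to the whole chomped name is what lets juice, master1 and master2 through; without the chomp the fallback would keep the trailing newline.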
 
mgarrish

disco said:
I want to create a listing meeting the following criteria:
1) The line must start with a letter or number (e.g. no ".", etc.)
2) Only include letters and numbers up until a "." or "_" and discard the rest
# Add to array if line does not begin with "." or contain "#" or "@"
if ($filename !~ /^\.|[#@]/)
{
    $filename =~ /(.*?)(\.|\_)(.*?)$/;
    push @dirlist, $1;
}

You shouldn't be trying to match in two steps when one will suffice.
Essentially, all you're trying to do is match numbers and letters at the
beginning of a file name, so there's no need to check first if the filename
starts with something other than a letter or number. You can also use a
negative look-ahead assertion to make sure there are no @s or #s in the
string:

if ($filename =~ /^([A-Za-z0-9]+)(?!.*[@#])/) {
    push @dirlist, $1;
}
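A runnable sketch of Matt's one-match approach, for trying it outside the original loop. The name list is illustrative (the last two entries are made up to exercise the "#" and "@" check), and the character class is written [#@] as in the original post, which matches the same characters as [@#].

```perl
use strict;
use warnings;

# Sample names: four from the original data, two made up for the #/@ check.
my @names = ('goofup.exe', 'juice', 'test_master1', '.profile',
             'has#hash', 'has@at');
my @dirlist;
for my $filename (@names) {
    # One match: capture the leading letters/digits; the negative look-ahead
    # rejects the name if a "#" or "@" appears anywhere after that point.
    if ($filename =~ /^([A-Za-z0-9]+)(?!.*[#@])/) {
        push @dirlist, $1;
    }
}
print "$_\n" for @dirlist;    # goofup, juice, test
```

Backtracking cannot sneak a match past the look-ahead here: any shorter alphanumeric prefix still has the "#" or "@" somewhere after it, so the whole match fails.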

Matt
 
John W. Krahn

disco said:
I want to create a listing meeting the following criteria:
1) The line must start with a letter or number (e.g. no ".", etc.)
2) Only include letters and numbers up until a "." or "_" and discard the rest
3) My list can contain duplicates

[snip]


while ( <DATA> ) {
    push @dirlist, /^([a-z0-9]+)/i;
}

print "$_\n" for @dirlist;
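Why John's loop needs no if: a match in list context returns its captured groups on success and an empty list on failure, so push adds nothing for names such as "." or ".cshrc". A small check, with a sample list taken from the __DATA__ above:

```perl
use strict;
use warnings;

# Sample names from the original data.
my @names = ('.', '..', 'goofup.exe', 'junk34_stwx_bine33A', 'juice', '.cshrc');
my @dirlist;
for (@names) {
    # List-context match: pushes the capture, or nothing if the match fails.
    push @dirlist, /^([a-z0-9]+)/i;
}
print "$_\n" for @dirlist;    # goofup, junk34, juice
```

Note that this pattern does not look for "#" or "@", so a name like abc#def would still contribute abc.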



John
 
James E Keenan

disco said:
I am using the readdir command to read from a directory, but I have
[snip]
I want to create a listing meeting the following criteria:
1) The line must start with a letter or number (e.g. no ".", etc.)
What about files beginning with the underscore?
2) Only include letters and numbers up until a "." or "_" and discard the rest
3) My list can contain duplicates

[snip]

I found it useful to tackle your logic (including the comment about @ and #)
one step at a time.

my (@dirlist);
while ( <DATA> ) {
    next if (/^[._]/ or /[#@]/);
    # skip right over lines that are obvious non-matches;
    # it will keep the next regex simpler
    chomp;
    if (/^(.+?)[._].*/) {
        push @dirlist, $1;
    } else {
        push @dirlist, $_;
    }
}
print "$_\n" foreach (@dirlist);

I also added several entries to your __DATA__ to reflect all your concerns:

testforlinecontaininghash#sign
testforlinecontainingat@sign
_testforfilebeginningwithunderscore
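A sketch for running Jim's loop unchanged against his extra entries without editing __DATA__: it reads from an in-memory filehandle (open on a scalar reference), and the entry list, mixing his three test names with a few from the original data, is illustrative only.

```perl
use strict;
use warnings;

# Illustrative input: Jim's three test entries plus a few original names.
my @entries = ('bobby_switch_db60d', 'juice', '.profile',
               'testforlinecontaininghash#sign',
               'testforlinecontainingat@sign',
               '_testforfilebeginningwithunderscore');
my $text = join '', map { "$_\n" } @entries;
open my $fh, '<', \$text or die "cannot open in-memory file: $!";

my @dirlist;
while ( <$fh> ) {
    next if (/^[._]/ or /[#@]/);    # skip obvious non-matches first
    chomp;
    if (/^(.+?)[._].*/) {
        push @dirlist, $1;          # text before the first "." or "_"
    } else {
        push @dirlist, $_;          # no "." or "_": keep the whole name
    }
}
close $fh;
print "$_\n" foreach @dirlist;      # bobby, juice
```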

HTH!

Jim Keenan
 
Tad McClellan

disco said:
I want to create a listing meeting the following criteria:
1) The line must start with a letter or number (e.g. no ".", etc.)
2) Only include letters and numbers up until a "." or "_" and discard the rest
3) My list can contain duplicates

while ( defined ($filename = <DATA>) )
{
    # Add to array if line does not begin with "." or contain "#" or "@"
    if ($filename !~ /^\.|[#@]/)
    {
        $filename =~ /(.*?)(\.|\_)(.*?)$/;
        push @dirlist, $1;
    }


You can replace all of that code with this line:

my @dirlist = map { /^([a-z0-9]+)/i } <DATA>;

or, in your actual application:

my @dirlist = map { /^([a-z0-9]+)/i } readdir SOMEDIR;
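A fuller sketch of the readdir variant, assuming a directory path is passed in. prefixes_in is a hypothetical helper, and the scratch directory created with File::Temp exists only for the demo. readdir also returns "." and "..", which the pattern simply skips.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: leading letters/digits of every entry in $dir.
sub prefixes_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @dirlist = map { /^([a-z0-9]+)/i } readdir $dh;
    closedir $dh;
    return @dirlist;
}

# Demo only: a scratch directory holding a few of the sample names.
my $dir = tempdir(CLEANUP => 1);
for my $name ('goofup.exe', 'juice', 'test_master1', '.profile') {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}
my @found = prefixes_in($dir);
print "$_\n" for sort @found;
```

The output is sorted only because readdir returns entries in no particular order; the result is assigned to @found first because writing sort prefixes_in($dir) would make Perl treat prefixes_in as the sort comparison routine.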
 
