code not working..ideas please?

G

Geoff Cox

Hello,

In the following code minorlist contains a list of names of schools.
The major list conatins these names as part of school email addresses
with other school email addresses.

I am trying to take the first word on each line of the minorlist, find
the associated email address in the majorlist and print the email
address out to the emails file.

For some reason the emails list contains all the emails in the
majorlist and I cannot see why?!

Cheers

Geoff

open (IN, "d:/minorlist");
open (INN, "d:/majorlist");
open (OUT, ">>d:/emails");

while (defined (my $line =<IN>)) {
$line =~ /^(.*?)\s/i;
&email($1);
}

sub email {

while (defined (my $line2 = <INN>)) {
if ($line2 =~ /$1/i) {
print OUT ($line2);
}
}
}
 
R

Ryan Shondell

Geoff Cox said:
Hello,

In the following code minorlist contains a list of names of schools.
The major list conatins these names as part of school email addresses
with other school email addresses.

I am trying to take the first word on each line of the minorlist, find
the associated email address in the majorlist and print the email
address out to the emails file.

For some reason the emails list contains all the emails in the
majorlist and I cannot see why?!

Cheers

Geoff

You forgot...

use warnings;
use strict;

Never leave home without them. :)
open (IN, "d:/minorlist");
open (INN, "d:/majorlist");
open (OUT, ">>d:/emails");

You should _always_ check the return values of your open statements.

open (IN, 'foo') or die "Can't open foo: $!";
while (defined (my $line =<IN>)) {
$line =~ /^(.*?)\s/i;

So you want to match 0 or more of any character, non-greedily,
followed by whitespace? You realize that even just a newline, or a
space will match that test? And there doesn't appear to be a benefit
to using /i for case-insensitive matching, since you're not using any
letters in your regex. If you want the first word, do

$line =~ /(\w+)/;
&email($1);

Uh oh. What happens if your regex doesn't match, and put a new value
into $1? Answer...you have a stale value in $1 (whatever you matched
previously). Probably not what you wanted. You could throw an 'if' in
there...

if ($line =~ /(\w+)/) {
email($1);
}

Also, don't use the & version of calling subs, unless you specifically
need the properties you get by doing so. Do you? Do you need to
circumvent prototypes?

Use the syntax, email($1);
sub email {

while (defined (my $line2 = <INN>)) {
if ($line2 =~ /$1/i) {

Eww, I can see all sorts of problems happening here. You're passing a
value to this sub, but not using it (well, kinda sorta you are
sometimes...in a round-about-ish way). Use the value you just passed
in to the sub.

my $word = shift;
if ($line2 =~ /$word/i)
print OUT ($line2);
}
}
}

Hopefully this gets you started in the right direction. I suggest
perusing some of the perldocs, specifically perlsub, perlop, and
perlre for information on subs, the m// operator and its options, and
regular expressions, respectively.

Ryan
 
G

Glenn Jackman

Geoff Cox said:
open (IN, "d:/minorlist");
open (INN, "d:/majorlist");
open (OUT, ">>d:/emails");

while (defined (my $line =<IN>)) {
$line =~ /^(.*?)\s/i;
&email($1);
}

sub email {
while (defined (my $line2 = <INN>)) {
if ($line2 =~ /$1/i) {
print OUT ($line2);
}
}
}


Don't read the majorlist file for each line of minorlist.

open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
while (my $line = <MINOR>) {
push @schools, $1 if $line =~ /^(\S+)/;
# note the regex must match at least one character.
# you were probably passing an empty string to your subroutine,
# which of course matches any string.
}
close MINOR;
# depending on the format of your files, this may suffice in place
# of the while loop:
# chomp( my @schools = <MINOR> );

open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
while (my $line = <MAJOR>) {
print OUT $line if grep { $line =~ /\Q$_/i } @schools;
}
close MAJOR;
close OUT;
 
G

Geoff Cox

Ryan

Thanks for your comments...I have changed to code below but it only
gives me one email address in the emails file ! I must be missing
something obvious?

Could you explain the

$word = shift line of yours?

Cheers

Geoff

use warnings;
use strict;

open (IN, "secondary") or die "Can't open secondary: $!";
open (INN, "cornwall") or die "Can't open cornwall: $!";
open (OUT, ">>emails") or die "Can't open emails: $!";

while (defined (my $line =<IN>)) {
if ($line =~ /(\w+)/) {
email($1);
}
}

sub email {

my $word = $1;
$word =~ tr/A-Z/a-z/;

while (defined (my $line2 = <INN>)) {
if ($line2 =~ /$word/) {
print OUT ($line2);
}
}
}
:
 
G

Geoff Cox

On 30 Sep 2003 19:55:07 GMT, Glenn Jackman <[email protected]>
wrote:

Glen,

Thanks for the code. It certainly works - only problem now is
understanding how?!
Don't read the majorlist file for each line of minorlist.

open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
while (my $line = <MINOR>) {
push @schools, $1 if $line =~ /^(\S+)/;

how does line above work?
# note the regex must match at least one character.
# you were probably passing an empty string to your subroutine,
# which of course matches any string.
}
close MINOR;
# depending on the format of your files, this may suffice in place
# of the while loop:
# chomp( my @schools = <MINOR> );

open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
while (my $line = <MAJOR>) {
print OUT $line if grep { $line =~ /\Q$_/i } @schools;

could you say what the line above is doing?

Geoff
 
G

Geoff Cox

On 30 Sep 2003 19:55:07 GMT, Glenn Jackman <[email protected]>
wrote:

Glenn,
open MINOR, 'd:/minorlist' or die "can't open minorlist: $!\n";
while (my $line = <MINOR>) {
push @schools, $1 if $line =~ /^(\S+)/;

I assume this matches the first group of non white space characters?
# note the regex must match at least one character.
# you were probably passing an empty string to your subroutine,
# which of course matches any string.
}
close MINOR;
# depending on the format of your files, this may suffice in place
# of the while loop:
# chomp( my @schools = <MINOR> );

open MAJOR, 'd:/majorlist' or die "can't open majorlist: $!\n";
open OUT, '>>d:/emails' or die "can't open emails for writing: $!\n";
while (my $line = <MAJOR>) {
print OUT $line if grep { $line =~ /\Q$_/i } @schools;

not clear on above line - how does grep work?

Geoff
 
R

Ryan Shondell

Geoff Cox said:
Ryan

Thanks for your comments...I have changed to code below but it only
gives me one email address in the emails file ! I must be missing
something obvious?

Could you explain the

$word = shift line of yours?

When you pass arguments to a sub, they are passed in the @_ array. The
shift function removes the first value from an array and returns
it. If you don't specify the name of the array, it uses @_.

So by doing

my $word = shift;

I'm setting $word to the first value in @_. In other words, the first
arg passed into the sub.

Cheers

Geoff

use warnings;
use strict;

open (IN, "secondary") or die "Can't open secondary: $!";
open (INN, "cornwall") or die "Can't open cornwall: $!";
open (OUT, ">>emails") or die "Can't open emails: $!";

while (defined (my $line =<IN>)) {
if ($line =~ /(\w+)/) {
email($1);
}
}

sub email {

my $word = $1;
$word =~ tr/A-Z/a-z/;

Instead of this, you should probably use the /i option to m// below.
while (defined (my $line2 = <INN>)) {
if ($line2 =~ /$word/) {

if ($line =~ /$word/i) {
print OUT ($line2);
}
}
}

You should definitely follow the example posted elsewhere in this
thread on how to solve your problem. One of the problems you have of
doing it this way is that if you have already pulled the lines from
INN that contain what you want to match, you won't match.

Okay, that came out a bit confusing... :)

For example, say that your files look like this...

__IN__
some
things
here
__IN__

__INN__
things
some
foo
__INN__

Your program will read in the word "some", and then start searching
through INN until it finds a match. So it will grab and discard the
word "things" from INN.

When your program processes the word "things" from your first file, it
will then start looking through the second file _at the place where
you stopped_ the first time. So it will have missed a word it should
have gotten.

In trying to solve some of your smaller problems, I completely missed
the bigger one; the implementation. Sorry bout that... :-(

Ryan
 
G

Glenn Jackman

Geoff Cox said:
I assume this matches the first group of non white space characters?

It matches the first group on non white space characters, only if they
are at the beginning of the line (note the ^ anchor)
not clear on above line - how does grep work?

http://www.perldoc.com/perl5.6/pod/func/grep.html

@schools is an array of patterns. I want to check each of these
patterns against the current line. grep is a concise function to
iterate over an array and perform some action. In this case, the return
value of grep is the number of elements in @schools that "match" $line.

The regex /\Q$_/i contains this magic:
$_ (the current placeholder to an element in the @schools array),
\Q which means any regex metacharacters in $_ (such as * or +)
should be escaped of their special meaning, and
/i to ignore case.

Another way to code that line could be:
foreach my $element (@schools) {
if ($line =~ /\Q$element/i) {
print OUT $line;
}
}

I highly recommend these O'Reilly books: "Learning Perl", "Programming
Perl" and "Mastering Regular Expressions".
 
C

Carlton Brown

<snip>

Looks like you're assuming that perl args to a subroutine show up as
numbered positionals, like in a shell script. Bzzt. They show up in
an array called @_ which can be shifted or directly referenced like
any other array.
sub email {
my $school = shift @_; # (this would work)
my $school = $_[0]; # (or this, if you prefer)
while (defined (my $line2 = <INN>)) {
# Bleh. if ($line2 =~ /$1/i) {
Better. if ($line2 =~ /$school/i) {
print OUT ($line2);
}
}
}

PS - if you use hashes, you will experience joy should you ever need
to refactor this code in the future.
PPS - Your symbol names are kinda bad. "INN?" Bleh.
 
G

Geoff Cox

It matches the first group on non white space characters, only if they
are at the beginning of the line (note the ^ anchor)


http://www.perldoc.com/perl5.6/pod/func/grep.html

@schools is an array of patterns. I want to check each of these
patterns against the current line. grep is a concise function to
iterate over an array and perform some action. In this case, the return
value of grep is the number of elements in @schools that "match" $line.

The regex /\Q$_/i contains this magic:
$_ (the current placeholder to an element in the @schools array),
\Q which means any regex metacharacters in $_ (such as * or +)
should be escaped of their special meaning, and
/i to ignore case.

Another way to code that line could be:
foreach my $element (@schools) {
if ($line =~ /\Q$element/i) {
print OUT $line;
}
}

I highly recommend these O'Reilly books: "Learning Perl", "Programming
Perl" and "Mastering Regular Expressions".

Glenn,

Thanks - have now much better idea of what is happening...thanks for
the book suggestions too.

Cheers

Geoff
 
G

Geoff Cox

When you pass arguments to a sub, they are passed in the @_ array. The
shift function removes the first value from an array and returns
it. If you don't specify the name of the array, it uses @_.

So by doing

my $word = shift;

I'm setting $word to the first value in @_. In other words, the first
arg passed into the sub.

Got it ! Thanks

Geoff
 
G

Geoff Cox

<snip>

Looks like you're assuming that perl args to a subroutine show up as
numbered positionals, like in a shell script. Bzzt. They show up in
an array called @_ which can be shifted or directly referenced like
any other array.
sub email {
my $school = shift @_; # (this would work)
my $school = $_[0]; # (or this, if you prefer)

Carlton,

Thanks - now appreciate what the shift does..

Cheers

Geoff
 
H

Helgi Briem

Thanks for your reply but "respect locales" ?

Would your way recognise that 'Þ' (ord=222) is a valid letter in
Icelandic and has the lowercase version 'þ' (ord=254)?
That's locale. Here's an example:

use locale qw(Icelandic);
print lc('ÞÓR');

Output:
þór
 
T

Tad McClellan

Geoff Cox said:
Thanks for your reply but "respect locales" ?


"locale"s are for dealing with languages other than English (sorta).


perldoc perllocale
 
G

Geoff Cox

Would your way recognise that 'Þ' (ord=222) is a valid letter in
Icelandic and has the lowercase version 'þ' (ord=254)?
That's locale. Here's an example:

use locale qw(Icelandic);
print lc('ÞÓR');

Output:

Thanks for the explanation.

Cheers

Geoff
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,228
Members
46,818
Latest member
SapanaCarpetStudio

Latest Threads

Top