Program optimisation advice for newbie

C

ChocMonk

Hello,

I've got a program snippet that takes a multiline string for exapmple:
"asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9"

and forms a hash from it, where each key is the part of each line upto
the first "@" and the value is everything following the second "@". So
the hash key/value pairs for the above string look like:
asdasdasd => \main\3
boohoo => \main\6
toodle => \main\9

Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}
----END CODE--

Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

Cheers,
Choco
 
A

Alan Mead

Star date: Mon, 03 Jan 2005 19:55:33 -0800, ChocMonk's log:
Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}

This is your actual code? I don't follow what you're doing in the second
chunk (where you "grab the filename list") and it doesn't look like it
makes sense... (where does the 'names' string appear in your input?)
Maybe this is immaterial but I'd encourage you to cut-and-paste rather
than re-type (if you did retype).

I'm also not seeing why you create a variable @temp1 and then loop over
@temp... If this is a typo, then a STRONGLY encourage you to 'use strict;'
at the top of your script. Also 'use warnings;'.

In fact, even if it isn't a typo, I'd have the same advice.

Ok, now on to answering your question slightly... I guess it's a good
enough way. If the list is long, then you might want to use a regular
expression with the ... IIRC 'g' and 'm' options to successively capture
the lines (from memory):

my %names=();
do {
$cset =~ /$(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
} until (!defined($1));

This uses a regex rather than the split.. don't know which is faster but
as written, you use twice the memory needed (because you have the string
of lines and then store the lines in a list). Even better would be to
read the lines one at a time from a file but I assume you cannot do this...

Also, you shouldn't explicitly use variables like $_ (IMHO) like you do in
the last split, I would write

my($name,$ver) = split /\@\@/;

or

foreach my $t (@temp) {
my($name,$ver) = split(/\@\@/,$t);
....

-Alan
 
C

ChocMonk

Alan said:
Star date: Mon, 03 Jan 2005 19:55:33 -0800, ChocMonk's log:


This is your actual code? I don't follow what you're doing in the second
chunk (where you "grab the filename list") and it doesn't look like it
makes sense... (where does the 'names' string appear in your input?)
Maybe this is immaterial but I'd encourage you to cut-and-paste rather
than re-type (if you did retype).

Yup sorry. I did edit some stuff I figured would be irrelevant.
Basically I'm taking a multiline string $descr and grabbing anything
after "names" into $cset, dicarding anything before in $crap.
I'm also not seeing why you create a variable @temp1 and then loop over
@temp... If this is a typo, then a STRONGLY encourage you to 'use strict;'
at the top of your script. Also 'use warnings;'.

In fact, even if it isn't a typo, I'd have the same advice.
that is a typo and I've taken your strict/warnings advice.
Ok, now on to answering your question slightly... I guess it's a good
enough way. If the list is long, then you might want to use a regular
expression with the ... IIRC 'g' and 'm' options to successively capture
the lines (from memory):

my %names=();
do {
$cset =~ /$(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
} until (!defined($1));

This uses a regex rather than the split.. don't know which is faster but
as written, you use twice the memory needed (because you have the string
of lines and then store the lines in a list). Even better would be to
read the lines one at a time from a file but I assume you cannot do
this...
$descr is the output of a command so I'm not using a file.
I tried your regexp method and the match works great except that the
loop ends after the first match and $cset remains unmodified. I changed
the regexp slightly, as perl complained about the initial $ in your
regexp:

my %names=();
do {
$cset =~ /(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
print STDERR "\ncset is $cset\n";
} until (!defined($1));

Also, you shouldn't explicitly use variables like $_ (IMHO) like you do in
the last split, I would write

my($name,$ver) = split /\@\@/;

or

foreach my $t (@temp) {
my($name,$ver) = split(/\@\@/,$t);
...

Thanks for the great advice.
Choc
 
J

John W. Krahn

ChocMonk said:
I've got a program snippet that takes a multiline string for exapmple:
"asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9"

and forms a hash from it, where each key is the part of each line upto
the first "@" and the value is everything following the second "@". So
the hash key/value pairs for the above string look like:
asdasdasd => \main\3
boohoo => \main\6
toodle => \main\9

Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}
----END CODE--

Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

This should do what you want:

$ perl -MData::Dumper -le'
my $descr = q(
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9);

my %names = $descr =~ / ( .+? ) \@\@ ( .+ ) /xg;

print Dumper \%names;
'
$VAR1 = {
'asdasdasd' => '\\main\\3',
'toodle' => '\\main\\9',
'boohoo' => '\\main\\6'
};



John
 
T

Tad McClellan

ChocMonk said:
I tried your regexp method and the match works great except that the
loop ends after the first match and $cset remains unmodified.


What part of the code were you expecting _would_ change $cset?

It is *supposed to be* unchanged, given that code.

I changed
the regexp slightly, as perl complained about the initial $ in your
regexp:


I expect it was supposed to be the beginning-anchor (^) rather
than the ending-anchor ($).

I'm afraid I'd have to characterize Alan's code above as "hokey".

You should *never* use the dollar-digit variables unless you
have first ensured that the match has *succeeded*.

while ( $cset =~ /(.+)\@\@(.+)/g ) { # untested also
$names{$1} = $2;
}
 
M

Michele Dondi

$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

Incidentally rather than adopting a qq// multiline string I'd use an
HERE doc.
Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

Due to being in a great hurry, I didn't actually read the rest of your
code. For simplicity I'll assume that your input string is actually
coming in from a filhandle (which seems reasonable - if it isn't there
are other WTDI, and you can even "fake" a filehandle):

my %names = map split /\@\@/, <$fh>;

Of course this is inherently insecure unless you can absolutely rely
on the input's format. If you can't additional checks will be
required. This part is left as an exercise. (If you still want to do
it fundamentally "like the above" you probably want to use the "BLOCK
form" of map().)


Michele
 
M

Michele Dondi

my %names = map split /\@\@/, <$fh>;

While what I wrote is still fundamentally correct IMHO, the kind of
code others suggested is definitely better, especially if you *really*
have to work with a multi-line string.


Michele
 
C

ChocMonk

John said:
This should do what you want:

$ perl -MData::Dumper -le'
my $descr = q(
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9);

my %names = $descr =~ / ( .+? ) \@\@ ( .+ ) /xg;

print Dumper \%names;
'
$VAR1 = {
'asdasdasd' => '\\main\\3',
'toodle' => '\\main\\9',
'boohoo' => '\\main\\6'
};



John

And it does! John's solution floats my boat. Thanks Alan, Tad, Michele.
This was my very first post here and I'm amazed by the response. Much
appreciated, a better way to learn and improve than just using a book.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,164
Messages
2,570,898
Members
47,440
Latest member
YoungBorel

Latest Threads

Top