Program optimisation advice for newbie

ChocMonk · Jan 4, 2005

Hello,

I've got a program snippet that takes a multiline string for exapmple:
"asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9"

and forms a hash from it, where each key is the part of each line upto
the first "@" and the value is everything following the second "@". So
the hash key/value pairs for the above string look like:
asdasdasd => \main\3
boohoo => \main\6
toodle => \main\9

Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}
----END CODE--

Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

Cheers,
Choco

Alan Mead · Jan 4, 2005

Star date: Mon, 03 Jan 2005 19:55:33 -0800, ChocMonk's log:

Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}

This is your actual code? I don't follow what you're doing in the second
chunk (where you "grab the filename list") and it doesn't look like it
makes sense... (where does the 'names' string appear in your input?)
Maybe this is immaterial but I'd encourage you to cut-and-paste rather
than re-type (if you did retype).

I'm also not seeing why you create a variable @temp1 and then loop over
@temp... If this is a typo, then a STRONGLY encourage you to 'use strict;'
at the top of your script. Also 'use warnings;'.

In fact, even if it isn't a typo, I'd have the same advice.

Ok, now on to answering your question slightly... I guess it's a good
enough way. If the list is long, then you might want to use a regular
expression with the ... IIRC 'g' and 'm' options to successively capture
the lines (from memory):

my %names=();
do {
$cset =~ /$(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
} until (!defined($1));

This uses a regex rather than the split.. don't know which is faster but
as written, you use twice the memory needed (because you have the string
of lines and then store the lines in a list). Even better would be to
read the lines one at a time from a file but I assume you cannot do this...

Also, you shouldn't explicitly use variables like $_ (IMHO) like you do in
the last split, I would write

my($name,$ver) = split /\@\@/;

or

foreach my $t (@temp) {
my($name,$ver) = split(/\@\@/,$t);
....

-Alan

ChocMonk · Jan 4, 2005

Alan said:
Star date: Mon, 03 Jan 2005 19:55:33 -0800, ChocMonk's log:

This is your actual code? I don't follow what you're doing in the second
chunk (where you "grab the filename list") and it doesn't look like it
makes sense... (where does the 'names' string appear in your input?)
Maybe this is immaterial but I'd encourage you to cut-and-paste rather
than re-type (if you did retype).

Yup sorry. I did edit some stuff I figured would be irrelevant.
Basically I'm taking a multiline string $descr and grabbing anything
after "names" into $cset, dicarding anything before in $crap.

I'm also not seeing why you create a variable @temp1 and then loop over
@temp... If this is a typo, then a STRONGLY encourage you to 'use strict;'
at the top of your script. Also 'use warnings;'.

In fact, even if it isn't a typo, I'd have the same advice.

that is a typo and I've taken your strict/warnings advice.

Ok, now on to answering your question slightly... I guess it's a good
enough way. If the list is long, then you might want to use a regular
expression with the ... IIRC 'g' and 'm' options to successively capture
the lines (from memory):

my %names=();
do {
$cset =~ /$(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
} until (!defined($1));

This uses a regex rather than the split.. don't know which is faster but
as written, you use twice the memory needed (because you have the string
of lines and then store the lines in a list). Even better would be to
read the lines one at a time from a file but I assume you cannot do

this...
$descr is the output of a command so I'm not using a file.
I tried your regexp method and the match works great except that the
loop ends after the first match and $cset remains unmodified. I changed
the regexp slightly, as perl complained about the initial $ in your
regexp:

my %names=();
do {
$cset =~ /(.+?)\@\@(.+?)$/gm;
$names{$1} = $2 if (defined($1));
print STDERR "\ncset is $cset\n";
} until (!defined($1));

Also, you shouldn't explicitly use variables like $_ (IMHO) like you do in
the last split, I would write

my($name,$ver) = split /\@\@/;

or

foreach my $t (@temp) {
my($name,$ver) = split(/\@\@/,$t);
...

Thanks for the great advice.
Choc

John W. Krahn · Jan 4, 2005

ChocMonk said:
I've got a program snippet that takes a multiline string for exapmple:
"asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9"

and forms a hash from it, where each key is the part of each line upto
the first "@" and the value is everything following the second "@". So
the hash key/value pairs for the above string look like:
asdasdasd => \main\3
boohoo => \main\6
toodle => \main\9

Here is the snippet I wrote:
----BEGIN CODE--
$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

#grab the filename list
(my $crap, my $cset) = split /names/,$descr,2;
#split each line into an array
my @temp1 = split /\n/,$cset;

#chuck the list into a hash
my %names;
foreach (@temp) {
(my $name, my $ver) = split /\@\@/,$_;
$names{$name} = $ver;
}
----END CODE--

Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

This should do what you want:

$ perl -MData:

umper -le'
my $descr = q(
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9);

my %names = $descr =~ / ( .+? ) \@\@ ( .+ ) /xg;

print Dumper \%names;
'
$VAR1 = {
'asdasdasd' => '\\main\\3',
'toodle' => '\\main\\9',
'boohoo' => '\\main\\6'
};

John

Tad McClellan · Jan 4, 2005

ChocMonk said:
I tried your regexp method and the match works great except that the
loop ends after the first match and $cset remains unmodified.

What part of the code were you expecting _would_ change $cset?

It is *supposed to be* unchanged, given that code.

I changed
the regexp slightly, as perl complained about the initial $ in your
regexp:

I expect it was supposed to be the beginning-anchor (^) rather
than the ending-anchor ($).

I'm afraid I'd have to characterize Alan's code above as "hokey".

You should *never* use the dollar-digit variables unless you
have first ensured that the match has *succeeded*.

while ( $cset =~ /(.+)\@\@(.+)/g ) { # untested also
$names{$1} = $2;
}

Michele Dondi · Jan 4, 2005

$descr = "
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9";

Incidentally rather than adopting a qq// multiline string I'd use an
HERE doc.

Is this a good way to do it? Any comments on making it more
efficient/elegant would be much appreciated.

Due to being in a great hurry, I didn't actually read the rest of your
code. For simplicity I'll assume that your input string is actually
coming in from a filhandle (which seems reasonable - if it isn't there
are other WTDI, and you can even "fake" a filehandle):

my %names = map split /\@\@/, <$fh>;

Of course this is inherently insecure unless you can absolutely rely
on the input's format. If you can't additional checks will be
required. This part is left as an exercise. (If you still want to do
it fundamentally "like the above" you probably want to use the "BLOCK
form" of map().)

Michele

Michele Dondi · Jan 4, 2005

my %names = map split /\@\@/, <$fh>;

While what I wrote is still fundamentally correct IMHO, the kind of
code others suggested is definitely better, especially if you *really*
have to work with a multi-line string.

Michele

ChocMonk · Jan 4, 2005

John said:
This should do what you want:

$ perl -MData:umper -le'
my $descr = q(
asdasdasd@@\main\3
boohoo@@\main\6
toodle@@\main\9);

my %names = $descr =~ / ( .+? ) \@\@ ( .+ ) /xg;

print Dumper \%names;
'
$VAR1 = {
'asdasdasd' => '\\main\\3',
'toodle' => '\\main\\9',
'boohoo' => '\\main\\6'
};

John

And it does! John's solution floats my boat. Thanks Alan, Tad, Michele.
This was my very first post here and I'm amazed by the response. Much
appreciated, a better way to learn and improve than just using a book.

C program: memory leak/ segmentation fault/ memory limit exceeded	0	Nov 12, 2022
Trouble with prediction code, for the life of me I can't figure out why it isnt running properly. Help would be appreciated.	0	Jul 8, 2023
Minimum Total Difficulty	0	Nov 15, 2023
Padding strings for a clean visual print out...	5	Dec 23, 2023
Algorithm for optimisation - using Graph.pm?	0	Jul 24, 2003
Taskcproblem calendar	4	Aug 31, 2023
Perl Newbie	2	Aug 27, 2008
Newbie...	2	Feb 24, 2011

Program optimisation advice for newbie

ChocMonk

Alan Mead

ChocMonk

John W. Krahn

Tad McClellan

Michele Dondi

Michele Dondi

ChocMonk

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads