Perl Performance Tuning

U

Uri Guttman

o> We try to use strict and warnings in all modules, but I guess someone
o> should sit down and check where that is missing.

o> That option handling has long been on my sights for rewriting. But it
o> happens only once at startup (and again for old fashioned recursive
o> makefiles -- for which makepp has a far more elegant alternative), so
o> it's not a top priority.

and as i said, there are many ways to improve this code including
reasonable speedups and also general software quality.

uri
 
A

A. Sinan Unur

???????!!!!!!!!!!!

Yes. I've read one or two of your posts.

http://wordnet.princeton.edu/perl/webwn?s=arrogant

S: (adj) arrogant, chesty, self-important (having or showing feelings of
unwarranted importance out of overbearing pride)

The OP claimed that what he wrote should be required reading for all.
When I say the posting guidelines, or the Perl documentation should be
required reading, that is an opinion.

If others thought the OP's writings should be required reading, that
would have been praise.

The OP thinks his writings should be required reading; that is
arrogance.

Sinan
--
A. Sinan Unur <[email protected]>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was sent to
Anno Siegel
This is true for array- vs. hash access itself, but Uri was talking of
arrays vs. hashes as objects.

You are is probably right in this respect. On the other hand, the
fact that array access through a reference is almost as slow as hash
access through a reference should be also considered as a bug.

My timings on this particular machine:

$a[1] = 0.3 us, $a{1} = 0.57 us
$a->[1] = 0.67 us, $a->{1} = 0.68 us

Clearly, the same bug as with lexicals...

timeit perl -e "$b->[1]=4; $a=1000000; ($x,$x)=(1,$b->[1]) while --$a"
timeit perl -e "$b->[1]=4; $a=1000000; ($x,$x)=(1) while --$a"

(On quickier machines, and machines with less consistent program run
timing one may need much larger $a.)
Data access itself usually only accounts for a small part of an
object's performance.

Sure. But "usual" situations do not form overwhelming majority. I
would expect that code where the slot access dominates the time is not
that rare...
Switching from hashes to
arrays in a class implementation will make the class only slightly faster,
certainly not by a factor of 2.5.

Of course; the gain is diluted according to how much the slot access
dominates the running time. Moreover, the factor for raw slot access
depends on CPU architecture a lot; e.g., the value I measured today is
significantly less than 2.5.

Hope this helps,
Ilya
 
I

Ilya Zakharevich

[A complimentary Cc of this posting was NOT [per weedlist] sent to
Peter J. Holzer
That's an excellent reason to use bit flags. In C the smallest object
type is a char, which is 8 bits on most platforms. So if you
store each flag in a char, you waste only 7 bits per flag. In perl you
need an SV, which took about 40 bytes on average last time I looked.

Integer value takes 4 words with almost no malloc() overhead.

Hope this helps,
Ilya
 
O

occitan

Uri said:
o> This should be required reading for all Perl hackers, including those
o> who write the alas sometimes suboptimal documentation.

PJH> I agree with most of what you wrote but I have to comment on this:

PJH> That's an excellent reason to use bit flags. In C the smallest object
PJH> type is a char, which is 8 bits on most platforms. So if you
PJH> store each flag in a char, you waste only 7 bits per flag. In perl you
PJH> need an SV, which took about 40 bytes on average last time I looked.
PJH> Typically you store your flags in a hash, which adds more overhead, so
PJH> you may waste on the order of 1000 bits for each bit of information.
PJH> Not a problem if you have a few thousand flags, but if you have a few
PJH> million, it adds up.

my point is that needing large numbers of bit flags says that the design
is flawed. this doesn't cover the case where you are munging large
numbers of boolean flags for statistics or related work (and those apps
should use pdl or bit::vector). but for a reasonable sized set of flags,
it is better to use a simpler coding technique than to deal with the
trickiness of bit sized flags. using bits for a small number of flags is
micro-optimization and shouldn't be done unless the app is ridiculously
tight on space. and i bet even then there will be much better place to
save on storage than with the bits.

The size of the problem is not our choice. Huge complex build systems
(templates generating headers and embedded SQL, which generates C,
which generate objects, libs, programs and documentation) quickly adds
up to many thousands of files. For each one we have the choice of
calculating things repeatedly or remembering our findings and
decisions. Preferring speed in the old space vs. time dilemma, doesn't
imply a flawed design to me. Now when our interpreter gets huge as it
does, that's surely not primarily the booleans. But that doesn't mean
not to see if they can sensibly be compressed. It turned out not to be
the case, so that's that.

That doesn't mean our design is anywhere near perfect. In the world of
free software everybody is welcome to try and improve things. That's
how I got into makepp, having long suffered the attrocities of
recursive gmake. And then the boundless possibilities of make paired
with Perl wetted my appetite.

best regards
Daniel
 
P

Peter J. Holzer

Uri said:
my point is that needing large numbers of bit flags says that the design
is flawed.

Uh, that's like saying that needing a large number of objects says that
the design is flawed. A flag is nothing special. It's just a variable
that can take only two values.
this doesn't cover the case where you are munging large
numbers of boolean flags for statistics or related work (and those apps
should use pdl or bit::vector). but for a reasonable sized set of flags,

What is a reasonable sized set of flags?

Lets say you want to represent a Windows file system structure in memory
for some reason. A file has a name, a type (regular, directory, ...) and
an access control list. Each ACL is a list of entries which consist of
an entity (typically a user or group) and a number (13 with Win2K) of
permissions which may be allowed or denied.

So you could represent that as

{
name => 'foo/bar.txt',
type => 'f',
acl => {
[
{ user => 'hjp',
allow => {
read => 1,
write => 1,
append => 1,
list_perm => 1,
...
},
deny => {
},
},
{ group => 'users',
allow => {
read => 1,
list_perm => 1,
...
},
deny => {
take_ownership => 1,
...
},
},
],
}
}

Per file it isn't that much: Two hashes with at most 13 keys for each
entity which has access (which will typically be only a hand full). But
if you want to represent one million files, that's a lot of memory.

So you can store that in a much more compact form as:

{
name => 'foo/bar.txt',
type => 'f',
acl => {
[
{ user => 'hjp',
allow => 0x...,
deny => 0,
},
{ group => 'users',
allow => 0x...,
deny => 0x...,
},
],
}
}

Sure instead of
if ($ace->{allow}{read}) ...
you will have to write
if ($ace->{allow} & ACE_READ) ...
which may be less "perlish", but isn't much worse to read and probably
faster as well.
(Or you can make it an object with getter and setter methods and hide
the implementation)
it is better to use a simpler coding technique than to deal with the
trickiness of bit sized flags.

I don't find bit masks particularly tricky. But that's probably my C
background showing.
using bits for a small number of flags is micro-optimization and
shouldn't be done unless the app is ridiculously tight on space.

If you are talking about two orders of magnitude of overhead, it
doesn't take that much data to be ridiculously tight on space. My
workstation has only 512 megs of RAM :).

hp
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,181
Messages
2,570,970
Members
47,537
Latest member
BellCorone

Latest Threads

Top