Looking for strategies for performance improvements

L

lvirden

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or whatever
hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?
 
A

Arndt Jonasson

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or whatever
hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?

This sounds more like a bug than bad performance. Here is one idea: if
the process produces A, B, C, D, then start a second run (process 2)
in parallel with the first one, starting from point B. Then watch the
output, hoping that process 2 will produce something that process 1
didn't. If and when that happens, maybe you're only into some
megabytes of input data, not terabytes, and can use the debugger with
some chance of success.

I suppose the Perl code in question uses "use warnings" and "use strict"?

If what the code does is not too complex, one path may be to
reimplement it; then assuming that, even if buggy, the new
implementation will find the items that the old one misses, after which
you can use the debugger on the old one. Or toss the old one and keep
the new one...

(Looking for bugs when processing immense amounts of data has a charm
of its own. For example when you have log files but they are so big
that no editor will take them.)
 
X

xhoster

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

What do you mean by "less frequently"? Do you mean it isn't finding all
the matches that it should, or do yo mean that it still finds what it
should but is getting slower and slower about doing so?
When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or
whatever hours have gone by...

What tools might be available to a newbie perl developer that would
assist in analysing and improving an application exhibiting this sort of
behavior?

use strict;
use warnings;

And then, on unix-like systems, top and/or ps to see if the programs memory
use is growing without bound. If it is, go find the memory leak and fix
it.

Xho
 
S

Sherm Pendley

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or
whatever hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?

I'd begin with analyzing the algorithms, not the code that implements them.
Nine times out of ten, choosing a better algorithm in a key spot will
result in a far better improvement in execution time than will any number
of code tweaks.

When it comes time to measure and profile the code, have a look at the
Devel::DProf module. It's included with Perl, at least as of 5.8.1, which
is the oldest version I have on hand to check at the moment.

sherm--
 
S

Sherm Pendley

Sherm said:
I'd begin with analyzing the algorithms, not the code that implements
them. Nine times out of ten, choosing a better algorithm in a key spot
will result in a far better improvement in execution time than will any
number of code tweaks.

Talking to myself - never a good sign. :)

More specifically, based on your description it looks like there's a data
structure that grows as execution proceeds, and a method whose execution
time is a function of that data structure's size. If he can, he needs to
find a different algorithm to use for that method - ideally one that
executes in constant time.

For example, suppose he has an array to which elements are being added, and
he's doing a linear search into that array. He'd be better off using a hash
in that case instead of an array. A linear search is O(n), a function of
the number of elements in the array, whereas a hash lookup is a constant
O(1).

sherm--
 
X

xhoster

Sherm Pendley said:
Talking to myself - never a good sign. :)

More specifically, based on your description it looks like there's a data
structure that grows as execution proceeds, and a method whose execution
time is a function of that data structure's size.

Or a method whose execution time is a function of the VM system not
keeling over :)
If he can, he needs to
find a different algorithm to use for that method - ideally one that
executes in constant time.

I'd argue that this is likely an implementation problem, and not an
algorithm problem.
For example, suppose he has an array to which elements are being added,
and he's doing a linear search into that array. He'd be better off using
a hash in that case instead of an array. A linear search is O(n), a
function of the number of elements in the array, whereas a hash lookup is
a constant O(1).

But the OP indicated that the procedure can be restarted in the middle
(and presumably still give correct results). If you can dump the array
every now and then and still get correct results, then the algorithm never
needed you to keep those array elements in the first place. If your
implementation is keeping crap the algorithm doesn't need, that sounds like
an implementation problem rather than an algorithm problem.

Xho
 
S

Sherm Pendley

implementation is keeping crap the algorithm doesn't need, that sounds
like an implementation problem rather than an algorithm problem.

Well, I did say that algorithms would be the *first* thing I'd look at - not
the *only* thing. :)

I think of it this way - while a poor implementation can easily ruin a good
algorithm, the converse is not necessarily true. A good implementation will
rarely overcome a poor choice of algorithms. So I prefer to evaluate my
choice of algorithms first, and then once I'm satisfied with that, I look
at my implementation of those algorithms.

sherm--
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,166
Messages
2,570,901
Members
47,442
Latest member
KevinLocki

Latest Threads

Top