Looking for strategies for performance improvements

lvirden · Jan 12, 2005

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or whatever
hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?

Arndt Jonasson · Jan 12, 2005

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or whatever
hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?

This sounds more like a bug than bad performance. Here is one idea: if
the process produces A, B, C, D, then start a second run (process 2)
in parallel with the first one, starting from point B. Then watch the
output, hoping that process 2 will produce something that process 1
didn't. If and when that happens, maybe you're only into some
megabytes of input data, not terabytes, and can use the debugger with
some chance of success.

I suppose the Perl code in question uses "use warnings" and "use strict"?

If what the code does is not too complex, one path may be to
reimplement it; then assuming that, even if buggy, the new
implementation will find the items that the old one misses, after which
you can use the debugger on the old one. Or toss the old one and keep
the new one...

(Looking for bugs when processing immense amounts of data has a charm
of its own. For example when you have log files but they are so big
that no editor will take them.)

xhoster · Jan 12, 2005

A perl developer stopped by this morning with this question.
He has a new trainee that is learning Perl. One of the early tasks
he has a need to complete is this.

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

What do you mean by "less frequently"? Do you mean it isn't finding all
the matches that it should, or do yo mean that it still finds what it
should but is getting slower and slower about doing so?

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or
whatever hours have gone by...

What tools might be available to a newbie perl developer that would
assist in analysing and improving an application exhibiting this sort of
behavior?

use strict;
use warnings;

And then, on unix-like systems, top and/or ps to see if the programs memory
use is growing without bound. If it is, go find the memory leak and fix
it.

Xho

Sherm Pendley · Jan 12, 2005

He has an application that reads through terabytes of information,
searching for particular patterns and combinations. The program
runs for days. However, as the time that the program runs progresses,
it appears to produce results less frequently than expected.

When he notices this, he stops the application, starts it back up where
things left off, and the application returns to producing results at
the frequency expected - until, of course, the next time 8 or 10 or
whatever hours have gone by...

What tools might be available to a newbie perl developer that would assist
in analysing and improving an application exhibiting this sort of
behavior?

I'd begin with analyzing the algorithms, not the code that implements them.
Nine times out of ten, choosing a better algorithm in a key spot will
result in a far better improvement in execution time than will any number
of code tweaks.

When it comes time to measure and profile the code, have a look at the
Devel:

Prof module. It's included with Perl, at least as of 5.8.1, which
is the oldest version I have on hand to check at the moment.

sherm--

Sherm Pendley · Jan 12, 2005

Sherm said:
I'd begin with analyzing the algorithms, not the code that implements
them. Nine times out of ten, choosing a better algorithm in a key spot
will result in a far better improvement in execution time than will any
number of code tweaks.

Talking to myself - never a good sign.

More specifically, based on your description it looks like there's a data
structure that grows as execution proceeds, and a method whose execution
time is a function of that data structure's size. If he can, he needs to
find a different algorithm to use for that method - ideally one that
executes in constant time.

For example, suppose he has an array to which elements are being added, and
he's doing a linear search into that array. He'd be better off using a hash
in that case instead of an array. A linear search is O(n), a function of
the number of elements in the array, whereas a hash lookup is a constant
O(1).

sherm--

xhoster · Jan 12, 2005

Sherm Pendley said:
Talking to myself - never a good sign.

More specifically, based on your description it looks like there's a data
structure that grows as execution proceeds, and a method whose execution
time is a function of that data structure's size.

Or a method whose execution time is a function of the VM system not
keeling over

If he can, he needs to
find a different algorithm to use for that method - ideally one that
executes in constant time.

I'd argue that this is likely an implementation problem, and not an
algorithm problem.

For example, suppose he has an array to which elements are being added,
and he's doing a linear search into that array. He'd be better off using
a hash in that case instead of an array. A linear search is O(n), a
function of the number of elements in the array, whereas a hash lookup is
a constant O(1).

But the OP indicated that the procedure can be restarted in the middle
(and presumably still give correct results). If you can dump the array
every now and then and still get correct results, then the algorithm never
needed you to keep those array elements in the first place. If your
implementation is keeping crap the algorithm doesn't need, that sounds like
an implementation problem rather than an algorithm problem.

Xho

Sherm Pendley · Jan 12, 2005

implementation is keeping crap the algorithm doesn't need, that sounds
like an implementation problem rather than an algorithm problem.

Well, I did say that algorithms would be the *first* thing I'd look at - not
the *only* thing.

I think of it this way - while a poor implementation can easily ruin a good
algorithm, the converse is not necessarily true. A good implementation will
rarely overcome a poor choice of algorithms. So I prefer to evaluate my
choice of algorithms first, and then once I'm satisfied with that, I look
at my implementation of those algorithms.

sherm--

performance problem	6	Nov 26, 2007
Startup Chile Company Looking For Founding Developer/CTO	1	Feb 1, 2012
Memory management strategies in C. (long)	7	Aug 20, 2003
Comparision of C Sharp and C performance	360	Dec 27, 2009
Looking for Ace Developer in Chicago	0	Apr 20, 2007
Web-based personal development co. looking for programmers	0	Mar 30, 2007
Web-based personal development co. looking for programmers	3	Mar 30, 2007
Web-based personal development co. looking for .NET programmers	0	Mar 30, 2007

Looking for strategies for performance improvements

lvirden

Arndt Jonasson

xhoster

Sherm Pendley

Sherm Pendley

xhoster

Sherm Pendley

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads