Optimization with local vs. global arrays

J

Jim West

The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

There is considerable work being performed in the ... section.
This is on a Linux Itanium II system, compiled both with the Intel C++
compiler (V9.1) with interprocedural optimization enabled, and with the
GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
more dramatic with the Intel Compiler.) I tried declaring the local
FOO arrays static with

static FOO a[10], b[10], c[10];

which helped with the GNU compiler but was actually worse with the Intel
compiler. I also tried

FOO d[30];
FOO *a = d, *b = d+10, *c = d+20;

with a local d array, but that had no effect.

Is this just a compiler issue, or am I missing something? I want to avoid
the external arrays, obviously, but that code compiled by the Intel
compiler gives the fastest execution speed by far. I'd like to get the
equivalent performance with something less dangerous than global arrays.
 
G

GeekBoy

Jim West said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

There is considerable work being performed in the ... section.
This is on a Linux Itanium II system, compiled both with the Intel C++
compiler (V9.1) with interprocedural optimization enabled, and with the
GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
more dramatic with the Intel Compiler.) I tried declaring the local
FOO arrays static with

static FOO a[10], b[10], c[10];

which helped with the GNU compiler but was actually worse with the Intel
compiler. I also tried

FOO d[30];
FOO *a = d, *b = d+10, *c = d+20;

with a local d array, but that had no effect.

Is this just a compiler issue, or am I missing something? I want to avoid
the external arrays, obviously, but that code compiled by the Intel
compiler gives the fastest execution speed by far. I'd like to get the
equivalent performance with something less dangerous than global arrays.

Faster processor?
 
M

Mark P

Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

There is considerable work being performed in the ... section.
This is on a Linux Itanium II system, compiled both with the Intel C++
compiler (V9.1) with interprocedural optimization enabled, and with the
GNU C V 3.3.5 compiler with -O3 optimization. (The performance change is
more dramatic with the Intel Compiler.) I tried declaring the local
FOO arrays static with

static FOO a[10], b[10], c[10];

which helped with the GNU compiler but was actually worse with the Intel
compiler. I also tried

FOO d[30];
FOO *a = d, *b = d+10, *c = d+20;

with a local d array, but that had no effect.

Is this just a compiler issue, or am I missing something? I want to avoid
the external arrays, obviously, but that code compiled by the Intel
compiler gives the fastest execution speed by far. I'd like to get the
equivalent performance with something less dangerous than global arrays.

It's not especially surprising that the local arrays, which may be
pushed on the stack with each invocation of bar, would be slower than
the global arrays. If you want something "safer" you could try moving
the arrays to a namespace.

Mark
 
J

Jim West

Faster processor?

No, all are run on the same system, OS etc. It is compiled with
the Intel compiler using

icc -O3 -ip -c foo.cc

and with the GNU compiler using

g++ -O3 -c foo.cc
 
J

Jim West

It's not especially surprising that the local arrays, which may be
pushed on the stack with each invocation of bar, would be slower than
the global arrays. If you want something "safer" you could try moving
the arrays to a namespace.

OK, I had thought that the time needed to push the small arrays on the
stack (FOO isn't a very large class) would be small compared to the
heavy number crunching I do in the bar() routine. Guess not!

The namespace solution is what I needed, since some of the array names
are reused through-out the code. Seems obvious once it was pointed out.
:)

Thanks for the help.
 
I

Ian Collins

Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}
What is a FOO?

Does it require construction?

Do you call bar() in a loop?
 
J

Jim West

Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}
What is a FOO?

Does it require construction?

Do you call bar() in a loop?


FOO is actually a three-dimensional space vector:

class FOO {
float x, y, z;
FOO() : x_(0), y_(0), z_(0) { };
FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { };
inline FOO& operator+=(const FOO& a);
/* Many more inline operators and member functions included */
};

bar() is called many times in a loop.
 
I

Ian Collins

Jim said:
Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

What is a FOO?

Does it require construction?

Do you call bar() in a loop?



FOO is actually a three-dimensional space vector:

class FOO {
float x, y, z;
FOO() : x_(0), y_(0), z_(0) { };
FOO(float x, float y, float z) : x_(x), y_(y), z_(z) { };
inline FOO& operator+=(const FOO& a);
/* Many more inline operators and member functions included */
};

bar() is called many times in a loop.

So there's your reason - FOO() gets called 30 times for each call of bar().
 
G

GeekBoy

Ian Collins said:
Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}
What is a FOO?
Foobar is a universal variable understood to represent whatever is being
discussed.
It's usually used in examples that illustrate concepts and ideas in computer
science.


For instance, a computer science professor may be discussing different file
formats. In this case, he would call the generic-example file foo or foobar,
then list the extensions associated with the file formats (e.g. foobar.txt,
foobar.gif, foobar.exe, foobar.tar).

When foo or foobar is used, everyone understands that these are just
examples, and they don't really exist.


Programmers and administrators also use foo and foobar in a similar context.
Files or program s named with foo or foobar are understood not to be
permanent and will be changed or deleted at anytime.


Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
the Helsinki University of Technology, Finland. Other generic variables are
used other places, but only these three are considered universal.
 
I

Ian Collins

GeekBoy said:
When foo or foobar is used, everyone understands that these are just
examples, and they don't really exist.
Not in this case, if you read the OP's reply.
Foo, bar, and the compound foobar were commonly used at MIT, Stanford and
the Helsinki University of Technology, Finland. Other generic variables are
used other places, but only these three are considered universal.
If you haven't done so already, research the origin of the term.*Please* don't quote signatures.
 
P

Pete Becker

Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

If, as seems to be the case, the culprit is the constructor for FOO, you
can just remove it. Since the version with the global array apparently
works correctly, the rest of the code isn't relying on having the FOO
values initialized on entry into bar.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
H

Howard

Pete Becker said:
Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

If, as seems to be the case, the culprit is the constructor for FOO, you
can just remove it. Since the version with the global array apparently
works correctly, the rest of the code isn't relying on having the FOO
values initialized on entry into bar.

In another part of the thread, he shows two constructors, one default and
one parameterized. He can't remove the default constructor because the
array definitions require one, and the compiler can't create one if there's
a parameterized version, right? (Of course, he may be able to remove the
parameterized version as well; I don't know.) But won't the
compiler-generated constructor perform the same default initializations, and
still get called 30 times for each call to bar? Or am I misunderstanding
something here?

In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.

-Howard
 
G

Greg Comeau

Pete Becker said:
Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}

If, as seems to be the case, the culprit is the constructor for FOO, you
can just remove it. Since the version with the global array apparently
works correctly, the rest of the code isn't relying on having the FOO
values initialized on entry into bar.

In another part of the thread, he shows two constructors, one default and
one parameterized. He can't remove the default constructor because the
array definitions require one, and the compiler can't create one if there's
a parameterized version, right? (Of course, he may be able to remove the
parameterized version as well; I don't know.) But won't the
compiler-generated constructor perform the same default initializations, and
still get called 30 times for each call to bar? Or am I misunderstanding
something here?

In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.

That was added later in the thread. Pete's point was probably
that (even in light of any new info) that since he is worried
about speed that the global array's (even if tossed into a named
or named namespace) are zero-initialized first because they are static
before other intialization on them occurs.

BTW, this can be different from having static's inside the function.
Actually, a bit of this, including the timing of the zero'ing of the
gloabals, is up to the compiler [system].

Lastly to Jim, if this is utterly crucial, there is probably
some other solution possible even faster than the globals,
but we probably don't know enough about what you're doing
to offer that at this point.
 
P

Pete Becker

Howard said:
Pete Becker said:
Jim said:
The execution speed of the following code is dramatically faster if I
declare some arrays globally rather than locally. That is

FOO a[10], b[10], c[10];

void bar() {
...
}

runs much faster (up to 33%) than

void bar() {
FOO a[10], b[10], c[10];
...
}
If, as seems to be the case, the culprit is the constructor for FOO, you
can just remove it. Since the version with the global array apparently
works correctly, the rest of the code isn't relying on having the FOO
values initialized on entry into bar.

In another part of the thread, he shows two constructors, one default and
one parameterized. He can't remove the default constructor because the
array definitions require one, and the compiler can't create one if there's
a parameterized version, right?

Shrug. Write an empty default constructor, or refactor. I'm not
particularly interested in getting into implementation details. The
point is that the initialization apparently isn't actually required, so
doesn't belong in the class.

(Of course, he may be able to remove the
parameterized version as well; I don't know.) But won't the
compiler-generated constructor perform the same default initializations, and
still get called 30 times for each call to bar? Or am I misunderstanding
something here?

The compiler-generated constructor uses the default initializer for each
of the float fields, and that initializer does nothing. Any reasonable
compiler will generate no code.
In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.

But that unnecessarily increases coupling because every caller now has
to provide scratch data for bar.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
P

Pete Becker

Greg said:
That was added later in the thread. Pete's point was probably
that (even in light of any new info) that since he is worried
about speed that the global array's (even if tossed into a named
or named namespace) are zero-initialized first because they are static
before other intialization on them occurs.

Not quite. My point was that the global arrays demonstrate that
initialization is irrelevant, because on calls to bar after the first
one they have whatever junk was left over from the previous call. If
that works, then skip the initialization entirely.

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
H

Howard

In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.

But that unnecessarily increases coupling because every caller now has to
provide scratch data for bar.

True. I hadn't thought of that. I guess without knowing more details of
how bar is called and what it's used for, it's tough to come up with the
"best" solution.

-Howard
 
J

Jim West

In any case, what I might do in his case is declare the arrays in the
function which calls bar, prior to the loop that [apparently] calls bar
repeatedly, and simply pass pointers to those arrays to bar.

But that unnecessarily increases coupling because every caller now has to
provide scratch data for bar.

True. I hadn't thought of that. I guess without knowing more details of
how bar is called and what it's used for, it's tough to come up with the
"best" solution.

Following up on my situation...I unfortunately can't post to newsgroups
at work, but I followed the discussions on google groups. While the FOO
class need not be initialized in this particular case, it is part of a
numerical library that my group has put together over the years, and any
changes to the default constructor would potentially break a lot of
code. Initializing all components to zero seemed like a Good Idea(tm)
when I first wrote it a long time ago, but in hindsight it was clearly a
mistake. In any event, I copied the header to this particular file and
changed FOO to FOO2 everywhere and changed the default constructor to

FOO2() { };

This did not give any improvement, surprisingly. (At least with the
Intel compiler...I forgot to try g++.)

Allocating the arrays in the calling routines would work, but bar() is
indeed called in several different places and would require more editing
than I want to do at this point.

So, my solution is to use the namespace idea that Mark P (I believe)
suggested (already implemented). I need to protect them because their
real names are

FOO Observation_Points[10], Source_Points[10], Basis_Functions[10];

names which get reused in a lot of different places throughout the code.
If I get in the habit of declaring things globally but not protected by
namespaces I'm afraid it would only be a matter of time before bar7()
calls bar8() and changes values unexpectedly.

Finally, I think that this change is going to be sufficient. This is a
computational electromagnetics code, and the complex math involved
within bar() is quite intensive (it performs a four-dimensional numerical
quadrature to yield an electromagnetic field), and should overwhelm
any more minor tweaking that is possible. (Famous last words...I thought
the same about the constructor!)

Thanks for the help everyone. I really appreciate it, and I have learned
quite a bit about the cost of constructors that seem really, really
simple on the surface!

- Jim
 
J

Jim West

code. Initializing all components to zero seemed like a Good Idea(tm)
when I first wrote it a long time ago, but in hindsight it was clearly a
mistake. In any event, I copied the header to this particular file and
changed FOO to FOO2 everywhere and changed the default constructor to

FOO2() { };

This did not give any improvement, surprisingly.

I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
really tired). 33.8 is only slightly slower than I get with the
global arrays. I'll have to re-consider changing the class default
constructor and see how many things really do break.

I think I'll go to bed now before I hurt myself.
 
P

Pete Becker

Jim said:
I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
really tired). 33.8 is only slightly slower than I get with the
global arrays. I'll have to re-consider changing the class default
constructor and see how many things really do break.

I think I'll go to bed now before I hurt myself.

Benchmarking sure is fun!

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
I

Ian Collins

Jim said:
I take that back. I mis-read 33.8 seconds as 38 seconds (I must be
really tired). 33.8 is only slightly slower than I get with the
global arrays. I'll have to re-consider changing the class default
constructor and see how many things really do break.
I suggest you investigate profilers for your platform/tools, I'm
guessing your platform is Linux, if so, Sun Studio has some excellent
profiling and analysis tools you could use.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,982
Messages
2,570,186
Members
46,740
Latest member
JudsonFrie

Latest Threads

Top