Loop Optimization, Array Alignment

Rajeev · Sep 16, 2004

Hello,

I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
turned on.

For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}

Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?

Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part
of a much bigger array, but I can do it), but I don't know how to tell
the compiler that I have aligned these arrays. How do I do that ?

Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}

Q.3. Will gcc take *K out of the loop ? (It may change the single precision
computed result, eg if R starts off much bigger than the contribution.)

float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);

Thanks in advance for any help,

-rajeev-

Mark A. Odell · Sep 16, 2004

Rajeev wrote:

I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed
optimizations turned on.

For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}

Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?

Might be but that's not a C issue, it's platform-specific and off-topic in
comp.lang.c.

Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part
of a much bigger array, but I can do it), but I don't know how to tell
the compiler that I have aligned these arrays. How do I do that ?

Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}

Shouldn't be but that's not a C issue, it's platform-specific and
off-topic in comp.lang.c.

Q.3. Will gcc take *K out of the loop ? (It may change the single precision
computed result, eg if R starts off much bigger than the contribution.)

float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);

This is a gcc question and off-topic in comp.lang.c

Dan Pop · Sep 16, 2004

Rajeev wrote:
I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
turned on.

If these details are relevant to your questions, cross-posting to
comp.lang.c was a gross mistake.

Dan

Paul Hsieh · Sep 16, 2004

I'm using gcc 3.4.2 on a Xeon (P4) platform, all kinds of speed optimizations
turned on.

For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}

Q.1. Is there any advantage to having the arrays A,B,C aligned to 16 bytes ?

The Intel compiler might be assisted by such an alignment, because it
can use the packed SSE vector instructions to implement this
operation. I am not aware of any other x86 based compiler that can
automatically vectorize like this.

Q.1b. If yes, I can make them aligned (non-trivial since A[1]:A[N] is part
of a much bigger array, but I can do it), but I don't know how to tell
the compiler that I have aligned these arrays. How do I do that ?

You're probably right, you can't. Even the Intel compiler relies on
deduction to know that an array or pointer is aligned. It will not be
able to deduce it from attempts to hack the array offset to fit the
alignment.
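
For reference, a minimal sketch of what can be expressed in source, assuming GCC's `aligned` attribute (a compiler extension, not standard C). It only covers objects you declare yourself, so it does not solve the case of a sub-range of a much bigger array, and whether a given gcc version actually exploits the alignment is not something this thread establishes. The names `A_buf`, `B_buf`, `is_aligned16` and `scaled_dot_aligned` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Storage requested on a 16-byte boundary via the GCC-specific attribute. */
static float A_buf[100] __attribute__((aligned(16)));
static float B_buf[100] __attribute__((aligned(16)));

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16u) == 0;
}

/* Sum of a[i]*b[i], scaled once by k. The assert only checks the
 * caller's promise at run time; it is not a hint to the optimizer. */
static float scaled_dot_aligned(const float *a, const float *b,
                                size_t n, float k)
{
    float sum = 0.0f;
    size_t i;

    assert(is_aligned16(a) && is_aligned16(b));
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum * k;
}
```

The run-time check at least catches a misaligned pointer during testing, which matters if the loop is later rewritten by hand with aligned SSE loads.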

Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}

No. If there is an advantage to doing it one way or another, the
compiler should be good enough to do the transformation from one form
to the other internally.
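
To make the comparison concrete, here is a small sketch of the two forms written as complete functions. The function names and the extra `r` parameter (the starting value of R) are hypothetical. Both perform the same floating-point operations in the same order, so they should return identical results; which one is faster, if either, is up to the compiler.

```c
#include <stddef.h>

/* Indexed form of the loop. */
static float dot_indexed(const float *a, const float *b,
                         size_t n, float k, float r)
{
    size_t i;

    for (i = 0; i < n; i++)
        r += a[i] * b[i] * k;
    return r;
}

/* Pointer-walking form of the same loop. */
static float dot_pointer(const float *a, const float *b,
                         size_t n, float k, float r)
{
    const float *pa = a, *pb = b;
    size_t i;

    for (i = 0; i < n; i++)
        r += (*pa++) * (*pb++) * k;
    return r;
}
```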

Q.3. Will gcc take *K out of the loop ? (It may change the single precision
computed result, eg if R starts off much bigger than the contribution.)

float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);

No. The compiler (regardless of which one) can't do this. This is
actually numerically different from your original loop. You need to
do this manually as shown here in order to leverage the operation
count reduction optimization. If the variables were integers, then in
theory a compiler could perform the optimization as you have done it.
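
A small sketch of why the two forms are numerically different, assuming IEEE 754 single precision with each operation rounded to float (x87 excess precision could hide the effect). The function names are hypothetical. With R starting at 1.0e8f and every term equal to 1, each in-loop addition is rounded away because adjacent floats near 1e8 are 8 apart, while the factored form adds the accumulated 100 in one step.

```c
#include <stddef.h>

/* Original form: scale each term by k and add it straight into r. */
static float accumulate_in_loop(float r, const float *a, const float *b,
                                size_t n, float k)
{
    size_t i;

    for (i = 0; i < n; i++)
        r += a[i] * b[i] * k;
    return r;
}

/* Hand-factored form: sum the products first, apply k once at the end. */
static float accumulate_factored(float r, const float *a, const float *b,
                                 size_t n, float k)
{
    float rl = 0.0f;
    size_t i;

    for (i = 0; i < n; i++)
        rl += a[i] * b[i];
    return r + rl * k;
}
```

Calling both with `r = 1.0e8f`, `k = 1.0f` and 100 terms of `1.0f * 1.0f` should leave the in-loop result at the starting value, while the factored result comes out larger. Starting from `r = 0.0f` the two agree, which is why the difference is easy to miss in testing.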

pete · Sep 17, 2004

Rajeev wrote:

Q.2. Is there an advantage to using arrays or pointers, eg
float *pA=A,*pB=B;
for (i=0;i<N;i++){
R+=(*pA++)*(*pB++)*K; // all variables are float=4 bytes
}

You can simplify the loop counting.

i = N;
while (i-- != 0) {
R += *pA++ * *pB++ * K;
}

Rajeev · Sep 17, 2004

Paul Hsieh wrote:
Q.3. Will gcc take *K out of the loop ? (It may change the single precision
computed result, eg if R starts off much bigger than the contribution.)

float RL=0;
for (i=0;i<N;i++){
RL+=A[i]*B[i]; // all variables are float=4 bytes
}
R+=(RL*K);

No. The compiler (regardless of which one) can't do this. This is
actually numerically different from your original loop. You need to
do this manually as shown here in order to leverage the operation
count reduction optimization. If the variables were integers, then in
theory a compiler could perform the optimization as you have done it.

Paul and Pete,

Thank you both for your informative responses. When trying to optimize,
there are just so many things one can play with and try; it really helps a
non-expert like myself to get clarity on even a few issues, so I can focus
on the others.

Regards,
-rajeev-

kal · Sep 18, 2004

pete said:
i = N;
while (i-- != 0) {
R += *pA++ * *pB++ * K;
}

Why not the following?

T = 0;
i = N;
while (i-- != 0) {
T += *pA++ * *pB++;
}
R += T * K;

pete · Sep 18, 2004

kal said:
Why not the following?

T = 0;
i = N;
while (i-- != 0) {
T += *pA++ * *pB++;
}
R += T * K;

That seems fine to me.
I'll restate the original conditions:
For the following loop
R=(evaluate here); // float
N=(evaluate here); // N min=1 max=100 median=66
for (i=0;i<N;i++){
R+=A[i]*B[i]*K; // all variables are float=4 bytes
}
