Mystery: static variables & performance

Mark Shelor

I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical transform function.
When compiling under gcc on my big-endian PowerPC (Mac OS X),
declaring this array as "static" DECREASES the transform throughput by
around 5%. However, declaring it as "static" on gcc/Linux/Intel
INCREASES the throughput by almost 30%.

I would prefer that the array not be "static" so that the underlying C
function will be thread-safe. However, giving up close to 30%
performance on gcc/Linux/Intel is unacceptable for a digest routine,
whose value is often closely tied to speed.
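To make the two alternatives concrete, here is a minimal sketch that assumes a SHA-1-style 80-word message schedule. The thread never shows the actual Digest::SHA transform, so `expand_and_sum`, `checksum_static` and `checksum_auto` are hypothetical names.

The two entry points differ only in the storage duration of `W`:
- The `static` copy is shared by every call, so two threads running the transform at the same time can overwrite each other's schedule.
- The automatic copy belongs to a single call.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 32-bit rotate left; n must be in 1..31. */
#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Hypothetical helper: expand a 16-word block into an 80-word
   SHA-1-style schedule W, then fold W into a checksum so the two
   variants below have a result that can be compared. */
static uint32_t expand_and_sum(uint32_t W[80], const uint32_t block[16])
{
    uint32_t sum = 0;
    int t;

    assert(W != NULL && block != NULL);
    memcpy(W, block, 16 * sizeof W[0]);
    for (t = 16; t < 80; t++)
        W[t] = ROTL32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
    for (t = 0; t < 80; t++)
        sum += W[t];
    return sum;
}

/* Static storage duration: one W shared by all calls and all threads,
   so this function is not reentrant. */
uint32_t checksum_static(const uint32_t block[16])
{
    static uint32_t W[80];
    return expand_and_sum(W, block);
}

/* Automatic storage duration: each call gets its own W (typically on
   the stack), so concurrent calls do not interfere with each other. */
uint32_t checksum_auto(const uint32_t block[16])
{
    uint32_t W[80];
    return expand_and_sum(W, block);
}
```

Both functions return the same value for the same block; only reentrancy differs. Which of them is faster is left entirely to the compiler and the hardware.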

Can anyone enlighten me on this mystery, and recommend a simple, clean,
portable way to assure good performance on all host types?

TIA, Mark
 
Jack Klein

I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical transform function.

Note that Perl is completely off-topic here, but your question is more
or less a basic C one, regardless of what code you are trying to write
when you make this comparison.

When compiling under gcc on my big-endian PowerPC (Mac OS X),
declaring this array as "static" DECREASES the transform throughput by
around 5%. However, declaring it as "static" on gcc/Linux/Intel
INCREASES the throughput by almost 30%.

The C standard does not define the relative performance of any type of
operation, variable, or memory usage over another. The differences
depend on two, and only two, things. They are the architecture of the
underlying hardware (processor, memory interface, etc.), and the
quality of the code generation of the compiler for that underlying
hardware.

I would prefer that the array not be "static" so that the underlying C
function will be thread-safe. However, giving up close to 30%
performance on gcc/Linux/Intel is unacceptable for a digest routine,
whose value is often closely tied to speed.

"thread-safe" is off-topic here, but if you need the array to have
automatic or dynamic allocation for some off-topic reason, then by all
means you had better write it that way and eschew arrays with static
storage duration.

Can anyone enlighten me on this mystery, and recommend a simple, clean,
portable way to assure good performance on all host types?

There is no mystery as far as we are concerned here, as the C standard
has neither requirements nor guarantees as to the relative performance
of memory with different duration, or even whether there is or is not
a difference. So your question is not in any way, shape, or form a
language one at all.

What you are asking for is how to optimize code for various
implementations, and that is not one but N implementation-specific
questions, where N is the number of platforms that you want it to run
on.

It is not even a C question, because processor hardware differs quite
widely. In maximally optimized assembly language, some processors
will access static memory faster than dynamic, others the opposite.
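Because the answer depends on the processor and the compiler, the practical way to settle it for a given platform is to measure. Here is a minimal timing sketch. It assumes the standard `clock()` function has usable resolution on the target. `work_static`, `work_auto` and `time_variant` are hypothetical names, and the arithmetic inside them is only a stand-in for a real transform.

```c
#include <assert.h>
#include <time.h>

#define N_WORDS 80

/* Stand-in workload using a scratch array with static storage duration. */
unsigned long work_static(unsigned long seed)
{
    static unsigned long w[N_WORDS];
    unsigned long acc = 0;
    int i;

    for (i = 0; i < N_WORDS; i++)
        w[i] = seed + (unsigned long)i * 2654435761UL;
    for (i = 0; i < N_WORDS; i++)
        acc ^= w[i] >> (i % 7);
    return acc;
}

/* The same workload with an automatic scratch array. */
unsigned long work_auto(unsigned long seed)
{
    unsigned long w[N_WORDS];
    unsigned long acc = 0;
    int i;

    for (i = 0; i < N_WORDS; i++)
        w[i] = seed + (unsigned long)i * 2654435761UL;
    for (i = 0; i < N_WORDS; i++)
        acc ^= w[i] >> (i % 7);
    return acc;
}

/* Time `reps` calls of fn using clock() (processor time, in seconds).
   The combined result is stored through *sink so that the calls are
   not optimised away as dead code. */
double time_variant(unsigned long (*fn)(unsigned long), long reps,
                    unsigned long *sink)
{
    unsigned long acc = 0;
    clock_t start, end;
    long r;

    assert(fn != NULL && reps > 0 && sink != NULL);
    start = clock();
    for (r = 0; r < reps; r++)
        acc += fn((unsigned long)r);
    end = clock();
    *sink = acc;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Build it with the same compiler and options as the real module, use a large `reps`, and repeat the run a few times. `clock()` resolution is often coarse, and the numbers only describe the machine and compiler they were taken on.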
 
Mark Shelor

Jack said:
The C standard does not define ...


Your rather long-winded response could have been better expressed by
simply saying "I don't know."

My original post was not meant to elicit tautological remarks--such as
yours--about the C standard. It was intended to draw on a collective
depth of experience which this newsgroup hopefully possesses.

However, if comp.lang.c considers the practice of C programming to be
off-topic, and instead chooses to restrict itself to the language
definition, then please accept my apologies for intruding. I'll leave
you to your peaceful slumber.

Mark
 
Joona I Palaste

Your rather long-winded response could have been better expressed by
simply saying "I don't know."

But that would have been the wrong reply. This newsgroup is about the
C language, not about implementations of it. Languages do not have
speed. Implementations do. Therefore questions about efficiency are
off-topic here.

My original post was not meant to elicit tautological remarks--such as
yours--about the C standard. It was intended to draw on a collective
depth of experience which this newsgroup hopefully possesses.

However, if comp.lang.c considers the practice of C programming to be
off-topic, and instead chooses to restrict itself to the language
definition, then please accept my apologies for intruding. I'll leave
you to your peaceful slumber.

Practice of C programming comes in two levels: the C language itself,
and particular implementations of it. Measuring efficiency of
particular constructs falls under the latter category.

Jack said, quite correctly, that the C standard does not define
efficiency concerns. Honestly, would you think it did? Would it make
sense to define "operation X must be 2.5 times slower than operation Y,
or else the implementation does not conform to the C standard"?
 
Mark Shelor

Joona said:
But that would have been the wrong reply. This newsgroup is about the
C language, not about implementations of it. Languages do not have
speed. Implementations do. Therefore questions about efficiency are
off-topic here.


If you believe it's off-topic, then don't reply. This is an unmoderated
group.

Also, your presumption of clean separability between language definition
and efficiency is naive at best, and fatally flawed at worst. The
creation and refinement of the C language is rooted in the desire to
achieve efficient portable programs.

But, if you consider it more worthwhile to debate whether 73! can be
calculated in C without loss of significant digits--or other such toy or
academic problems--then far be it from me to interfere with your coffee
club.

Practice of C programming comes in two levels: the C language itself,
and particular implementations of it. Measuring efficiency of
particular constructs falls under the latter category.
Jack said, quite correctly, that the C standard does not define
efficiency concerns. Honestly, would you think it did? Would it make
sense to define "operation X must be 2.5 times slower than operation Y,
or else the implementation does not conform to the C standard"?


No, because the people who created the language--and refined its
definition into a standard--are reasonably intelligent, unlike your
example. Is

while ((*s++ = *t++) != '\0')
;

an efficient way to perform a string copy? Yes, probably more so than
other simple or brute-force approaches: a fact primarily due to C's
inclusion of facilities to specifically allow the production of
highly-efficient, portable programs. Divorcing considerations of
efficiency from the language definition, and labelling them as merely
implementation-dependent, does not reflect a great deal of wisdom or
practical experience.
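For reference, here is the copy loop quoted above wrapped in a small function, checked against the standard library's `strcpy()`. `copy_loop` is a hypothetical name. Both produce the same result. Which one runs faster is up to the implementation, and library `strcpy()` versions are frequently hand-tuned for their platform.

```c
#include <assert.h>
#include <string.h>

/* The classic K&R-style copy loop. Copies src, including its
   terminating '\0', into dst and returns dst. The caller must ensure
   dst has room for strlen(src) + 1 characters and that the two
   strings do not overlap. */
char *copy_loop(char *dst, const char *src)
{
    char *s = dst;
    const char *t = src;

    assert(dst != NULL && src != NULL);
    while ((*s++ = *t++) != '\0')
        ;   /* all of the work happens in the loop condition */
    return dst;
}
```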

Nonetheless, if you have something constructive to contribute, then
please feel free to enlighten me. Otherwise, I recommend you take
Wittgenstein's advice and simply remain silent.

Regards, Mark
 
Sidney Cadot

If you believe it's off-topic, then don't reply. This is an unmoderated
group.

The method of choice in this newsgroup with respect to off-topic
messages is to point it out to the sender, and kindly but firmly
redirect them to a more appropriate place.

Also, your presumption of clean separability between language definition
and efficiency is naive at best, and fatally flawed at worst. The
creation and refinement of the C language is rooted in the desire to
achieve efficient portable programs.

Well, let's see what the standard has to say about this, shall we:

C99, 5.1.2.3#1: "The semantic descriptions in this International
Standard describe the behavior of an abstract machine in which issues of
optimization are irrelevant."

So the C language definition is explicitly separated from performance
issues. Of course, there are many practical situations (e.g., yours)
where performance /is/ important, but then, well, you need to address
this in a newsgroup dedicated to your particular implementation.

But, if you consider it more worthwhile to debate whether 73! can be
calculated in C without loss of significant digits--or other such toy or
academic problems--then far be it from me to interfere with your coffee
club.

Why, thank you. Although I'm not sure the derogatory tone is warranted.
Most of the regulars here are very experienced, with backgrounds varying
from academia to compiler-builders to working software engineers; most
of them would agree that following this newsgroup has provided them with
a great deal of insight into the C language.

No, because the people who created the language--and refined its
definition into a standard--are reasonably intelligent, unlike your
example. Is

while ((*s++ = *t++) != '\0')
;

an efficient way to perform a string copy? Yes, probably more so than
other simple or brute-force approaches: a fact primarily due to C's
inclusion of facilities to specifically allow the production of
highly-efficient, portable programs.

I'd suggest using "strcpy()", and trust the library implementors.

Divorcing considerations of
efficiency from the language definition, and labelling them as merely
implementation-dependent, does not reflect a great deal of wisdom or
practical experience.

On the contrary, it provides a very important factorization between two
important aspects (semantics versus performance), making it possible to
treat these as (more-or-less) separate problems. It's all just a matter
of proper design.

comp.lang.c is concerned with the semantics of the C language as described
in the standards (C89, C99). For performance (a worthy subject in
itself), you are kindly redirected to our friendly neighbours.

Nonetheless, if you have something constructive to contribute, then
please feel free to enlighten me. Otherwise, I recommend you take
Wittgenstein's advice and simply remain silent.

A little bit of self-reflection on your side wouldn't hurt, I guess.

Best regards,

Sidney
 
pete

Mark said:
If you believe it's off-topic, then don't reply.
This is an unmoderated group.

That's why we all have the right to censor.
This newsgroup has the highest S/N ratio of any that I'm aware of.
Topicality is highly valued.

Otherwise, I recommend you take
Wittgenstein's advice and simply remain silent.

If you can't appreciate topicality,
then we have to gang up on you. We have no other choice.
 
CBFalconer

Mark said:
Your rather long-winded response could have been better expressed
by simply saying "I don't know."

My original post was not meant to elicit tautological remarks--such as
yours--about the C standard. It was intended to draw on a collective
depth of experience which this newsgroup hopefully possesses.

You fail to realize that your original query has no answer, except
in the context of a specific C implementation. Most people
inhabiting a newsgroup dedicated to that implementation are likely
to know something about it. This group discusses portable coding
in the C language, as defined by the C standard. That makes
(accurate) advice applicable to all.

Even should someone here offer a reply there may well be nobody
available to check its accuracy. If you lurk a short time (as you
should have before posting) you will notice that inaccuracies tend
to be pointed out loudly and at length, sometimes accompanied by
gratuitous aspersions.

You should thank Jack for explaining reasons, rather than replying
with a simple "OT, go away".
 
Martin Ambuhl

Mark said:
If you believe it's off-topic, then don't reply. This is an unmoderated
group.

This approach has been tried and found wanting. It leads to newnet
cesspools. comp.lang.c and its pre-renaming predecessor is one of the
oldest and most successful newsgroups because of the attempt to keep it
topical. You are simply wrong to post off-topic questions here and wrong
in your suggestion of what to do about off-topic posts.
 
MSG

Mark Shelor said:
I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical transform function.
When compiling under gcc on my big-endian PowerPC (Mac OS X),
declaring this array as "static" DECREASES the transform throughput by
around 5%. However, declaring it as "static" on gcc/Linux/Intel
INCREASES the throughput by almost 30%.

I would prefer that the array not be "static" so that the underlying C
function will be thread-safe. However, giving up close to 30%
performance on gcc/Linux/Intel is unacceptable for a digest routine,
whose value is often closely tied to speed.

Can anyone enlighten me on this mystery, and recommend a simple, clean,
portable way to assure good performance on all host types?

TIA, Mark

You forgot to mention relevant details:

Threads library and version
Kernel version and configuration
CPU version
Compiler version and options used with them

As you noted above yourself, the performance difference between static
and regular arrays depends on these parameters. My guess is that it
comes from the different threading algorithms BSD and Linux use.
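Part of that information can be recorded from inside the program itself. Here is a minimal sketch that assumes GCC or a compiler that mimics it:
- `__STDC_VERSION__` is standard (defined from C95 on).
- `__GNUC__`, `__VERSION__` and `__OPTIMIZE__` are GCC predefined macros that Clang also generally provides.

`describe_build` is a hypothetical name. The kernel, thread library, CPU model and exact compiler options are not available through standard C and have to be collected separately.

```c
#include <assert.h>
#include <stdio.h>

/* Write a one-line description of the compiler that built this file
   into buf (at most len bytes, always '\0'-terminated). */
void describe_build(char *buf, size_t len)
{
    const char *cc = "unknown compiler";
    const char *opt = "optimisation unknown";
    long stdver = 0;

    assert(buf != NULL && len > 0);
#ifdef __STDC_VERSION__
    stdver = __STDC_VERSION__;            /* standard macro, C95 and later */
#endif
#if defined(__GNUC__) && defined(__VERSION__)
    cc = "GCC-compatible " __VERSION__;   /* GCC extension: version string */
#  ifdef __OPTIMIZE__
    opt = "optimised (__OPTIMIZE__ defined)";
#  else
    opt = "not optimised";
#  endif
#endif
    if (stdver != 0)
        snprintf(buf, len, "%s; %s; __STDC_VERSION__ = %ld", cc, opt, stdver);
    else
        snprintf(buf, len, "%s; %s; __STDC_VERSION__ not defined", cc, opt);
}
```

Printing this line next to each benchmark result makes it easier to compare like with like across platforms.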

MSG
 
Mark Shelor

MSG said:
You forgot to mention relevant details:

Threads library and version
Kernel version and configuration
CPU version
Compiler version and options used with them

As you noted above yourself, the performance difference between static
and regular arrays depends on these parameters. My guess is that it
comes from the different threading algorithms BSD and Linux use.


At long last ... an intelligent response. Thank you!

Nonetheless, the problem is not quite as simple as your queries imply.
The Digest::SHA module and underlying C code are designed to be as
portable as reasonably possible, since it's not known beforehand which
compiler(s) will be used to build it; the CPAN user community is quite
broad and diverse. Merely finding which code is most efficient for each
platform/compiler combination, and setting up the appropriate #ifdef's,
is like chipping away at an iceberg. Yet, there should be a way to
construct portable and highly efficient code in a general sense (i.e.
that works well across most machines).

Your suggestion of threading is interesting, and might imply more
efficient alternatives to the current scheme of using function-local
automatic or static variables to hold the SHA message schedule. Perhaps
also it has to do with the register assignment algorithm working
differently when such variables change their storage class. It's hard
to believe that accessing memory in one location versus another would
result in a 30% decrease in performance, unless of course the memory
values were no longer being cached in registers.
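One alternative along those lines is offered here as a sketch, not a measured improvement. SHA-1's 80-word schedule can be generated in a 16-word circular buffer, because each new word depends only on the words 3, 8, 14 and 16 positions back. FIPS 180-2 describes an alternate SHA-1 method built on this idea.

A smaller working set may be easier for a compiler to keep in registers or cache. Whether it is actually faster still has to be measured on each platform. The function names are hypothetical, and the schedule is summed only to give the two versions a comparable result.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Straightforward form: materialise all 80 schedule words. */
uint32_t schedule_sum_full(const uint32_t block[16])
{
    uint32_t W[80], sum = 0;
    int t;

    assert(block != NULL);
    memcpy(W, block, 16 * sizeof W[0]);
    for (t = 16; t < 80; t++)
        W[t] = ROTL32(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);
    for (t = 0; t < 80; t++)
        sum += W[t];
    return sum;
}

/* Rolling form: the same 80 words produced in place in a 16-word ring.
   When word t is computed, slot s = t mod 16 still holds W[t-16], and
   slots (s+13), (s+8), (s+2) mod 16 hold W[t-3], W[t-8], W[t-14]. */
uint32_t schedule_sum_rolling(const uint32_t block[16])
{
    uint32_t W[16], sum = 0;
    int t;

    assert(block != NULL);
    memcpy(W, block, sizeof W);
    for (t = 0; t < 80; t++) {
        int s = t & 15;
        if (t >= 16)
            W[s] = ROTL32(W[(s + 13) & 15] ^ W[(s + 8) & 15] ^
                          W[(s + 2) & 15] ^ W[s], 1);
        sum += W[s];
    }
    return sum;
}
```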

Or, perhaps there are other explanations for why the performance is
impacted, which might suggest other ways of approaching the computation.
That's what I'm really after here.

Regards, Mark
 
Mark Shelor

Sidney said:
The method of choice in this newsgroup with respect to off-topic
messages is to point it out to the sender, and kindly but firmly
redirect them to a more appropriate place.


Such as?

I'd suggest using "strcpy()", and trust the library implementors.


I'd suggest you re-read the entire paragraph surrounding my string-copy
example to gain the proper context. You entirely missed the point, and
your response is pedantic, insulting, and, lo, off-topic.

A little bit of self-reflection on your side wouldn't hurt, I guess.


Please, Sidney, set an example for us, and I'll be happy to follow.


Regards, Mark
 
Mark Shelor

pete said:
That's why we all have the right to censor.
This newsgroup has the highest S/N ratio of any that I'm aware of.


Given the lengthy discussion devoted to such absorbing topics as
"calculate value of 73!", I'd have to say that your criteria for
assessing a "signal" are quite liberal.

If you can't appreciate topicality,
then we have to gang up on you. We have no other choice.


In the interest of maintaining newsgroup "purity" of course. You fail
to appreciate the fact that questions of portable efficiency ARE topical
to a discussion of the language. You are attempting to unilaterally
outlaw these questions because you lack the appropriate insight and
experience to answer them. Furthermore, accepting the role of mere
"standards-jockey" is flattering neither to yourself nor to the newsgroup.

Regards, Mark
 
Dan Henry

I've encountered a troublesome inconsistency in the C-language Perl
extension I've written for CPAN (Digest::SHA). The problem involves the
use of a static array within a performance-critical transform function.
When compiling under gcc on my big-endian PowerPC (Mac OS X),
declaring this array as "static" DECREASES the transform throughput by
around 5%. However, declaring it as "static" on gcc/Linux/Intel
INCREASES the throughput by almost 30%.

"static" as opposed to what, (implicit) "auto" or dynamically
allocated (i.e., malloc'ed)?

Specifying that the array's storage class be "static" /could/ cause it
to be located in a different region of memory compared to another
storage class or compared to the heap. By "different", I mean from a
hardware perspective -- different memory types with different access
times, different caching policies, different memory management
policies, etc.

Also, "static" allocation /could/ allow use of a different memory
addressing mode which might vary the speed.
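Besides `static` and automatic storage there is the dynamic option mentioned above: allocate the scratch array once with `malloc()` and pass it in. The sketch below assumes a hypothetical per-caller context. `sched_ctx`, `sched_new`, `sched_checksum` and `sched_free` are illustrative names, not Digest::SHA's API.

Each thread that owns its own context is isolated from the others, and the array is not re-created on every call. Where the heap block lands, and how fast it is to access, is again up to the platform.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* Hypothetical per-caller state holding the scratch schedule. */
typedef struct {
    uint32_t W[80];
} sched_ctx;

/* Allocate a context; returns NULL if malloc fails. */
sched_ctx *sched_new(void)
{
    return malloc(sizeof(sched_ctx));
}

void sched_free(sched_ctx *ctx)
{
    free(ctx);
}

/* Expand one 16-word block into ctx->W and return the sum of the
   schedule. Safe to call concurrently as long as each thread passes
   its own ctx. */
uint32_t sched_checksum(sched_ctx *ctx, const uint32_t block[16])
{
    uint32_t sum = 0;
    int t;

    assert(ctx != NULL && block != NULL);
    memcpy(ctx->W, block, 16 * sizeof ctx->W[0]);
    for (t = 16; t < 80; t++)
        ctx->W[t] = ROTL32(ctx->W[t - 3] ^ ctx->W[t - 8] ^
                           ctx->W[t - 14] ^ ctx->W[t - 16], 1);
    for (t = 0; t < 80; t++)
        sum += ctx->W[t];
    return sum;
}
```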
I would prefer that the array not be "static" so that the underlying C
function will be thread-safe. However, giving up close to 30%
performance on gcc/Linux/Intel is unacceptable for a digest routine,
whose value is often closely tied to speed.

Can anyone enlighten me on this mystery,...

Not without a whole bunch of platform specific details, I don't think.
 
Mark McIntyre

If you believe it's off-topic, then don't reply. This is an unmoderated
group.

Actually, it's more of a self-regulated group. Off-topic posts get one
polite warning.

Also, your presumption of clean separability between language definition
and efficiency is naive at best, and fatally flawed at worst. The
creation and refinement of the C language is rooted in the desire to
achieve efficient portable programs.

Define efficient: fast? compact? least code? fewest page faults? etc.

Jack said, quite correctly, that the C standard does not define

No, because the people who created the language--and refined its
definition into a standard--are reasonably intelligent, unlike your
example.

You miss the point entirely. The standard doesn't define these things,
nor the efficiency of your example, because it's entirely
system-specific.

Nonetheless, if you have something constructive to contribute, then
please feel free to enlighten me. Otherwise, I recommend you take
Wittgenstein's advice and simply remain silent.

Round here, we ask offtopic posters politely to ask elsewhere. If you
don't like that, then please feel free to remain silent.
 
Mark McIntyre

You forgot to mention relevant details:
snip irrelevant details.

Troll alert: there's no such thing as threading in C. If you think
that threads are relevant you're either a troll or posting in the
wrong group.
 
Mark McIntyre

MSG wrote: (stuff)

At long last ... an intelligent response. Thank you!

you might want to consider that MSG is bordering on becoming a troll
here.

(snip offtopic stuff)
Or, perhaps there are other explanations for why the performance is
impacted,

The frobozz has lost its wheezle. Or possibly your core memory beads
have slipped on the wires. Or you forgot to rub the cat with the
amber before putting the static memory in your 380-Z.

which might suggest other ways of approaching the computation.
That's what I'm really after here.

Then you're becoming an idiot. This is the wrong group.
 
Christian Bau

Mark Shelor said:
Nonetheless, the problem is not quite as simple as your queries imply.
The Digest::SHA module and underlying C code are designed to be as
portable as reasonably possible, since it's not known beforehand which
compiler(s) will be used to build it; the CPAN user community is quite
broad and diverse. Merely finding which code is most efficient for each
platform/compiler combination, and setting up the appropriate #ifdef's,
is like chipping away at an iceberg. Yet, there should be a way to
construct portable and highly efficient code in a general sense (i.e.
that works well across most machines).

One possibility that I have used for highly time critical and relatively
small code: Implement two or three versions of a function, each doing
things in different ways that could influence speed in some way. Call
these functions through a global variable containing a function pointer.
The function pointer is initialised to point to a measuring function;
that function will during the first x calls call one of your
implementations and measure the time; after x calls it decides which one
was fastest and replaces the function pointer with a pointer to the
fastest function which is used for the rest of program execution.
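Here is a minimal sketch of that self-calibrating dispatch. It makes a few assumptions:
- There are two interchangeable implementations.
- The standard `clock()` is used for timing.
- The calibration length is fixed, and calibration calls alternate between the implementations.

All names, including `CALLS_PER_IMPL`, are hypothetical. The calibration state is global and unprotected. In a threaded program calibration would therefore have to finish before threads start, or be guarded, which goes beyond the original suggestion.

```c
#include <assert.h>
#include <time.h>

typedef unsigned long (*sum_fn)(unsigned long);

/* Two hypothetical implementations of 1 + 2 + ... + n (mod ULONG_MAX + 1). */
static unsigned long sum_loop(unsigned long n)
{
    unsigned long s = 0, i;
    for (i = n; i > 0; i--)
        s += i;
    return s;
}

/* Closed form, valid for n < ULONG_MAX. The even factor is halved first
   so the product wraps exactly like the loop's running sum. */
static unsigned long sum_formula(unsigned long n)
{
    return (n % 2 == 0) ? (n / 2) * (n + 1) : n * ((n + 1) / 2);
}

#define N_IMPLS 2
#define CALLS_PER_IMPL 100   /* hypothetical calibration length */

static const sum_fn impls[N_IMPLS] = { sum_loop, sum_formula };
static clock_t elapsed[N_IMPLS];
static long calls_made;

static unsigned long sum_measuring(unsigned long n);

/* Callers always go through this pointer. It starts at the measuring
   wrapper and is switched to the fastest implementation afterwards. */
sum_fn triangular = sum_measuring;

static unsigned long sum_measuring(unsigned long n)
{
    int which = (int)(calls_made % N_IMPLS);   /* alternate implementations */
    clock_t start = clock();
    unsigned long result = impls[which](n);

    elapsed[which] += clock() - start;
    if (++calls_made == (long)N_IMPLS * CALLS_PER_IMPL) {
        int best = 0, i;
        for (i = 1; i < N_IMPLS; i++)
            if (elapsed[i] < elapsed[best])
                best = i;
        triangular = impls[best];   /* later calls skip the timing */
    }
    return result;
}
```

Callers simply write `triangular(n)`. After the calibration calls, the pointer holds whichever implementation measured fastest on that machine, without any `#ifdef` per platform.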
 
Mark McIntyre

You fail
to appreciate the fact that questions of portable efficiency ARE topical
to a discussion of the language.

And you have had an answer. Just because you don't like the answer
doesn't mean it's not true. But let me repeat, for the hard of hearing:
"Standard C doesn't require any particular efficiency from any
function or construct and so it is impossible to answer your question.
You need to ask in a group specialising in your platforms."

You are attempting to unilaterally

FWIW, to be unilateral, there has to be only one person trying to do
it. There's a /consensus/ here about this.

outlaw these questions because you lack the appropriate insight and
experience to answer them.

ROFL !!!!! Do you know who you're referring to?
 
Mark Shelor

Mark said:
The frobozz has lost its wheezle. Or possibly your core memory beads
have slipped on the wires. Or you forgot to rub the cat with the
amber before putting the static memory in your 380-Z.


Hmm ... are those techniques mentioned in the C standards? Isn't your

Then you're becoming an idiot. This is the wrong group.


Your reasoning is spurious. This is the group that likes to have
extended discussions on whether it's possible to calculate 73! using C.
If I were an idiot, I'd feel perfectly at home here.

Regards, Mark
 
