Limitations and workarounds to expressions defining size of an array

F

Francois Grieu

I'm trying to figure out the exact limitations that (possibly:
various versions of ISO/IEC 9899) C puts to expressions defining
the size of a plain array at compile time, and how to workaround
that.

An application would be, given EXPECTED, to define at compile
time an array with EXPECTED + 12*sqrt(EXPECTED) elements, or
some even more complex formula, as required by
http://stats.stackexchange.com/q/35658/8469


A tentative (not tested)

#define EXPECTED 15000

/* define an array of EXPECTED + 12*sqrt(EXPECTED)
pointers to struct foo (or slightly more) */
struct foo * bar[
#if (EXPECTED)<900
#error "unsupported EXPECTED value"
-1
#elif (EXPECTED)<3600
(EXPECTED)+360+((EXPECTED)-896)/5
#elif (EXPECTED)<14400
(EXPECTED)+720+((EXPECTED)-3591)/10
#else
(EXPECTED)+1440ul+((EXPECTED)-14381ul)/20
#endif
];

I think the above works for any EXPECTED>=900 up to
some large value like ULONG_MAX-ULONG_MAX/19 and then some,
and errs on the safe side, with at worse 5% wasted memory;
but this is
- unclear at best;
- painful to extend to lower bounds or less wasted memory;
- hard to generalize to more complex formula involving
for example log or exp.

Can I
- use floating point at all ?
- if yes: use sqrt, log, exp ?
- or at least, use intermediate values (other than
macro identifiers) to clarify things and remain below
the "4095 characters in a logical source line" limit?

I can think of enum as intermediates, and have done that
when computing a CRC at compile time, but they have rather
silly range limitations.


François Grieu
 
J

James Kuyper

I'm trying to figure out the exact limitations that (possibly:
various versions of ISO/IEC 9899) C puts to expressions defining
the size of a plain array at compile time, and how to workaround
that.


In C99:
6.7.5.2p1: "... the expression shall have an integer type. If the
expression is a constant expression, it shall have a value greater than
zero."
6.7.5.2p2: "... If an identifier is declared to be an object with static
storage duration, it shall not have a variable length array type."
6.7.5.2p4: "... If the size is an integer constant expression and the
element type has a known constant size, the array type is not a variable
length array type; otherwise, the array type is a variable length array
type."
6.7.5.2p5: "If the size is an expression that is not an integer constant
expression expression: if it occurs in a declaration at function
prototype scope, it is treated as if it were replaced by *; otherwise,
each time it is evaluated it shall have a value greater than zero."

C2011:
Change 6.7.5 => 6.7.6
Add "thread storage duration" as another context where VLAs are prohibited.
Implementation support for VLAs is now optional.

C90:
I don't have a copy of C90 to confirm the exact details of the wording,
but VLAs were added in C99, and were therefore not an option in C90.
An application would be, given EXPECTED, to define at compile
time an array with EXPECTED + 12*sqrt(EXPECTED) elements, or
some even more complex formula, as required by
http://stats.stackexchange.com/q/35658/8469


A tentative (not tested)

#define EXPECTED 15000

/* define an array of EXPECTED + 12*sqrt(EXPECTED)
pointers to struct foo (or slightly more) */
struct foo * bar[
#if (EXPECTED)<900
#error "unsupported EXPECTED value"
-1
#elif (EXPECTED)<3600
(EXPECTED)+360+((EXPECTED)-896)/5
#elif (EXPECTED)<14400
(EXPECTED)+720+((EXPECTED)-3591)/10
#else
(EXPECTED)+1440ul+((EXPECTED)-14381ul)/20
#endif
];

I think the above works for any EXPECTED>=900 up to
some large value like ULONG_MAX-ULONG_MAX/19 and then some,

Correct. All of your expressions have integer type and a constant value,
and except for the first one, have a positive value. Unless I missed
something, they should be acceptable in all versions of the standard.
and errs on the safe side, with at worse 5% wasted memory;
but this is
- unclear at best;
- painful to extend to lower bounds or less wasted memory;
- hard to generalize to more complex formula involving
for example log or exp.

Can I
- use floating point at all ?
- if yes: use sqrt, log, exp ?

In C90, "No" to both questions. For arrays with static or thread storage
duration, no. However, in C99, for arrays with automatic storage
duration, this is permitted, so long as the entire expression containing
those sub-expressions has an integer type and a positive value. In
C2011, it depends upon whether or not __STDC_NO_VLA__ is pre-#defined by
the implementation.
- or at least, use intermediate values (other than
macro identifiers) to clarify things and remain below
the "4095 characters in a logical source line" limit?

Whether or not that's permitted depends upon precisely what you mean by
that phrase - it's not clear from context.
 
B

Ben Bacarisse

Francois Grieu said:
I'm trying to figure out the exact limitations that (possibly:
various versions of ISO/IEC 9899) C puts to expressions defining
the size of a plain array at compile time, and how to workaround
that.

An application would be, given EXPECTED, to define at compile
time an array with EXPECTED + 12*sqrt(EXPECTED) elements, or
some even more complex formula, as required by
http://stats.stackexchange.com/q/35658/8469


A tentative (not tested)

#define EXPECTED 15000

/* define an array of EXPECTED + 12*sqrt(EXPECTED)
pointers to struct foo (or slightly more) */
struct foo * bar[
#if (EXPECTED)<900
#error "unsupported EXPECTED value"
-1
#elif (EXPECTED)<3600
(EXPECTED)+360+((EXPECTED)-896)/5
#elif (EXPECTED)<14400
(EXPECTED)+720+((EXPECTED)-3591)/10
#else
(EXPECTED)+1440ul+((EXPECTED)-14381ul)/20
#endif
];

I think the above works for any EXPECTED>=900 up to
some large value like ULONG_MAX-ULONG_MAX/19 and then some,
and errs on the safe side, with at worse 5% wasted memory;
but this is
- unclear at best;
- painful to extend to lower bounds or less wasted memory;
- hard to generalize to more complex formula involving
for example log or exp.

Can I
- use floating point at all ?

I don't think so, not in an integer constant expression.
- if yes: use sqrt, log, exp ?

You can't use functions calls, even in floating point constant
expressions!
- or at least, use intermediate values (other than
macro identifiers) to clarify things and remain below
the "4095 characters in a logical source line" limit?

I don't know about that.
I can think of enum as intermediates, and have done that
when computing a CRC at compile time, but they have rather
silly range limitations.

If it helps, I thought you might try this:

#define S0 (EXPECTED < 10000 ? 100 : (EXPECTED < 1000000) ? 1000 : 10000)
#define NR(s) (((s) + EXPECTED/(s))/2)
#define SQRT (NR(NR(NR(NR(S0)))))

It is, obviously, Newton-Raphson iteration (4 steps) and gets very close
estimates. For example, with EXPECTED=2147483647, SQRT=S3=46424 instead
of 46340. Add another branch to the conditional (to get a better start
estimate) or add another iteration and you get an exact answer.

You can replace the conditional expression with a set of #if's, or even
define S1, S2 etc to keep the expression size down though it's only
about 1500 characters as it is.
 
K

Kaz Kylheku

Can I
- use floating point at all ?
- if yes: use sqrt, log, exp ?

No. Constant expressions cannot contain funtion calls.

I would calculate the size at run-time (possibly caching it for re-use)
and then use malloc to grab the storage.

If I really needed a compile constant which requires a complex computation,
I would do that computation in a build configuration script (written in
some interpreted language which has floating-point support), and have it
spit out an include file.
 
B

BartC

Kaz Kylheku said:
No. Constant expressions cannot contain funtion calls.

Yet, an expression such as sqrt(9.0) seems to be reduced to 3.0 on the first
two compilers I tried.

It's not much of a stretch to allow that to be used as an array dimension,
the main stumbling block being that 3.0 is not integral, while adding an
(int) cast makes it look like a runtime expression.
 
B

BartC

Francois Grieu said:
I'm trying to figure out the exact limitations that (possibly:
various versions of ISO/IEC 9899) C puts to expressions defining
the size of a plain array at compile time, and how to workaround
that.

An application would be, given EXPECTED, to define at compile
time an array with EXPECTED + 12*sqrt(EXPECTED) elements, or
some even more complex formula, as required by
http://stats.stackexchange.com/q/35658/8469


A tentative (not tested)

#define EXPECTED 15000

/* define an array of EXPECTED + 12*sqrt(EXPECTED)

Perhaps try:

#define sqrt_EXPECTED 122

#define EXPECTED (sqrt_EXPECTED * sqrt_EXPECTED)

with the square root calculation done with a calculator. (You won't get
exactly EXPECTED=15000 though.)
 
E

Eric Sosman

Yet, an expression such as sqrt(9.0) seems to be reduced to 3.0 on the
first two compilers I tried.

Don't confuse definition with optimization. Also, try your
compilers on

strcspn("BartC", "abcdefghijklmnopqrsBCuvwxyz")

.... and let us know if they replace it with `3'.
It's not much of a stretch to allow that to be used as an array
dimension, the main stumbling block being that 3.0 is not integral,
while adding an (int) cast makes it look like a runtime expression.

The presence of an (int) cast does not disqualify an expression
from being an "integer constant expression." The function call,
however, does.
 
B

Barry Schwarz

I'm trying to figure out the exact limitations that (possibly:
various versions of ISO/IEC 9899) C puts to expressions defining
the size of a plain array at compile time, and how to workaround
that.

There have been questions posted here about applications that compile
and link cleanly but die during startup. Some of those relate to
arrays of excessive size.

For those systems that store automatic variables on the stack,
wouldn't this be a function of stack size? Wouldn't there be a system
specific way to adjust the stack size (if one exists)?

Wouldn't the limit be different if the array had static duration?
Don't different systems have different limits for the amount of memory
an application can claim (and also system specific methods for
changing that limit)?
 
J

James Kuyper

Don't confuse definition with optimization. Also, try your
compilers on

strcspn("BartC", "abcdefghijklmnopqrsBCuvwxyz")

... and let us know if they replace it with `3'.

That expression has a value of 0. Were you thinking of strspn()? My
compiler has no problems with the following:

#include <math.h>
#include <stdio.h>
#include <string.h>

struct foo{
int i;
};
int main(int argc, char *argv[])
{
struct foo *bar[(int)ceil(argc + 12.0*sqrt(argc))];
printf("bar: %zu\n", sizeof bar/ sizeof *bar);

int baz[strspn("BartC", "abcdefghijklmnopqrsBCuvwxyz")];
printf("baz: %zu\n", sizeof baz/ sizeof *baz);
return 0;
}

Which produces the output:

bar: 13
baz: 3

when invoked with no arguments, and

bar: 32
baz: 3

when invoked with four arguments. Does your compiler have any problems
with it?
 
A

army1987

Don't confuse definition with optimization. Also, try your
compilers on

strcspn("BartC", "abcdefghijklmnopqrsBCuvwxyz")

... and let us know if they replace it with `3'.

You meant strspn?
 
K

Kaz Kylheku

Yet, an expression such as sqrt(9.0) seems to be reduced to 3.0 on the first
two compilers I tried.

That has no bearing on what is constitutes "constant expression".
It's not much of a stretch to allow that to be used as an array dimension,
the main stumbling block being that 3.0 is not integral, while adding an
(int) cast makes it look like a runtime expression.

Why does a floating to integer conversion "look" runtime, but a square
root does not?
 
J

James Kuyper

BTW, I'm shocked by the result. It optimizes away the call to strspn but
it actually calls printf with arguments "%d\n" and 3? WTF?

The only printf() calls explicitly mentioned in this thread so far were
in my example program, which used "%zu", not "%d", so you're presumably
referring to some other program. What does the source code look like?
 
K

Kaz Kylheku

BTW, I'm shocked by the result. It optimizes away the call to strspn but
it actually calls printf with arguments "%d\n" and 3? WTF?

Optimizing printf format strings is harder to do than optimizing calls to
strspn.

To optimize strspn (in the above way), all you have to do is check that the two
arguments are both literal constants, and then call strspn inside the
compiler's execution environment, and substitute the result.

To optimize "%d\n" you have to parse the format string.

People implement whatever is easier. (Which is why we have numerous
concessions all over the C language in the first place.)
 
B

BartC

Kaz Kylheku said:
That has no bearing on what is constitutes "constant expression".

The sqrt() call is obviously being evaluated here as the well-known
mathematical operation rather than an arbitrary C function. (Unless the
compiler is clever enough to evaluate and reduce the C code inside an actual
sqrt() function somewhere.)

So the compiler could easily include mathematical functions amongst what are
considered constant expressions. That it doesn't do so is presumably the
standard saying it's not allowed (because it's not always practical to do).
That's unfortunate, because it would be handy.
Why does a floating to integer conversion "look" runtime, but a square
root does not?

From the error messages I got ('non-integer expression' with sqrt(), rather
than 'dynamic expression' with (int)sqrt(), although I can't replicate those
now..)
 
K

Kaz Kylheku

From the error messages I got ('non-integer expression' with sqrt(), rather
than 'dynamic expression' with (int)sqrt(), although I can't replicate those
now..)

That's just a consequence of the order in which some checks are applied.

Suppose that some context requires an integral, constant expression.

If you use an expression which is floating-point, and non-constant,
which error message will appear? A complaint that the expression isn't
integral, or that it's non-constant?

If the compiler stops checking an expression after the first constraint
is found, it's going to be whatever is checked first.
 
E

Eric Sosman

That expression has a value of 0. Were you thinking of strspn()?

Indeed, I was. Too much 'c', I guess. ;-)
[...] Does your compiler have any problems
with it?

BartC reports the compilers he tested implemented `sqrt(9.0)'
as `3.0', implying that the actual function call was not performed
at run-time. (He presumably used extra-linguistic means to find
this out.) He went on to suggest that this somehow put `sqrt(9.0)'
on an equal footing with a "constant expression." My example was
intended to show another function whose value is knowable before
run-time, hence similar to `sqrt(9.0)', but most likely not similarly
optimized. That is, I was trying to show the fallacy of reasoning
from observed behavior backward to the Standard, instead of from the
Standard forward to required behavior.

What I showed instead was carelessness, sigh.
 
K

Keith Thompson

BartC said:
Yet, an expression such as sqrt(9.0) seems to be reduced to 3.0 on the
first two compilers I tried.

Yes, I'm sure that's a common optimization.

But the phrase "constant expression" is defined by the standard in
terms of what it can contain; it doesn't merely mean "an expression
that a sufficiently clever compiler can evaluate at compile time".
It's not much of a stretch to allow that to be used as an array
dimension, the main stumbling block being that 3.0 is not integral,
while adding an (int) cast makes it look like a runtime expression.

But where do you draw the line between constant and non-constant
expressions? The authors of the standard already decided exactly
where to draw that line. Moving it in the direction of treating
more expressions as constant risks imposing requirements that some
compilers may not be able to meet.
 
K

Keith Thompson

Eric Sosman said:
That is, I was trying to show the fallacy of reasoning
from observed behavior backward to the Standard, instead of from the
Standard forward to required behavior.

What I showed instead was carelessness, sigh.

What you showed *in addition* was carelessness.
 
E

Eric Sosman

Yes, I'm sure that's a common optimization.

But the phrase "constant expression" is defined by the standard in
terms of what it can contain; it doesn't merely mean "an expression
that a sufficiently clever compiler can evaluate at compile time".


But where do you draw the line between constant and non-constant
expressions? The authors of the standard already decided exactly
where to draw that line. Moving it in the direction of treating
more expressions as constant risks imposing requirements that some
compilers may not be able to meet.

The Standard draws a line and says "All C compilers accept
everything on this side of the line as constants." But the line
is permeable: "An implementation may accept other forms of constant
expressions" (6.6p10). The upshot:

1) A C compiler must accept `static int x[2+1];', because
`2+1' is on the hither side of the line,

2) A C compiler may reject `static int x[strlen("abc")];'
because `strlen("abc")' is on the thither side, but

3) A C compiler may accept `static int x[strlen("abc")];'
under the "other forms" allowance, and need not even
issue a diagnostic.

Cases (2) and (3) illustrate why "It works with my compiler"
does not define the language.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,982
Messages
2,570,190
Members
46,736
Latest member
zacharyharris

Latest Threads

Top