Philip Herron
On 22 October 2013 00:41, Steven D'Aprano wrote:
Are you suggesting that gcc is not a decent compiler?
No.
If "optimize awayto the null program" is such an obvious thing to do, why doesn't the mostpopular C compiler in the [FOSS] world do it?
It does if you pass the appropriate optimisation setting (as shown in
haypo's comment). I should have been clearer.
gcc compiles programs in two phases: compilation and linking.
Compilation creates the object files x.o and y.o from x.c and y.c.
Linking creates the output binary a.exe from x.o and y.o. The -O3
optimisation setting used in the blog post enables optimisation in the
compilation phase. However, each .c file is compiled independently, so
because the add() function is defined in x.c and called in y.c, the
compiler is unable to inline it. Nor can it remove the call as dead
code: although it knows that the return value isn't used, it doesn't
know whether the call has side effects.
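To make that concrete, here is a sketch (my commands, not what the blog
post ran) of the two phases done separately:

$ gcc -O3 -c x.c   # compile only: x.c -> x.o
$ gcc -O3 -c y.c   # compile only: y.c -> y.o
$ gcc x.o y.o      # link: x.o + y.o -> a.exe

All of the -O3 work happens in the first two steps, where each file is
seen in isolation; by the time the linker combines them the machine
code for add() and for the loop is already fixed.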
You might think it's silly that gcc can't optimise across source files,
and you'd be right: actually it can, if you enable link-time
optimisation with the -flto flag as described by haypo. So if I do
that with the code from the blog post I get (using mingw gcc 4.7.2 on
Windows):
$ cat x.c
double add(double a, double b)
{
    return a + b;
}
$ cat y.c
double add(double a, double b);
int main()
{
    int i = 0;
    double a = 0;
    while (i < 1000000000) {
        a += 1.0;
        add(a, a);
        i++;
    }
}
$ gcc -O3 -flto x.c y.c
$ time ./a.exe
real 0m0.063s
user 0m0.015s
sys 0m0.000s
$ time ./a.exe # warm cache
real 0m0.016s
user 0m0.015s
sys 0m0.015s
So gcc can optimise this all the way to the null program which takes
15ms to run (that's 600 times faster than pypy).
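If you want to check that the loop really has been eliminated rather
than just made fast, one way (my suggestion, not something from the
posts) is to disassemble the binary and look at main, which I'd expect
to be little more than an immediate return:

$ objdump -d a.exe | grep -A10 "main>:"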
Note that even if pypy could optimise it all the way to the null
program it would still be 10 times slower than C's null program:
$ touch null.py
$ time pypy null.py
real 0m0.188s
user 0m0.076s
sys 0m0.046s
$ time pypy null.py # warm cache
real 0m0.157s
user 0m0.060s
sys 0m0.030s
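For comparison, the C "null program" here is literally an empty main,
built and timed the same way (a sketch; I haven't reproduced the
timings, but on the machine above it should land around the 15ms
figure quoted earlier):

$ cat null.c
int main()
{
    return 0;
}
$ gcc -O3 null.c -o null.exe
$ time ./null.exe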
[...] So the pypy version takes twice as long to run this. That's impressive
but it's not "faster than C".
(Actually, if I enable -flto with that example the C version runs 6-7
times faster due to inlining.)
Nobody is saying that PyPy is *generally* capable of making any
arbitrary piece of code run as fast as hand-written C code. You'll
notice that the PyPy posts are described as *carefully crafted*
examples.
They are more than carefully crafted. They are useless and misleading.
It's reasonable to contrive a simple CPU-intensive programming
problem for benchmarking. But the program should do *something*, even
if it is contrived. Both programs here consist *entirely* of dead
code. Yes, it's reasonable for the pypy devs to test things like this
during development. No, it's not reasonable to showcase this as an
example of the potential for pypy to speed up any useful computation.
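For example, a minimal change (my own sketch, not taken from either
blog post) that stops the work being dead code is to accumulate the
results and print the total, so that neither gcc nor pypy is free to
discard the loop:

$ cat y2.c
#include <stdio.h>

double add(double a, double b);

int main()
{
    int i = 0;
    double a = 0;
    double total = 0;
    while (i < 1000000000) {
        a += 1.0;
        total += add(a, a);
        i++;
    }
    printf("total = %f\n", total);
    return 0;
}

A benchmark like that would at least be measuring the additions it
claims to measure.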
I believe that, realistically, PyPy has potential to bring Python into
Java and .Net territories, namely to run typical benchmarks within an
order of magnitude of C speeds on the same benchmarks. C is a very
hard target to beat, because vanilla C code does *so little* compared
to other languages: no garbage collection, no runtime dynamism, very
little polymorphism. So benchmarking simple algorithms plays to C's
strengths, while ignoring C's weaknesses.
As I said, I don't want to criticise PyPy. I've just started using it
and it is impressive. However, both of those blog posts are
misleading. Not only that, but the authors must know exactly why they
are misleading. Because of that I will take any other claims with a
big pinch of salt in future.
Oscar
You sir deserve a medal! I think a lot of people are taking these sorts of benchmarks completely out of context and it's great to see such a well-rounded statement.
I applaud you so much! I've been sort of banging my head against the wall trying to describe what you just did as succinctly as that, and couldn't.