What a stupid gcc!

Guest

and COBOL, of course
I hope you won't think me rude (and please tell me to mind my own
business if you think the question impertinent) but what programming
languages have you had experience of? Block structure like this has been
around since... actually I don't know but Algol had it in the late 50s.

I think they invented it. It's often credited by other languages that don't at first glance look very Algolly.
I had to think hard to come up with a structured language without it,
but Pascal is a rare example. It has nested function declarations, but
plain blocks can't nest declarations.

oh, that's a surprise, I thought it allowed it. Long time since I wrote any Pascal.
Your question "what's so special about a compound statement" is, to my
mind, the wrong way round. What's so special about the block that forms
a function body that it alone is allowed to contain declarations?
quite
On just about every other occasion where the complexity of a function
grows beyond about 3 lines, I'm constantly told it needs to split into
sub-functions. Now entire, self-contained sub-programs can be created
in each branch of a minor 'if' in a minor 'for' loop in a dusty corner
of a function, and that is perfectly OK?

My liking for small helper functions is in fact thwarted by the fact
that C's syntax takes a leaf out of your book. While variable
declarations can be nested in enclosed blocks, function definitions
can't be. Perhaps it would be done more often if one could do this:

int some_function(int l, int m, int n, int mat[l][m][n])
{
    int none_zero()
    {
        for (int i = 0; i < l; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++)
                    if (mat[i][j][k] == 0)
                        return 0;
        return 1;
    }

    if (none_zero()) /* ... */
        /* do stuff */
    if (/*still */ none_zero()) /* ... */
        /* ... */
}

(yes, I know gcc allows such things when not in standard C mode.)
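
For comparison, a minimal sketch of what standard C asks for instead: the helper moves to file scope, and everything the nested version would have picked up from the enclosing function is passed in explicitly. It assumes a compiler with variable length array support, and the body of some_function here is invented purely to give the helper something to do.

```c
#include <assert.h>

/* File-scope helper: returns 1 if no element of the l*m*n array is zero,
   0 otherwise. The sizes and the array are passed explicitly because a
   file-scope function cannot see the caller's parameters. */
static int none_zero(int l, int m, int n, int mat[l][m][n])
{
    assert(m > 0 && n > 0);
    for (int i = 0; i < l; i++)
        for (int j = 0; j < m; j++)
            for (int k = 0; k < n; k++)
                if (mat[i][j][k] == 0)
                    return 0;
    return 1;
}

/* Hypothetical caller: reports whether the array was zero-free on entry,
   then zeroes the first element (if there is one). */
int some_function(int l, int m, int n, int mat[l][m][n])
{
    int was_clean = none_zero(l, m, n, mat);
    if (l > 0)
        mat[0][0][0] = 0;
    return was_clean;
}
```

The cost is visible in the call: four arguments every time, where the nested form needed none.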


yes i miss this sometimes
No, I think you've just had a very different exposure to programming
languages and that's affected what you think is usual, helpful,
troublesome and so on.

and learns them in a manner that seems odd to me.

but then my first language was Algol-60 and I avidly read the "Revised Report on the Algorithmic Language Algol 60". I'm always disappointed when other languages fail to match this document.
 
BartC

I don't think it was ever as tidy as you thought it was

int x;
int y;

int f (int x)
{
    int y;
}

int main (void)
{
    int x;

    f (x);
    f (y);
}

You get this instead:

M module
    X var
    Y var
    F fn
        X param
        Y var
    MAIN fn
        X var

Still tidy in that the structure simply reflects the hierarchy in the
program. If you wanted, at the end of the compilation, a list of all
variables in a function, it's easy; for main(), it's just main:(X). Now
change your main() function to:

int main (void)
{
    int x;

    if (x) {
        int x=3;
        f(x);
    }

    f (x);
    f (y);
}

and the last part of the table becomes:

MAIN fn
    ? block
        X var
    X var

What goes in place of ? It would need to be some internal name such as
block#001. That list of variables now becomes ((block#001: (X)),X); it's not
a simple list anymore, and you have a set of anonymous blocks with
meaningless names.
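
A rough sketch of such a table in C, under the assumption that each scope simply records its parent and the names declared directly in it. The struct layout, the fixed MAX_SYMS limit and the function names are all made up for illustration; a real compiler would store far more per entry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { MAX_SYMS = 8 };           /* arbitrary limit for the sketch */

/* One lexical scope: a module, a function, or an anonymous block. */
struct scope {
    const char *name;            /* "main", or a generated "block#001" */
    struct scope *parent;        /* enclosing scope, NULL for the module */
    const char *syms[MAX_SYMS];  /* names declared directly in this scope */
    int nsyms;
};

/* Record a declaration in scope s. */
static void declare(struct scope *s, const char *sym)
{
    assert(s->nsyms < MAX_SYMS);
    s->syms[s->nsyms++] = sym;
}

/* Innermost-first lookup: returns the scope declaring sym, or NULL. */
static struct scope *lookup(struct scope *s, const char *sym)
{
    for (; s != NULL; s = s->parent)
        for (int i = 0; i < s->nsyms; i++)
            if (strcmp(s->syms[i], sym) == 0)
                return s;
    return NULL;
}
```

With the module, main and block#001 chained through parent, looking up x from inside the block finds the block's own x first, and y is found in the module. The anonymous block costs one extra node with a generated name.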

A stack is a dynamic structure. Suppose you want to retain all details of
all variables at the end of compilation, for debugging purposes for example?
An example where block scope makes sense in C++ :-

void func ()
{
    {
        Lock lock (table_semaphore);
        do_stuff_on_locked_table();
        do_more_stuff_on_locked_table();
    }

    long_operation_without_lock();
}

Lock's destructor cleans up the semaphore

Which exposes another problem with block-variables: there might be extra
overheads in allocating, initialising and freeing them on each entry and
exit to the block. For each time a function is called, a block might be
executed millions of times. (I'm sure most uses will be optimised out, so
variables are allocated at function-entry, but it's something else to keep
in mind.)
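
To make the distinction concrete, here is a small C sketch. Storage for a block-scoped variable is typically reserved once at function entry, but its initialiser does run every time the block is entered. The counter and function names are invented for the example.

```c
#include <assert.h>

static int init_calls = 0;   /* counts how often the initialiser runs */

/* Stand-in for an initialiser with some cost. */
static int make_value(void)
{
    return ++init_calls;
}

/* The block-scoped 'tmp' is initialised afresh on every pass through the
   loop body, so make_value() runs n times per call to run_block(). */
int run_block(int n)
{
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int tmp = make_value();
        sum += tmp;
    }
    return sum;
}
```

Calling run_block(3) on a fresh program runs the initialiser three times; whether that matters depends entirely on what the initialiser does.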
 
BartC

[why block scope?]
I accept that it does allow you to do stuff like this, but it really doesn't
seem a very good idea, using the same identifier for two different types,
within a few lines of each other in the same function. ....
Sorry, but these possibilities which everyone else thinks are great, to me
just seem nightmarish.

don't try programming in a dynamically typed language then!

func fred ()
{
    var = 1.02; // var is a float
    some_code ();
    var = "hello"; // now it's a string
}

[this isn't any particular language; Python or Scheme can do this
sort of stuff but with different syntax]

I've been working with dynamic languages of this kind for years (designing
and implementing them too).

You can argue that in this example, there is only one type involved
('variant'), but two different values for it (1.02 and "hello"). It's not
the same as the C example, which has stricter kinds of typing.

Also someone looking at this code will know it's a dynamic language and will
not have any expectations about the precise type of the value of 'var'.

(Having said that, most variables in a dynamic language will tend to have
just the one type of value.)
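
The "one type, two values" reading can be sketched in C as a tagged union. This is only an illustration of the idea, not how any particular dynamic language is implemented; the names variant, from_float and from_string are invented.

```c
#include <assert.h>
#include <string.h>

enum tag { T_FLOAT, T_STRING };

/* A single static type that can hold either kind of value. */
struct variant {
    enum tag tag;        /* says which union member is live */
    union {
        double f;
        const char *s;
    } u;
};

static struct variant from_float(double f)
{
    struct variant v;
    v.tag = T_FLOAT;
    v.u.f = f;
    return v;
}

static struct variant from_string(const char *s)
{
    assert(s != NULL);
    struct variant v;
    v.tag = T_STRING;
    v.u.s = s;
    return v;
}
```

A variable declared as struct variant can be assigned from_float(1.02) and later from_string("hello") without its declared type ever changing; only the tag differs.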
There are an awful lot of languages that allow block scope variables!

But from various source codes I've seen, not used very often. Or only used
for the ability to define variables half-way down a function; the
block-scope properties are irrelevant.
 
BartC

Ben Bacarisse said:
I hope you won't think me rude (and please tell me to mind my own
business if you think the question impertinent) but what programming
languages have you had experience of?

For the past 30 years, mainly my own ones. C is the only mainstream language
I know well enough (although you might dispute that..) to use seriously if
I needed to. For runtime libraries, I've partly relied on C's for the last
15 years or so.

And before that, the usual Algol, Cobol, Pascal etc that were common on CS
degree courses in the late 70s. Plus a year of Fortran.
Block structure like this has been
around since... actually I don't know but Algol had it in the late 50s.

I'm aware that Algol-60 had loads of complicated stuff (eg. call-by-name),
which I never used and partly didn't understand. It didn't hinder any
programming I had to do.
I had to think hard to come up with a structured language without it,
but Pascal is a rare example. It has nested function declarations, but
plain blocks can't nest declarations.

Good decision to avoid those. (But then it introduced 'case' into
records...)
Your question "what's so special about a compound statement" is, to my
mind, the wrong way round. What's so special about the block that forms
a function body that it alone is allowed to contain declarations?

'The block that forms a function body'. Why call it a block, why not just a
function body? It happens to use {,} delimiters that are the same as those
used on block statements, but that's all. What about the 'block' that forms
a module body? You don't think of it as a block, because {,} aren't used to
delimit it; it uses beginning/end of file instead.
My liking for small helper functions is in fact thwarted by the fact
that C's syntax takes a leaf out of your book. While variable
declarations can be nested in enclosed blocks, function definitions
can't be. Perhaps it would be done more often if one could do this:

int some_function(int l, int m, int n, int mat[l][m][n])
{
    int none_zero()
    {
        for (int i = 0; i < l; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++)
                    if (mat[i][j][k] == 0)
                        return 0;
        return 1;
    }

    if (none_zero()) /* ... */
        /* do stuff */
    if (/*still */ none_zero()) /* ... */
        /* ... */
}

(yes, I know gcc allows such things when not in standard C mode.)

OK. But would such nested functions only be allowed at the top of the
function, or anywhere in the body? (Where they would presumably be
limited to the enclosing block scope.)

(I'm working on two languages and two compilers at the moment. I found I
could get nested functions for free, by commenting out a line that reported
them as an error!)

It worked too, provided the nested function didn't try to access the
variables of the immediately surrounding scope (I believe this was
impossible anyway because name resolution didn't allow it).

That was interesting to know, but I then put the error-check back in. Nested
functions I can just about see a use for, but allowing a whole symbol table
hierarchy wherever I happen to have a few statements in a row, I can't see
much use for at all.

BTW these languages don't have block statements, only Algol-68-style
'serial clauses'. (And I know the latter does have multiple nested scopes,
but it has a lot of other stuff that makes implementation difficult too.)
 
Guest

You get this instead:

M module
    X var
    Y var
    F fn
        X param
        Y var
    MAIN fn
        X var

Still tidy in that the structure simply reflects the hierarchy in the
program.

Do real programming languages actually implement symbol tables like this?
If you wanted, at the end of the compilation, a list of all
variables in a function, it's easy; for main(), it's just main:(X). Now
change your main() function to:

int main (void)
{
    int x;

    if (x) {
        int x=3;
        f(x);
    }

    f (x);
    f (y);
}

and the last part of the table becomes:

MAIN fn
    ? block
        X var
    X var

What goes in place of ? It would need to be some internal name such as
block#001. That list of variables now becomes ((block#001: (X)),X); it's not
a simple list anymore, and you have a set of anonymous blocks with
meaningless names.

I think your scheme only works with FORTRAN. All bets are off as soon as recursion is introduced.
A stack is a dynamic structure. Suppose you want to retain all details of
all variables at the end of compilation, for debugging purposes for example?

you don't write recursive programs do you?
Which exposes another problem with block-variables: there might be extra
overheads in allocating, initialising and freeing them on each entry and
exit to the block.

usually not. The block structure is actually static so the space can be allocated on entry to the function.
For each time a function is called, a block might be
executed millions of times. (I'm sure most uses will be optimised out, so
variables are allocated at function-entry, but it's something else to keep
in mind.)

in C++ the CTORs and DTORs have to be executed and that may be expensive. C++ programmers have to take more care.
 
Ben Bacarisse

Per expression can be useful. Algol-68, famous for its orthogonality,
made no distinction between a block and a sub-expression. You could
write:

2 * (INT i; read(i); i)

as an expression if you so wished.
for (int i = 0; i < 10; i++) sum += i;

I believe C++ now allows declarations in whiles and ifs (?)

Yes, and switch too. Also in the *second* position of a 'for' statement
so that it remains a sort of enhanced 'while'.
 
BartC

Do real programming languages actually implement symbol tables like this?

Real program in a module called 'm' (return value of function edited out):

function fn =
var a,b,c
end

Real extract from the symbol table (extraneous matter edited out):

$root-----------Root
m---------------Module
fn--------------Proc
a---------------Frame
b---------------Frame
c---------------Frame

It might not be mainstream, but it's a real enough language. But why
shouldn't a symbol table look like this? (Obviously, the internal structure
is more elaborate, but following the hierarchical links starting from $root,
you get the above layout.)
I think your scheme only works with FORTRAN. All bets are off as soon as
recursion is introduced.

Perhaps we're talking at cross-purposes then. What has recursion in the
language being compiled, got to do with any of this?

Remember these symbol tables represent the static, lexical structure of all
entities defined in the source code. They don't describe the run-time
call-tree.
you don't write recursive programs do you?

Only compilers.
usually not. The block structure is actually static so the space can be
allocated on entry to the function.

Thanks. Somebody has finally admitted that function entry and exit is where
these things naturally belong...
 
Ben Bacarisse

BartC said:
For the past 30 years, mainly my own ones. C is the only mainstream language
I know well enough (although you might dispute that..) to use seriously if
I needed to. For runtime libraries, I've partly relied on C's for the last
15 years or so.

Maybe the 30 years working on your own languages has led to a kind of
isolationism, but that surprises me. If I'd spent decades working on my
own languages, I'd expect to have gained a huge experience of other
languages. I'd want to know what everyone else was doing, both in
theory and in practice.

Maybe you have studied what other languages do and you've decided,
rationally, that they are wrong. If so, that debate might be
interesting (but not here -- it's wildly off-topic).
And before that, the usual Algol, Cobol, Pascal etc that were common on CS
degree courses in the late 70s. Plus a year of Fortran.


I'm aware that Algol-60 had loads of complicated stuff (eg. call-by-name),
which I never used and partly didn't understand. It didn't hinder any
programming I had to do.

Block structure isn't complicated -- adding special cases is
complicated. C is full of special cases. They are all there for a
reason but explaining them is almost always about implementation not
programming. In my perfect CS syllabus, C would be taught as an
example language in a course about compilers. It's no bad thing to
learn C, but if you learn it as you learn how to compile it you
understand almost all of the special cases with no extra effort.

'The block that forms a function body'. Why call it a block, why not just a
function body? It happens to use {,} delimiters that are the same as those
used on block statements, but that's all. What about the 'block' that forms
a module body? You don't think of it as a block, because {,} aren't used to
delimit it; it uses beginning/end of file instead.

No, I do think of it as a block -- a badly broken block with all kinds
of special rules. Restrictions on array types and the form of
initialisers, no statements allowed, special permission to define
functions, etc, etc.

I call the body of a function a block (speaking generally here, not
specifically about C) because I want to have as few 'entities' in the
language as possible. It's a sort of Occam's razor in language design.
In fact I'd have gone further, had I been designing C. I'd have made
the "body" of a function into an expression and have one kind of
expression be a block that can have a value. One of C's predecessors,
BCPL, had this in the form of a "valof" expression:

LET sub(x, y) = add(x, neg(y))
LET add(x, y) = VALOF
$( LET a = x+y
IF 0<=a<modulus RESULTIS a
ELSE RESULTIS a-modulus
$)

The idea is to permit everything anywhere, and to give everything the
most general form possible. This is tempered by other considerations
such as the ease of producing an efficient implementation.
My liking for small helper functions is in fact thwarted by the fact
that C's syntax takes a leaf out of your book. While variable
declarations can be nested in enclosed blocks, function definitions
can't be. Perhaps it would be done more often if one could do this:

int some_function(int l, int m, int n, int mat[l][m][n])
{
    int none_zero()
    {
        for (int i = 0; i < l; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < n; k++)
                    if (mat[i][j][k] == 0)
                        return 0;
        return 1;
    }

    if (none_zero()) /* ... */
        /* do stuff */
    if (/*still */ none_zero()) /* ... */
        /* ... */
}

(yes, I know gcc allows such things when not in standard C mode.)

OK. But would such nested functions only be allowed at the top of the
function, or anywhere in the body? (Where they would presumably be
limited to the enclosing block scope.)


The benefit (usually to the implementor) of the restriction has to be
weighed against the benefit of having a simpler -- more general --
language specification for the programmer, so I'd always want to ask
"why am I making this restriction?" rather than "would it be useful to
allow X here?".
(I'm working on two languages and two compilers at the moment. I found I
could get nested functions for free, by commenting out a line that
reported them as an error!)

It worked too, provided the nested function didn't try to access the
variables of the immediately surrounding scope (I believe this was
impossible anyway because name resolution didn't allow it).

That was interesting to know, but I then put the error-check back
in. Nested functions I can just about see a use for, but allowing a
whole symbol table hierarchy wherever I happen to have a few statements
in a row, I can't see much use for at all.

BTW these languages don't have block statements, only Algol-68-style
'serial clauses'. (And I know the latter does have multiple nested scopes,
but it has a lot of other stuff that makes implementation difficult too.)

Comments about your own languages are not very helpful without an
explanation of the design goals. What are they to be used for? Who are
they aimed at? What hardware and software environment must they work
within? Without any of that, all I have to go on is plain fact: that
you decided to do one thing rather than another.
 
Andrew Smallshaw

and the last part of the table becomes:

MAIN fn
    ? block
        X var
    X var

What goes in place of ? It would need to be some internal name such as
block#001. That list of variables now becomes ((block#001: (X)),X); it's not
a simple list anymore, and you have a set of anonymous blocks with
meaningless names.

Have you never written a recursive function? You have exactly the
same issues, compounded by identical line numbers and the inevitable
masking of variables. Aside from what is essentially criticism of
structured programming generally (seriously guys, that argument
was lost thirty years ago) a lot of the specific criticism of this
aspect has implicitly assumed that masking one variable with another
is intentional - it's not, in fact intentionally doing so would
generally be bad practice. Like any language feature it is of no
use against a c.l.c.er determined to write contrived poor code. It
is simply an additional protection against ill-considered external
modification, and better bringing the logical and physical extents
of a variable into alignment.

For example, consider the statement "int error_code;". We can
naturally infer upon reading it in a source listing that it contains
some kind of code describing an error condition. Its behaviour
when there is an error should be easily determined, but what does
it contain in the _absence_ of an error? Does it contain anything?
Has it been initialised to some "No error" value that is actually
depended on for normal operation? Has it been "borrowed" by a
programmer who didn't understand block level declarations?

These are the kinds of things that need a lot of careful examination
to determine and add a lot of mental overhead to the programmer.
They are also issues that simply don't arise if the variable doesn't
exist until such time as there is an error.
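
As a sketch of that last point, here is one way it might look in C. The function parse_positive and its error numbering are invented for the example; the only thing that matters is that error_code is declared inside the failure branch, so there is no "what does it hold when nothing went wrong?" question to answer.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical parser: returns 0 on success and stores the value in *out,
   or returns a non-zero error code on failure. */
int parse_positive(const char *text, long *out)
{
    assert(text != NULL && out != NULL);

    char *end;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text || *end != '\0' || errno == ERANGE || value <= 0) {
        /* error_code exists only on the failure path */
        int error_code = (errno == ERANGE) ? 2 : 1;
        return error_code;
    }

    *out = value;
    return 0;
}
```

Nobody maintaining the success path can "borrow" error_code, because on that path it does not exist.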
 
Andrew Smallshaw

While I'm not a big fan of "big" functions, I think everyone has a different
idea of just _when_ a function has gotten "too big" and should be split up.

That said, I do have to maintain an app that includes a 1600+ line main()
that someone else wrote [mumble mumble] years ago. (Talk about "feature
creep" run amok.) It's been on my "to do" list for years to redo it, but
it's one of those "if it ain't broke..." things. (Yes, many people would
call a 1600+ line function "broke", but it runs correctly.)

Personally I'm not a big fan of concrete rules in that respect.
The "four pages of one function is too much" sentiment expressed
earlier is a little naive in my opinion. I've written thousand
line _switch_statements_ before now: we're only talking about 70
cases of 15 lines each to hit that mark. They can be perfectly
manageable since you're frequently talking about relatively 'flat'
code without a lot of dependencies between one section and the
next.

It's the amount of _context_ that needs to be kept in mind when
reviewing code that determines how difficult to understand it is,
and that is a different metric to how _long_ it is. That's actually
one of the main advantages of block scoping - it is clear that you
don't need to worry about something that is out of scope.

OTOH I can think of other functions - one particularly 'clever'
data structure in particular - around here where even 70 line
functions are too long to be easily digested. That one case in
particular I've looked long and hard at how it could be subdivided
further and eventually came to the conclusion that it can't in any
sensible manner - the coupling between the various portions is
simply too tight. _That's_ a problematic case: functions an order
of magnitude longer need not be.
 
Guest

For the past 30 years, mainly my own ones. C is the only mainstream language
I know well enough (although you might dispute that..) to use seriously if
I needed to. For runtime libraries, I've partly relied on C's for the last
15 years or so.

And before that, the usual Algol, Cobol, Pascal etc that were common on CS
degree courses in the late 70s. Plus a year of Fortran.

I'm aware that Algol-60 had loads of complicated stuff (eg.
call-by-name), which I never used and partly didn't understand.
It didn't hinder any programming I had to do.

I don't think block scoping is complicated, particularly as dozens of languages copied the idea. I also don't recall any other "weird stuff". Call by name was a bit of a fu. If C did it, it might look like this:

int sum (int a, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += a;

    return total;
}

sum (10, i) => 100
so far so innocuous

int b [] = {1, 2, 3};
sum (b[i], 3) => 6
the function acts more like a macro

BTW these languages don't have block statements, only Algol-68-style
'serial clauses'. (And I know the latter does have multiple nested
scopes, but it has a lot of other stuff that makes implementation
difficult too.)

Algol-68 was very "innovative". Too innovative. It started with an incomprehensible definition and went downhill from there.
 
Ben Bacarisse

I don't think block scoping is complicated- particularly as dozens of
languages copied the idea. I also don't recall any other "weird
stuff". Call by name was a bit of a fu. If C did it it might look like
this

int sum (int a, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += a;

    return total;
}

sum (10, i) => 100
so far so innocuous

i is undeclared. You probably meant sum(10, 10)?
int b [] = {1, 2, 3};
sum (b[i], 3) => 6
the function acts more like a macro


No. sum(b[i], 3) would depend on the value of i in the scope of the
call. Call by name is more like a macro with hygienic names -- the 'i'
in argument expression b[i] can't be bound by the 'i' local to sum.
Whilst the mechanism was, err... powerful, it was not mad!

To get what you want, you'd have to pass the "outer" i by name as well:

int sum (int x, int a, int n) // assume call-by-name
{
    int total = 0;
    for (x = 0; x < n; x++)
        total += a;
    return total;
}

int i; // value irrelevant
sum(i, b[i], 3)

<snip>
 
