disgusting compiler !! hahaha!!

K

Keith Thompson

David Brown said:
For non-zero-based arrays, the compiler hides the offset from you. In
the 'a'..'z' case, the compiler would put the " -'a' " in for you.

If intermediate pointers to unallocated memory were not a problem,
we could have any offset we like in C:

int a_[] = { 1, 2, 3 }; int *a = a_ - 1; /* a now is 1-based */

This might cause UB, but should work in many environments.

This *definitely* causes UB. I think "Numerical Recipes in C" used
that technique, and yes, it commonly works, but I can easily imagine
it failing on some architectures if a_ happens to be allocated at
the beginning of a segment.
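
A minimal sketch of the trick, next to a strictly conforming
alternative that simply sacrifices element 0:

#include <stdio.h>

int main(void) {
    int a_[] = { 1, 2, 3 };
    int *a = a_ - 1;           /* undefined behavior: points before a_ */
    printf("%d\n", a[1]);      /* commonly prints 1, but not guaranteed */

    /* Strictly conforming alternative: allocate a dummy element 0
       and never use it; b[1]..b[3] hold the payload. */
    int b[4] = { 0, 1, 2, 3 };
    printf("%d\n", b[1]);      /* prints 1, guaranteed */
    return 0;
}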
 
B

BartC

Kaz Kylheku said:
Higher-level languages have less reason to start counting from zero,
which is less intuitive. Most counting in real life is 1-based.

1. Indexing isn't necessarily counting!

[ ][ ][ ]
^  ^  ^  ^
0  1  2  3

The array index is the left corner of the box. The count is the right
corner.

The index indicates: what is the displacement? How many elements are
before this one?
Note that clocks measure the day from 00:00 to 23:59, not from 01:01
to 24:60. People generally do not have a problem with this.

Also, countdown timers go to zero. If you cook something with your
microwave for 27 seconds, it starts at 27, and counts down to zero,
once per second.

That's measuring rather than counting. Measuring (as in your top example,
where you measure from the left of the first box to the right of the last
box) necessarily starts from zero (and in real life tends to be a
continuous value rather than a whole number).

But there are three boxes above, and it is natural to number them 1, 2 and 3.
The physical positions might be 0.0, 1.0, 2.0 and 3.0 at the edges of the
boxes, and even 1.8 for 80% across the middle box.

There are a few exceptions; for example, I deal with the discrete bits in a
32-bit integer but number them from 0 to 31, because that's the convention
(though I'm not sure C assigns an index to them anyway).
When the year 2000 rolled around, numerous people around the world
thought it was the start of the new millennium and celebrated.

So did I; changing from 1999 to 2000 was a bigger deal than whether this was
actually the third millennium or not.

It would also be ludicrous if the 'nineties', for example, didn't start until
1-Jan-1991 and didn't end until 31-Dec-2000. But those are more examples of
measurement (of time, in this case) rather than counting or indexing.
Zero all the way, without a question.

For instance, suppose I want to regard that list as pairs. I want to
select either (a, b) or (c, d) based on n. It's easy: just take elements
2*n and 2*n+1. If n is 1-based, I have to do algebra: 2*(n-1) and
2*(n-1)+1, which goes to 2n-2 and 2n-2+1 = 2n-1.

Yes, for some kinds of calculations, base 0 is simpler. That doesn't mean
you have to use base 0 everywhere; in that language construct, Algol 68 uses
base 1.
Indexing multi-dimensionally gets even more retarded.

Only if you have to do the index calculations yourself (accessing a 1D array
as 3D, for example). Otherwise base 1 is probably a little simpler than base
0 (index from 1 to L, M, N for each dimension, instead of 0 to L-1, M-1, N-1).
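
A minimal sketch of both calculations with 0-based indexing (the array
sizes here are arbitrary):

#include <stdio.h>

int main(void) {
    /* Pair n of a flat list is elements 2*n and 2*n + 1 when the
       list is 0-based. */
    int list[] = { 'a', 'b', 'c', 'd' };
    for (int n = 0; n < 2; n++)
        printf("pair %d: (%c, %c)\n", n, list[2*n], list[2*n + 1]);

    /* Accessing a 1-D array as 3-D: element (i, j, k) of an
       L x M x N block lives at flat offset (i*M + j)*N + k. */
    enum { L = 2, M = 3, N = 4 };
    int flat[L * M * N];
    int i = 1, j = 2, k = 3;
    flat[(i*M + j)*N + k] = 42;
    printf("flat[(i*M + j)*N + k] = %d\n", flat[(i*M + j)*N + k]);
    return 0;
}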
 
K

Kenny McCormack

Keith Thompson said:
I find arguments of the form "This is stupid!" "No, that's stupid!"
intensely boring.

I find you both stupid and boring.

Do I get full credit for that?

--
The problem in US politics today is that it is no longer a Right/Left
thing, or a Conservative/Liberal thing, or even a Republican/Democrat
thing, but rather an Insane/not-Insane thing.

(And no, there's no way you can spin this into any confusion about
who's who...)
 
G

glen herrmannsfeldt

(snip, I wrote)
I have written a few compilers for several languages. C
compilers are actually fairly complex compared with those for
many other languages, largely because the language has evolved
over time and so much of it is implementation-defined but
constrained by conventional wisdom.

Yes, I was probably thinking about older C, but even so, compare
C11 to Fortran 2008, or even PL/I in 1966. (Note, for one thing,
that neither PL/I nor Fortran has reserved words - just another
complication for the compiler to figure out.)

-- glen
 
G

glen herrmannsfeldt

Walter Banks said:
James Kuyper wrote:
(snip)
There is a lot of evidence that the 4 bits added by a 36-bit float
(as opposed to a 32-bit single precision) would fix a lot
of the precision problems for most applications.

For a large number of practical problems, you need double precision
arithmetic to get single precision results.

The precision you need for intermediate values in matrix
computations tends to increase with the size of the matrix,
and matrix computations have gotten a lot larger since the
days of 36 bit machines.

It would be interesting to have a 48-bit floating-point type,
so six bytes on byte-addressed machines.
Something that has been missed in many processor designs is the
effect of data widths on the ability of a processor to be used
in applications. 2^N widths have only a minimal advantage in hardware
implementation. I have worked on several processor designs
where a non-standard data width contributed substantially to
application throughput.

-- glen
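
A minimal illustration of the double-for-single point, in standard C:
summing 0.1f ten million times in a float accumulator stalls far from
the true total, while a double accumulator stays close.

#include <stdio.h>

int main(void) {
    /* Once the float running total is large enough, each new 0.1f
       is smaller than half an ulp and rounds away entirely. */
    float fsum = 0.0f;
    double dsum = 0.0;
    for (int i = 0; i < 10000000; i++) {
        fsum += 0.1f;
        dsum += 0.1f;
    }
    printf("float accumulator:  %f\n", fsum);  /* far from 1000000 */
    printf("double accumulator: %f\n", dsum);  /* close to 1000000 */
    return 0;
}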
 
G

glen herrmannsfeldt

Keith Thompson said:
(e-mail address removed)-berlin.de (Stefan Ram) writes: (snip)
If intermediate pointers to unallocated memory were not a problem,
we could have any offset we like in C:
int a_[] = { 1, 2, 3 }; int *a = a_ - 1; /* a now is 1-based */
This might cause UB, but should work in many environments.
This *definitely* causes UB. I think "Numerical Recipes in C" used
that technique, and yes, it commonly works, but I can easily imagine
it failing on some architectures if a_ happens to be allocated at
the beginning of a segment.

Well, it mostly fails at the beginning of a segment if you do bounds
checking wrong, or if the arithmetic wraps wrong.

Not that I like the solution very much.

-- glen
 
K

Keith Thompson

Malcolm McLean said:
With zero-based arrays

image[y*width+x] = value;

in C you have to do this all the time.

You don't *have* to do it if your compiler supports VLAs.
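
A minimal sketch of the VLA approach (C99; VLAs became optional in
C11), letting the compiler do the y*width+x arithmetic itself:

#include <stdlib.h>

/* Fill an image without writing y*width + x by hand: declare the
   parameter as a VLA and index it two-dimensionally. */
static void fill(int width, int height,
                 unsigned char image[height][width], unsigned char value)
{
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            image[y][x] = value;
}

int main(void) {
    int width = 640, height = 480;
    unsigned char (*image)[width] = malloc(sizeof(unsigned char[height][width]));
    if (image == NULL)
        return 1;
    fill(width, height, image, 0x7f);
    image[10][20] = 255;    /* still no manual index arithmetic */
    free(image);
    return 0;
}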
 
W

Walter Banks

glen said:
For a large number of practical problems, you need double precision
arithmetic to get single precision results.

The precision you need for intermediate values in matrix
computations tends to increase with the size of the matrix,
and matrix computations have gotten a lot larger since the
days of 36 bit machines.

It would be interesting to have a 48 bit floating point type,
so six bytes on byte addressed machines.


-- glen

Points well taken. I agree that 32-bit floats fall short of what many
applications need. Byte-oriented machines and some floating-point
packages make it not that hard to add a byte to the mantissa.

I implemented 48-bit floats on a 24-bit processor a few years ago
(40-bit mantissa).

w..
 
D

David Brown

David Brown said:
On 08/05/14 11:15, BartC wrote:

If the choice is *only* between 0 and 1, then 0 is more versatile. But
it's not hard to allow both or any. Both 0 and 1 bases have their uses;
1-based, I think, is more useful as the default base.

When you're just going to have one type of array, then indexing by
integers starting from 0 is the simplest and clearest - you have the
offset from the start of the array.

But it doesn't always make sense to keep thinking of offsets. Look at
the 'A'..'Z' example below, and tell me where offsets from the start of
the array come in, on the line with rotationcypher['F'].

The point is that when you use 0-based arrays, you can think of offsets.

For non-zero-based arrays, the compiler hides the offset from you. In
the 'a'..'z' case, the compiler would put the " -'a' " in for you.

Exactly. So why shouldn't there be the option for a lower bound that isn't
zero? As you say, it's not that difficult for a compiler to deal with it.

The problem is not in the compiler - it could add the "-'a'" when
accessing the array, or it could (on most targets, in most situations)
do the subtraction on the symbol at link time.

The problem is user expectations, along with the expectations of other
tools that read or write C code. It is a fundamental part of the
definition of C that if xs is an array, then xs and xs[0] have the same
address, and a pointer to xs[0] is the same as xs. Allowing non-zero
lower bounds breaks that rule.
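
A minimal sketch of that identity, plus the usual hand-rolled
workaround for a non-zero lower bound (the macro name is illustrative,
and it assumes 'A'..'Z' are contiguous, as in ASCII but not EBCDIC):

#include <assert.h>
#include <stdio.h>

/* Workaround: a 26-element store plus a macro that subtracts the
   lower bound by hand, as the compiler would in a language with
   arbitrary lower bounds. */
static unsigned char cypher_store['Z' - 'A' + 1];
#define rotationcypher(c) (cypher_store[(c) - 'A'])

int main(void) {
    int xs[3] = { 1, 2, 3 };
    assert((void *)xs == (void *)&xs[0]);   /* the identity C guarantees */

    rotationcypher('F') = 'K';
    printf("%c\n", rotationcypher('F'));    /* prints K */
    return 0;
}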

In languages like Pascal or Ada, there never was such a rule - so there
never has been such an assumption. In C++, the rule is true for POD
arrays (inherited from C), but not for classes with a [] operator, and
it is fine to have different lower bounds or different types for the
indices. But making lower bounds non-zero would break C, so it is not
going to happen (at least, not without a lot of effort, and perhaps some
new syntax to make things clear).

As I said, I have no argument against it being a nice feature that would
make some code clearer and give better compile-time checking. I just
don't see it happening in C.
I can write this [** not in C **]:

['A'..'Z']char rotationcypher

That format is pointlessly different from C style - if these sorts of
arrays are ever to make sense in C, C++, or a C-like language, then they
would be written in roughly the form I gave.

I said that was not C. I wrote it that way because it is *an actual
working example* of a language that does C-like things yet allows any
array lower bound. And that C output was actual output (with some types
adjusted to make it clearer).

In C you'd put the dimensions where it normally expects them. (I don't
know what syntax would be proposed; I use either [lower..upper] or
[lower:length] or [length] or [lower:].)

I think the "[lower:]" syntax is /definitely/ optimistic for C.
Infinite lists are common in functional programming languages, but we
won't see lazy evaluation in C in the near future!
I can't make sense of you here. I said specifically that arrays with
ranges or enumerated types as indexes "would lead to clearer code and
better compile-time checking for ranges and types". I am not "trying to
deny this to C programmers" - I think it would be a useful enhancement
to the language, which is why I wrote about it.

I think that simply having a choice of lower bound would also be a useful
enhancement, and one considerably simpler to implement than turning C into
Pascal or Ada at the same time, with all these type checks (which are much
more difficult than you might think when you do it properly).

The compiler can do range checking if it likes. But at present, an array
like char a[100] already has a range of 0..99, and few compilers seem to
bother checking it! (A quick test shows only 2 out of 6 checking it at
normal warning levels. gcc doesn't seem to do it at any level, but
doubtless there will be an option hidden away to do so.)

gcc checks array bounds at compile time, but only if you have at least
-O2 or -Os optimisation (and -Wall), as the optimisation passes are used
to collect the information needed for checking. There are a number of
gcc warnings that only work (or only work well) with optimisation enabled.
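
A minimal example, assuming a reasonably recent gcc; the warning in
question is -Warray-bounds, which -Wall enables but which needs the
optimiser's analysis to fire reliably:

/* bounds.c - compile with:  gcc -O2 -Wall -c bounds.c
   The out-of-bounds store below draws a -Warray-bounds warning,
   but typically only when optimisation is enabled. */
int a[100];

int f(void) {
    a[100] = 1;     /* one past the end: 0..99 is the valid range */
    return a[0];
}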

I would say that compile-time checking of types and bounds is essential
to such a feature - the whole point would be to help write better code,
and that means making it easier for the programmer to write clearer code
and also making it easier for the tools to spot mistakes.
A decision to switch languages is not really an answer! (And it sounds like
C++ only manages it with some trouble.)

Anyway not everyone likes to drag in the complexity of a C++ compiler just
for one or two extra features which ought to be in C already.

A lot of people use C++ as ABC - "A Better C". It is not necessarily a
bad idea. And while C++ introduces a lot of extra complexity, it has
got a bit easier with C++11 "auto", which can save a lot of messy
template typing.
 
B

BartC

David Brown said:
On 08/05/14 15:41, BartC wrote:
In C you'd put the dimensions where it normally expects them. (I don't
know what syntax would be proposed; I use either [lower..upper] or
[lower:length] or [length] or [lower:].)

I think the "[lower:]" syntax is /definitely/ optimistic for C.
Infinite lists are common in functional programming languages, but we
won't see lazy evaluation in C in the near future!

C allows "[]", it just means the bounds are not specified:

int a[] = {1,2,3};
int (*b)[];
extern int c[];
 
J

James Kuyper

Keith Thompson said:
(e-mail address removed)-berlin.de (Stefan Ram) writes: (snip)
If intermediate pointers to unallocated memory were not a problem,
we could have any offset we like in C:
int a_[] = { 1, 2, 3 }; int *a = a_ - 1; /* a now is 1-based */
This might cause UB, but should work in many environments.
This *definitely* causes UB. I think "Numerical Recipes in C" used
that technique, and yes, it commonly works, but I can easily imagine
it failing on some architectures if a_ happens to be allocated at
the beginning of a segment.

Well, it mostly fails at the beginning of a segment if you do bounds
checking wrong, or if the arithmetic wraps wrong.

a_-1 does violate a bound, so I don't see how implementing pointer
arithmetic in such a way that it fails catastrophically would qualify as
doing "bounds checking wrong".
 
W

Walter Banks

Keith said:
A function *declaration* is something like:

void func(void);

The corresponding function *definition* (which also provides a
declaration) is:

void func(void) {
    /* ... */
}

What's disallowed in standard C is, for example:

void outer(void) {
    void disallowed(void) {
        /* ... */
    }
    /* ... */
}

Both function declarations and function definitions are part of the
source text. Nobody was referring to functions as abstract entities,
or to function designators.

C got variable scoping right but didn't see the significant
advantages that similar scoping rules would have for functions.
A lot of code reliability could have been improved with local
functions.

The implementation of nested functions has very little impact on
compiler implementation. Quite a few C compilers have scoped-function
capability implemented as a C extension.

w..
 
W

Walter Banks

David said:
David Brown said:
On 08/05/14 11:15, BartC wrote:
If the choice is *only* between 0 and 1, then 0 is more versatile. But
it's not hard to allow both or any. Both 0 and 1 bases have their uses;
1-based, I think, is more useful as the default base.
When you're just going to have one type of array, then indexing by
integers starting from 0 is the simplest and clearest - you have the
offset from the start of the array.

But it doesn't always make sense to keep thinking of offsets. Look at
the 'A'..'Z' example below, and tell me where offsets from the start of
the array come in, on the line with rotationcypher['F'].

The point is that when you use 0-based arrays, you can think of offsets.

For non-zero-based arrays, the compiler hides the offset from you. In
the 'a'..'z' case, the compiler would put the " -'a' " in for you.
If you are going to look for more options, then you should really allow
ranges and different integral types (such as different sized integers,
integer ranges, contiguous enumerated types, etc.). That would lead to
clearer code and better compile-time checking for ranges and types -
similar to Pascal:

int xs[1 .. 100];
char rotationCypher['a' .. 'z'];

I don't see it happening in the C world - especially as it is already
possible in C++ (but with an uglier template declaration syntax, of
course).

Why can't we just have those 1..100 and 'a'..'z' bounds without bringing
all that other stuff into it?

I can write this [** not in C **]:

['A'..'Z']char rotationcypher

That format is pointlessly different from C style - if these sorts of
arrays are ever to make sense in C, C++, or a C-like language, then they
would be written in roughly the form I gave.
print rotationcypher['F']

Which I can machine-translate to C and it takes care of the offsets needed
to make it work with C's 0-based arrays. Something like this:

unsigned char rotationcypher[26];
printf("%c",(rotationcypher[70-65]));

No special type attributes for the index or bounds, no templates, nothing
special except providing the right offsets.

Clearly such bounds are useful and can make for more readable code; why
do you want to deny this to C programmers?

I can't make sense of you here. I said specifically that arrays with
ranges or enumerated types as indexes "would lead to clearer code and
better compile-time checking for ranges and types". I am not "trying to
deny this to C programmers" - I think it would be a useful enhancement
to the language, which is why I wrote about it. But I also think that
it is unlikely to become a part of C, especially as you can implement it
in C++.

C as a language could use range syntax (as if it really needs a larger
grammar). 'a'..'z'-style syntax has a lot of uses and is commonly found
in code, so a tight compiler implementation would be useful.

For example

if ( a in [-5..46]) // stolen from pascal
{
. . .
}

Similarly with switch case

case ['0'..'9'] :
or
case [100..200] :

Many compilers already have the code generation for these constructs and
produce it by pattern-matching the most common source constructs.

w..


 
D

David Brown

C got variable scoping right but didn't see the significant
advantages that similar scoping rules would have for functions.
A lot of code reliability could have been improved with local
functions.

I often use languages that support local functions - Pascal and Python.
It is rare that I find them useful, except for lambda functions in
Python. When programming in C, I usually use gcc which has support for
local functions, but I have never felt it would significantly improve my
programs. I think that in most cases where local functions really would
make a difference to the structure and quality of the program, you are
probably better off using C++ with access to class member functions
(including local classes) and lambdas.
The implementation of nested functions has very little impact on
compiler implementation. Quite a few C compilers have scoped-function
capability implemented as a C extension.

Local functions can often be implemented easily - they can be treated as
a normal "static" function by the compiler. But sometimes the
interaction between variables in the outer function and their usage
inside the local function can make a real mess - the generated static
function needs pointers to outer function variables, variables that used
to be in registers now need to go on the stack, and your optimisation is
lost. And when people start doing "clever" things like taking the
address of the local function, it gets worse - on gcc, this is
implemented using "trampolines" which are run-time generated code put on
the stack.

All in all, general local functions are a pain - and if you restrict
them too much (such as disallowing their address to be taken), you lose
many of the possible uses (such as for a sorting function for qsort()).

Although gcc has nested functions (and has had for many years - they
needed the functionality for languages such as Pascal and Ada), they are
seldom used in C. C++ lambdas and class members are usually a better
choice if you need such structures.

(Note - I don't know the details of /why/ trampolines are needed for
nested functions in C, while they are not needed for C++ lambdas.)
 
M

Martin Shobe

I often use languages that support local functions - Pascal and Python.
It is rare that I find them useful, except for lambda functions in
Python. When programming in C, I usually use gcc which has support for
local functions, but I have never felt it would significantly improve my
programs. I think that in most cases where local functions really would
make a difference to the structure and quality of the program, you are
probably better off using C++ with access to class member functions
(including local classes) and lambdas.


Local functions can often be implemented easily - they can be treated as
a normal "static" function by the compiler. But sometimes the
interaction between variables in the outer function and their usage
inside the local function can make a real mess - the generated static
function needs pointers to outer function variables, variables that used
to be in registers now need to go on the stack, and your optimisation is
lost. And when people start doing "clever" things like taking the
address of the local function, it gets worse - on gcc, this is
implemented using "trampolines" which are run-time generated code put on
the stack.

All in all, general local functions are a pain - and if you restrict
them too much (such as disallowing their address to be taken), you lose
many of the possible uses (such as for a sorting function for qsort()).

Although gcc has nested functions (and has had for many years - they
needed the functionality for languages such as Pascal and Ada), they are
seldom used in C. C++ lambdas and class members are usually a better
choice if you need such structures.

(Note - I don't know the details of /why/ trampolines are needed for
nested functions in C, while they are not needed for C++ lambdas.)

GCC uses trampolines for C because taking the address of a nested
function must yield an ordinary function pointer. C++ doesn't need them
for lambdas, since a lambda is an object rather than a function.

Martin Shobe
 
B

BartC

Local functions can often be implemented easily - they can be treated as
a normal "static" function by the compiler. But sometimes the
interaction between variables in the outer function and their usage
inside the local function can make a real mess - the generated static
function needs pointers to outer function variables, variables that used
to be in registers now need to go on the stack, and your optimisation is
lost. And when people start doing "clever" things like taking the
address of the local function, it gets worse - on gcc, this is
implemented using "trampolines" which are run-time generated code put on
the stack.

There is an even simpler implementation, where the local function can't
access the local variables of its enclosing function. (I found I could
do this simply by commenting out a check on functions being defined
inside another.)

The local function is compiled as though it were outside. The sole
advantage is that the name of the local function has scope only within
its enclosing function. (And the same name can be reused inside another
function, if the compiler generates a unique name for each nested
function.)

Also if you move/copy the main function elsewhere, it will take all its
locals with it.
 
G

glen herrmannsfeldt

David Brown said:
On 09/05/14 17:49, Walter Banks wrote: (snip)
I often use languages that support local functions - Pascal and Python.
It is rare that I find them useful, except for lambda functions in
Python.

The languages with local (internal) functions usually don't have
something like C's file scope functions.

Sometimes it is convenient to have a small function to help with
something, but not really worth a normal external function.
Also, it is often more readable to have it nearby.

(snip)
Local functions can often be implemented easily - they can be treated as
a normal "static" function by the compiler. But sometimes the
interaction between variables in the outer function and their usage
inside the local function can make a real mess - the generated static
function needs pointers to outer function variables, variables that used
to be in registers now need to go on the stack, and your optimisation is
lost. And when people start doing "clever" things like taking the
address of the local function, it gets worse - on gcc, this is
implemented using "trampolines" which are run-time generated code put on
the stack.

Yes, it gets complicated in the case of recursion, where you need
to get the right instance of the caller.
All in all, general local functions are a pain - and if you restrict
them too much (such as disallowing their address to be taken), you lose
many of the possible uses (such as for a sorting function for qsort()).

If qsort() had a (void *) argument that it passed through to the
comparison function, it would make some comparisons easier. That would
avoid many of the cases where you would want an internal function in
order to access outside data.

-- glen
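
For what it's worth, some libraries provide exactly that. A minimal
sketch using glibc's qsort_r (a GNU extension; the BSD and C11 Annex K
variants spell the signature differently):

#define _GNU_SOURCE             /* for glibc's qsort_r */
#include <stdio.h>
#include <stdlib.h>

/* Sort an index array by the values it refers to in a separate table,
   passed through qsort_r's context pointer instead of a global
   variable or a nested function. */
static int by_key(const void *pa, const void *pb, void *ctx)
{
    const double *keys = ctx;
    int a = *(const int *)pa, b = *(const int *)pb;
    return (keys[a] > keys[b]) - (keys[a] < keys[b]);
}

int main(void) {
    double keys[] = { 3.0, 1.0, 2.0 };
    int order[] = { 0, 1, 2 };
    qsort_r(order, 3, sizeof order[0], by_key, keys);
    for (int i = 0; i < 3; i++)
        printf("%d ", order[i]);            /* prints 1 2 0 */
    putchar('\n');
    return 0;
}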
 
B

BartC

Walter Banks said:
David Brown wrote:
I can't make sense of you here. I said specifically that arrays with
ranges or enumerated types as indexes "would lead to clearer code and
better compile-time checking for ranges and types". I am not "trying to
deny this to C programmers" - I think it would be a useful enhancement
to the language, which is why I wrote about it. But I also think that
it is unlikely to become a part of C, especially as you can implement it
in C++.

C as a language could use range syntax (as if it really needs a larger
grammar). 'a'..'z'-style syntax has a lot of uses and is commonly found
in code, so a tight compiler implementation would be useful.

For example

if ( a in [-5..46]) // stolen from pascal
{
. . .
}

Similarly with switch case

case ['0'..'9'] :
or
case [100..200] :

Even simpler is to use:

if (a in -5..46) ...

i.e. matching against a range (instead of membership of a set, which is
what I assumed the [...] construction was). This would directly map to:

if (a >= -5 && a <= 46) ...

(and hopefully implemented so that a is evaluated only once).
Many compilers already have the code generation for these constructs and
generate it by pattern matching to the most common source constructs.

The argument against ranges for switch cases, when it frequently comes up in
c.l.c., is that someone might be tempted to write:

case 'A'..'Z':

instead of:

case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G': case 'H':
case 'I': case 'J': case 'K': case 'L': case 'M': case 'N': case 'O': case 'P':
case 'Q': case 'R': case 'S': case 'T': case 'U': case 'V': case 'W': case 'X':
case 'Y': case 'Z':

(and the former does have a certain conciseness about it). This won't be
quite right if it happens that EBCDIC is being used. But if that is not the
case, or letters are not involved, then it is silly to have to type out
dozens of consecutive case labels.
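
For what it's worth, gcc already accepts case ranges as an extension. A
minimal sketch (not standard C, and with the same EBCDIC caveat),
together with a range-test helper that evaluates its argument only once:

#include <stdio.h>

/* The "a in lo..hi" test maps onto a helper that evaluates its
   argument exactly once. */
static int in_range(int x, int lo, int hi)
{
    return x >= lo && x <= hi;
}

/* gcc's case-range extension; note the spaces around the dots.
   The letter ranges assume contiguous codes (ASCII, not EBCDIC). */
static const char *classify(int c)
{
    switch (c) {
    case '0' ... '9': return "digit";
    case 'A' ... 'Z':
    case 'a' ... 'z': return "letter";
    default:          return "other";
    }
}

int main(void) {
    printf("%s %s %s\n", classify('F'), classify('7'), classify('!'));
    if (in_range(-3, -5, 46))
        puts("-3 is in -5..46");
    return 0;
}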
 
K

Keith Thompson

Walter Banks said:
C got variable scoping right but didn't see the significant
advantages that similar scoping rules would have for functions.
A lot of code reliability could have been improved with local
functions.

The implementation of nested functions has very little impact on
compiler implementation. Quite a few C compilers have scoped-function
capability implemented as a C extension.

It's not quite that simple. References to objects defined in a
containing function are non-trivial. There are at least two common ways
to implement this:

- A "static link", a pointer in a nested function to
the frame for its parent, forming a linked list for multiple
nesting levels); or

- A "display", an array of pointers to the frames for all lexically
enclosing functions.

In either case, the compiler has to generate extra code to maintain
these pointers. (No such extra code is needed for a program that
doesn't have nested functions, or possibly if it does but they don't
refer to parent declarations.)

And C function pointers make it possible to call a nested function when
its parent is not executing. If the nested function refers to
declarations in its parent, presumably the behavior is undefined (or, as
the gcc documentation puts it, "all hell will break loose").

An example:

#include <stdio.h>

void (*funcptr)(void);

static void outer(void) {
    int outer_var = 42;
    void inner(void) {
        printf("in inner, outer_var = %d\n", outer_var);
    }
    funcptr = inner;
    puts("outer");
    inner();
}

int main(void) {
    outer();
    funcptr();
}

When I run this program on my system, the output is:

outer
in inner, outer_var = 42
in inner, outer_var = 42

but only by luck; the indirect call has undefined behavior since
outer_var doesn't exist at that point.
 
C

CHIN Dihedral

Making them *default* is stupid; the default should be zero based.

What's stupid is having the 1st, 2nd, 3rd and 4th elements of an array
indexed as 0, 1, 2 and 3.

I find arguments of the form "This is stupid!" "No, that's stupid!"
intensely boring.

Ha. I tend to agree with you.

I would add "disgusting" to the list as well. While the OP's post
generated fun and amusement for a few, over-all I found his whine list
rather silly, with the possible exception of Nested Functions*.

If those features really bother the OP, then Visual Basic 6.0 should
suit him very well.

-ralph

[*Not sure about the value of nested functions. Never used them myself.
The few examples I've run across give the appearance of being useful -
but limited to narrow solutions and always seem slightly contrived. But
that's just my opinion, thus quite safely ignored. <g>]

I remember in the old 1-based VB days that a string longer than 64K
would result in undefined behavior.
 
