disgusting compiler !! hahaha!!

glen herrmannsfeldt

Kaz Kylheku said:
High level languages also have zero based arrays.
Zero based arrays are also described in Lisp 2 (1968), and Lisp
continues to have zero-based vectors today.
Python has zero-based arrays.
One based arrays are useful in some situations. Those situations
are rare.
*Supporting* one-based arrays isn't a bad idea.
Making them *default* is stupid; the default should be zero based.
Making one-based arrays the only choice is criminally insane.

So why did Fortran do that in 1956, and standardize it in 1966?

The first might be related to the index registers, but the latter
is harder to explain. It wasn't fixed until 1977.

-- glen
 
David Brown

Ben Bacarisse said:
glen herrmannsfeldt said:
(snip, I wrote)
Computers like to index from zero, but it isn't hard at all
for a compiler to subtract one. Converting algorithms between
zero based and one based indexing is harder than it seems it
should be.

Why not have a language allow both 0-based and 1-based?
Sometimes 0-based is useful (for measuring or for use with
offsets), sometimes 1-based is (for counting); and sometimes
N-based is. It's not hard.

Then no conversion is necessary.

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

You'd probably have to add the syntax as a grammar rule, keeping [ and ]
as the only tokens. Making it an actual operator (which to me implies
that the [[ and the ]] are tokens) complicates the lexer since it can't
apply the "maximal-munch" rule anymore.

C++ had a similar problem with << and >>. In nested template
definitions, a closing >> had to be written as > > so it wouldn't be
interpreted as a right shift operator. A more recent version of the C++
standard corrected the problem (I don't remember exactly how).

And C++ now uses [[ ]] for attribute syntax, such as for marking
parameters "unused" or marking functions with "noreturn".

<http://en.wikipedia.org/wiki/C++11#Attributes>

Since attributes could work just as well in C as in C++ (they don't need
any C++ specific features), it is not unlikely that this feature will
get "back-ported" to C, at least as extensions in some compilers.

And while the C++ people don't seem to mind using the same keywords or
symbols for completely different purposes, even when the parsing can be
ambiguous, C people are less keen. So overall, I think that rules out
[[ ]] as a choice of operators here. (Not that I think C would benefit
from a 1-based index operator, but that's beside the point.)
 
Kaz Kylheku

So why did Fortran do that in 1956, and standardize it in 1966?

Fortran was implemented by people who had no idea what they were doing,
and had nobody to learn it from.

They had no reason to suspect that one-based arrays suck for programming.

They made other mistakes, like removing all whitespace before tokenizing and
parsing, thinking it might be a good idea.

Today, we really should scrub the word Fortran from our vocabularies.
 
BartC

glen herrmannsfeldt said:
(snip, I wrote)
Why not have a language allow both 0-based and 1-based?
Sometimes 0-based is useful (for measuring or for use with
offsets), sometimes 1-based is (for counting); and sometimes
N-based is. It's not hard.
Then no conversion is necessary.

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

So [[x]] just means [(x)-1] ?

The latter is probably better: no language or compiler upgrades are needed,
it will work in every C system as it is; it tells you exactly what it means
compared with the more cryptic [[x]] (which has other meanings elsewhere);
and no true 1-based indexing syntax uses [[x]]; they will use [x].

And most of the time it can also be written as [x-1] (although it will be
confusing when the index is already x-1 or x+1, giving an index of x-2 or
just x).

The idea however is just to be able to write [x] like everywhere else.
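
For anyone who wants the [(x)-1] form without repeating the subtraction everywhere, here is a minimal sketch using a macro. The names AT1 and sum_first_n are made up for illustration; nothing like them exists in standard C.

```c
#include <stddef.h>

/* Hypothetical helper: 1-based access to an ordinary 0-based C array.
   AT1(a, 1) is a[0]. Both arguments are parenthesised so that an index
   expression such as i + 1 is evaluated before the subtraction. */
#define AT1(a, i) ((a)[(i) - 1])

/* Sum the 1st through nth elements using 1-based loop bounds. */
static int sum_first_n(const int *a, size_t n)
{
    int total = 0;
    for (size_t i = 1; i <= n; i++)
        total += AT1(a, i);
    return total;
}
```

The macro expands to an lvalue, so AT1(v, 2) = 5; stores into v[1]. It does no bounds checking; AT1(v, 0) is as invalid as v[-1].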
 
Kaz Kylheku

After writing that, I thought that one could have a [[ ]] operator,
for a subtract one and index. As far as I know, there is no
current use for that syntax that would cause an ambiguity.

[[:alpha:]]

*ducks*
 
BartC

Kaz Kylheku said:
High level languages also have zero based arrays.

Usually from contamination with C. If it has curly braces, or the
implementation is based on a curly-braced language, the probability is that
the arrays are 0-based.

Higher-level languages have less reason to start counting from zero, which is
less intuitive. Most counting in real life is 1-based.
Zero based arrays are also described in Lisp 2 (1968), and Lisp
continues to have zero-based vectors today.

I would be surprised if Lisp couldn't support N-based arrays if it wanted
to; if not then it would be the only thing it couldn't do.
Python has zero-based arrays.

Lua and Ada are 1-based. The latter also has N-based.
One based arrays are useful in some situations. Those situations
are rare.

*Supporting* one-based arrays isn't a bad idea.

Making them *default* is stupid; the default should be zero based.

What's stupid is having the 1st, 2nd, 3rd and 4th elements of an array
indexed as 0, 1, 2 and 3.
Making one-based arrays the only choice is criminally insane.

The same isn't true of zero based arrays. Zero based arrays being
the only supported representation is perfectly workable.

If the choice is *only* between 0 and 1, then 0 is more versatile. But it's
not hard to allow both or any. Both 0 and 1 bases have their uses; 1-based I
think is more useful as the default base.

But imagine you had a language feature that looked like this:

x = ( n | a, b, c, ... | z);

which selects one of a, b, c etc depending on n (with z being a default
value); ie. it selects the nth value.

Should n be 1-based (so n=1 selects a), or 0-based (n=1 selects b); which is
more natural?
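
To make the two readings of the selector concrete, here is a rough C sketch of both. The function names select1 and select0 and the array-plus-count interface are assumptions for illustration; the ( n | a, b, c, ... | z) syntax itself has no C equivalent.

```c
#include <stddef.h>

/* Hypothetical sketch of x = (n | a, b, c, ... | z): pick the nth of
   `count` candidates, or the default z when n is out of range. */

/* 1-based reading: n == 1 selects the first candidate. */
static int select1(size_t n, const int *candidates, size_t count, int z)
{
    return (n >= 1 && n <= count) ? candidates[n - 1] : z;
}

/* 0-based reading: n == 0 selects the first candidate. */
static int select0(size_t n, const int *candidates, size_t count, int z)
{
    return (n < count) ? candidates[n] : z;
}
```

With candidates a, b, c, select1(1, ...) gives a while select0(1, ...) gives b. Both fall back to z once n runs past the list.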
 
David Brown

Usually from contamination with C. If it has curly braces, or the
implementation is based on a curly-braced language, the probability is that
the arrays are 0-based.

Higher-level languages have less reason to start counting from zero, which is
less intuitive. Most counting in real life is 1-based.


I would be surprised if Lisp couldn't support N-based arrays if it wanted
to; if not then it would be the only thing it couldn't do.


Lua and Ada are 1-based. The latter also has N-based.


What's stupid is having the 1st, 2nd, 3rd and 4th elements of an array
indexed as 0, 1, 2 and 3.


If the choice is *only* between 0 and 1, then 0 is more versatile. But it's
not hard to allow both or any. Both 0 and 1 bases have their uses;
1-based I
think is more useful as the default base.

But imagine you had a language feature that looked like this:

x = ( n | a, b, c, ... | z);

which selects one of a, b, c etc depending on n (with z being a default
value); ie. it selects the nth value.

Should n be 1-based (so n=1 selects a), or 0-based (n=1 selects b);
which is
more natural?

When you are just going to have one type of array, then indexing by integers
starting from 0 is the simplest and clearest - you have the offset from
the start of the array.

If you are going to look for more options, then you should really allow
ranges and different integral types (such as different sized integers,
integer ranges, contiguous enumerated types, etc.). That would lead to
clearer code and better compile-time checking for ranges and types -
similar to Pascal:

int xs[1 .. 100];
char rotationCypher['a' .. 'z'];

I don't see it happening in the C world - especially as it is already
possible in C++ (but with an uglier template declaration syntax, of course).
 
BartC

David Brown said:
On 08/05/14 11:15, BartC wrote:
When you are just going to have one type of array, then indexing by integers
starting from 0 is the simplest and clearest - you have the offset from
the start of the array.

But it doesn't always make sense to keep thinking of offsets. Look at the
'A'..'Z' example below, and tell me where offsets from the start of the
array come in, on the line with rotationcypher['F'].
If you are going to look for more options, then you should really allow
ranges and different integral types (such as different sized integers,
integer ranges, contiguous enumerated types, etc.). That would lead to
clearer code and better compile-time checking for ranges and types -
similar to Pascal:

int xs[1 .. 100];
char rotationCypher['a' .. 'z'];

I don't see it happening in the C world - especially as it is already
possible in C++ (but with an uglier template declaration syntax, of
course).

Why can't we just have those 1..100 and 'a'..'z' bounds without bringing all
that other stuff into it?

I can write this [** not in C **]:

['A'..'Z']char rotationcypher
print rotationcypher['F']

Which I can machine-translate to C and it takes care of the offsets needed
to make it work with C's 0-based arrays. Something like this:

unsigned char rotationcypher[26];
printf("%c",(rotationcypher[70-65]));

No special type attributes for the index or bounds, no templates, nothing
special except providing the right offsets.

Clearly such bounds are useful and can make for more readable code; why do
you want to deny this to C programmers?
 
David Brown

David Brown said:
On 08/05/14 11:15, BartC wrote:
When you are just going to have one type of array, then indexing by integers
starting from 0 is the simplest and clearest - you have the offset from
the start of the array.

But it doesn't always make sense to keep thinking of offsets. Look at the
'A'..'Z' example below, and tell me where offsets from the start of the
array come in, on the line with rotationcypher['F'].

The point is that when you use 0-based arrays, you can think of offsets.

For non-zero based arrays, the compiler hides the offset from you. In
the 'a' .. 'z', the compiler would put the " -'a' " in for you.
If you are going to look for more options, then you should really allow
ranges and different integral types (such as different sized integers,
integer ranges, contiguous enumerated types, etc.). That would lead to
clearer code and better compile-time checking for ranges and types -
similar to Pascal:

int xs[1 .. 100];
char rotationCypher['a' .. 'z'];

I don't see it happening in the C world - especially as it is already
possible in C++ (but with an uglier template declaration syntax, of
course).

Why can't we just have those 1..100 and 'a'..'z' bounds without bringing
all
that other stuff into it?

I can write this [** not in C **]:

['A'..'Z']char rotationcypher

That format is pointlessly different from C style - if these sorts of
arrays are ever to make sense in C, C++, or a C-like language, then they
would be written in roughly the form I gave.
print rotationcypher['F']

Which I can machine-translate to C and it takes care of the offsets needed
to make it work with C's 0-based arrays. Something like this:

unsigned char rotationcypher[26];
printf("%c",(rotationcypher[70-65]));

No special type attributes for the index or bounds, no templates, nothing
special except providing the right offsets.

Clearly such bounds are useful and can make for more readable code; why do
you want to deny this to C programmers?

I can't make sense of you here. I said specifically that arrays with
ranges or enumerated types as indexes "would lead to clearer code and
better compile-time checking for ranges and types". I am not "trying to
deny this to C programmers" - I think it would be a useful enhancement
to the language, which is why I wrote about it. But I also think that
it is unlikely to become a part of C, especially as you can implement it
in C++.
 
Stefan Ram

David Brown said:
For non-zero based arrays, the compiler hides the offset from you. In
the 'a' .. 'z', the compiler would put the " -'a' " in for you.

If intermediate pointers to unallocated memory were not a problem,
we could have any offset we like in C:

int a_[] ={ 1, 2, 3 }; int * a = a_ - 1; /* a now is 1-based */

This is formally UB, since a_ - 1 points before the start of the array, but it
should work in many environments.
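
A 1-based view can also be had without forming a pointer before the start of the array, at the cost of one unused element. This is only a sketch; the names N, a1 and a1_sum are illustrative.

```c
enum { N = 3 };

/* N + 1 slots: slot 0 is deliberately unused padding, so a1[1]..a1[N]
   all lie inside the array and no out-of-bounds pointer is ever formed. */
static int a1[N + 1] = { 0, 1, 2, 3 };

/* Sum elements 1..N using 1-based indices. */
static int a1_sum(void)
{
    int total = 0;
    for (int i = 1; i <= N; i++)
        total += a1[i];
    return total;
}
```

The trade-off is one wasted element per array, and the padding trick only covers a lower bound of 1, not an arbitrary one.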
 
Walter Banks

glen said:
Again, it doesn't make much difference to compilers.

Well, overall, C is a fairly simple language to compile.
This is just one feature.

I have written a few compilers for several languages. C
compilers are actually fairly complex compared to those for many
other languages, largely because the language has evolved
over time and so much of it is compiler-defined but
constrained by conventional wisdom.

w..
 
Walter Banks

jacob said:
On 07/05/2014 20:37, Richard wrote:

The APL language had a global variable called "Origin" that could be
zero or one. Depending on this value, arrays would start at 1 (the default)
or at zero (if you set Origin to zero).

This gave users the choice, but led to subtle bugs. It was enough to
forget an origin change somewhere and your software would no longer
run: if you wrote it using origin 1 and somebody set the origin to
zero, all your array accesses would be wrong.

The nice thing with origin 1 is that since array index zero doesn't
exist, many functions can return zero to say "Search failed". With
origin zero you must return some other flag value (like 0xfffffff),
which always causes problems.
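
The "search failed" point can be sketched in C as two variants of a linear search. The names find1, find0 and NOT_FOUND are hypothetical, and the choice of (size_t)-1 as the flag value is just one common convention.

```c
#include <stddef.h>

/* 1-based convention: returns the position 1..n of key, or 0 if absent.
   Zero is never a valid position, so it doubles as the failure value. */
static size_t find1(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return i + 1;
    return 0;
}

/* 0-based convention: 0 is a valid position, so "not found" needs an
   out-of-band flag; here the largest size_t value is used. */
#define NOT_FOUND ((size_t)-1)

static size_t find0(const int *a, size_t n, int key)
{
    for (size_t i = 0; i < n; i++)
        if (a[i] == key)
            return i;
    return NOT_FOUND;
}
```

With find1 the caller can write if (find1(a, n, key)) directly; with find0 the result has to be compared against NOT_FOUND explicitly.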

Pascal can define index ranges when an array is declared. In code
generation the differences are trivial but it often makes applications
very readable.

w..
 
Walter Banks

BartC said:
There are innumerable benefits from being flexible in specifying the lower
bound of an array.

One being that it makes it easier to port code or an algorithm that uses a
different base.


C conflates arrays with pointers too much which is why you're thinking of
offsets when you should be thinking of indices.

A C implementation has many choices for how it handles arrays. For some
processors, conflating arrays with pointers is a good approach; for others,
the ISA makes it more effective to access arrays by managing the indices.

w..
 
Walter Banks

James said:
I suspect that decision was based upon a lot of experience showing that
single-precision was often insufficiently accurate.

There is a lot of evidence that the added 4 bits of a 36-bit float (as opposed
to 32-bit single precision) would change a lot of the precision problems
for most applications.

Something that has been missed in many processor designs is the effect of data
widths on the ability of a processor to be used in applications. 2^N widths
have a minimal advantage in hardware implementation. I have worked on
several processor designs where a non-standard data width contributed
substantially to application throughput.

w..
 
BartC

David Brown said:
David Brown said:
On 08/05/14 11:15, BartC wrote:
If the choice is *only* between 0 and 1, then 0 is more versatile. But
it's
not hard to allow both or any. Both 0 and 1 bases have their uses;
1-based I
think is more useful as the default base.
When you are just going to have one type of array, then indexing by integers
starting from 0 is the simplest and clearest - you have the offset from
the start of the array.

But it doesn't always make sense to keep thinking of offsets. Look at the
'A'..'Z' example below, and tell me where offsets from the start of the
array come in, on the line with rotationcypher['F'].

The point is that when you use 0-based arrays, you can think of offsets.

For non-zero based arrays, the compiler hides the offset from you. In
the 'a' .. 'z', the compiler would put the " -'a' " in for you.

Exactly. So why shouldn't there be the option for a lower bound that isn't
zero? As you say, it's not that difficult for a compiler to deal with it.
I can write this [** not in C **]:

['A'..'Z']char rotationcypher

That format is pointlessly different from C style - if these sorts of
arrays are ever to make sense in C, C++, or a C-like language, then they
would be written in roughly the form I gave.

I said that was not C. I wrote it that way because it is *an actual working
example* of a language that does C-like things yet allows any array lower
bound. And that C output was actual output (with some types adjusted to make
it clearer).

In C you'd put the dimensions where it normally expects them (I don't know
what syntax would be proposed. I use either [lower..upper] or
[lower:length] or [length] or [lower:].)
I can't make sense of you here. I said specifically that arrays with
ranges or enumerated types as indexes "would lead to clearer code and
better compile-time checking for ranges and types". I am not "trying to
deny this to C programmers" - I think it would be a useful enhancement
to the language, which is why I wrote about it.

I think that simply having a choice of lower bound would also be a useful
enhancement, and one considerably simpler to implement than turning C into
Pascal or Ada at the same time, with all these type checks (which are much
more difficult than you might think when you do it properly).

The compiler can do range checking if it likes. But at present, an array
like char a[100] already has a range of 0..99, and few compilers seem to
bother checking it! (A quick test shows only 2 out of 6 checking it at
normal warning levels. gcc doesn't seem to do it at any level, but
doubtless there will be an option hidden away to do so.)
But I also think that
it is unlikely to become a part of C, especially as you can implement it
in C++.

A decision to switch languages is not really an answer! (And it sounds like
C++ only manages it with some trouble.)

Anyway not everyone likes to drag in the complexity of a C++ compiler just
for one or two extra features which ought to be in C already.
 
Kaz Kylheku

Usually from contamination with C. If it has curly braces, or the
implementation is based on a curly-braced language, the probability is that
the arrays are 0-based.

Higher-level languages have less reason to start counting from zero, which is
less intuitive. Most counting in real life is 1-based.

1. Indexing isn't necessarily counting!

[ ] [ ] [ ]
^   ^   ^   ^
0   1   2   3

The array index is the left corner of the box. The count is the right corner.

The index indicates: what is the displacement? How many elements are
before this one?

2. Counting is not one based. It is zero based. To count items, you must
start with zero:

count = 0

while (uncounted items remain) {
    check off next item
    count++
}

If there are no items, the count is zero. With the indexing
diagram, again, counting works like this:

step 0: initialize

[ ] [ ]
^
0

step 1: first box is counted

[ ] [ ]
^   ^
0   1

step 2: second box is counted

[ ] [ ]
^   ^   ^
0   1   2

C certainly uses one-based counting for array length: an array
which contains only element [0] has length 1.
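
The link between index and count can be sketched in C as an append operation: the count before the push is exactly the index the new element receives. The struct int_vec, CAPACITY and vec_push below are hypothetical names used only for this illustration.

```c
#include <stddef.h>

enum { CAPACITY = 8 };

struct int_vec {
    int    items[CAPACITY];
    size_t count;            /* starts at 0: nothing counted yet */
};

/* Append x. The count before the push is the index the new element gets,
   i.e. the number of elements that precede it. Returns 0 on success,
   -1 if the vector is full. */
static int vec_push(struct int_vec *v, int x)
{
    if (v->count == CAPACITY)
        return -1;
    v->items[v->count++] = x;
    return 0;
}
```

After the first push the element sits at index 0 and the count is 1, which matches the left-corner and right-corner picture above.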
I would be surprised if Lisp couldn't support N-based arrays if it wanted
to; if not then it would be the only thing it couldn't do.

Indeed, ANSI Common Lisp has displaced arrays: array objects which virtually
reference the data in other arrays, with displacement.

Those were not there in the beginning; just zero-based arrays.
Lua and Ada are 1-based. The latter also has N-based.


What's stupid is having the 1st, 2nd, 3rd and 4th elements of an array
indexed as 0, 1, 2 and 3.

Not at all. Index 0 indicates that the array is empty before we push
the first element there.

These concepts can coexist in a language. Lisp:

(elt '(a b c) 0) -> a

(first '(a b c)) -> a

The symbols first, second, ... are never subject to scaling or
displacement, and correspond to natural language concepts.

Note that clocks measure the day from 00:00 to 23:59, not from 01:01
to 24:60. People generally do not have a problem with this.

Also, countdown timers go to zero. If you cook something with your
microwave for 27 seconds, it starts at 27, and counts down to zero,
once per second.

When year 2000 rolled around, numerous people around the world
thought that it's the start of the new millennium and celebrated.

Those pointing out that it actually runs from 2001 to 3000 were ridiculed as
dweebs and party poopers.

"Ordinary people" can, and do, regard zero based systems as natural,
while at the same time regarding one based counting as natural also,
depending on context.
If the choice is *only* between 0 and 1, then 0 is more versatile. But it's
not hard to allow both or any. Both 0 and 1 bases have their uses; 1-based I
think is more useful as the default base.

So "useful" and "versatile" are opposites, of sorts.
But imagine you had a language feature that looked like this:

x = ( n | a, b, c, ... | z);

which selects one of a, b, c etc depending on n (with z being a default
value); ie. it selects the nth value.

Should n be 1-based (so n=1 selects a), or 0-based (n=1 selects b); which is
more natural?

Zero all the way, without a question.

For instance, suppose I want to regard that list as pairs. I want to select
either (a, b) or (c, d) based on n. It's easy: just take elements 2*n,
and 2*n+1. If n is 1 based, I have to do algebra: 2*(n-1) and 2*(n-1)+1,
which goes to 2n-2 and 2n-2+1 = 2n-1.

Indexing multi-dimensionally gets even more retarded.
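
The pair arithmetic can be checked with a small C sketch. The function names pair0 and pair1 are made up here; both read from the same 0-based storage and differ only in whether n starts at 0 or 1.

```c
#include <stddef.h>

/* 0-based n: pair n occupies elements 2n and 2n+1. */
static void pair0(const int *list, size_t n, int out[2])
{
    out[0] = list[2 * n];
    out[1] = list[2 * n + 1];
}

/* 1-based n over the same storage: n must be corrected first, giving
   elements 2(n-1) = 2n-2 and 2(n-1)+1 = 2n-1. Requires n >= 1. */
static void pair1(const int *list, size_t n, int out[2])
{
    out[0] = list[2 * n - 2];
    out[1] = list[2 * n - 1];
}
```

For the list a, b, c, d, pair0 with n = 1 and pair1 with n = 2 both yield (c, d).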
 
Kaz Kylheku

Which I can machine-translate to C and it takes care of the offsets needed
to make it work with C's 0-based arrays. Something like this:

unsigned char rotationcypher[26];
printf("%c",(rotationcypher[70-65]));

No special type attributes for the index or bounds, no templates, nothing
special except providing the right offsets.

Clearly such bounds are useful and can make for more readable code; why do
you want to deny this to C programmers?

Because one would hope that C programmers can work it out to:

unsigned char cipher['Z' - 'A' + 1];

printf("%c", cipher['F' - 'A']);
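
Fleshed out a little, that approach might look like the sketch below. The names ALPHABET, cipher_init and cipher_lookup are assumptions for illustration, and the code assumes that 'A'..'Z' are contiguous, which holds in ASCII but is not guaranteed by the C standard.

```c
/* 26 entries, assuming contiguous 'A'..'Z' as in ASCII. */
enum { ALPHABET = 'Z' - 'A' + 1 };

static unsigned char cipher[ALPHABET];

/* Fill the table with a rotation by `shift` places (13 gives ROT13).
   shift is assumed to be non-negative. */
static void cipher_init(int shift)
{
    for (int i = 0; i < ALPHABET; i++)
        cipher[i] = (unsigned char)('A' + (i + shift) % ALPHABET);
}

/* Look up an upper-case letter. The " - 'A' " is the offset that a
   compiler supporting ['A'..'Z'] bounds would insert automatically.
   No range check is done: c must be in 'A'..'Z'. */
static unsigned char cipher_lookup(char c)
{
    return cipher[c - 'A'];
}
```

After cipher_init(13), cipher_lookup('F') returns 'S' on an ASCII system.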
 
Keith Thompson

BartC said:
Lua and Ada are 1-based. The latter also has N-based.

Ada arrays are not 1-based; you always have to define both lower and
upper bounds. (The predefined type String happens to use an index
subtype with a lower bound of 1.)

[...]
What's stupid is having the 1st, 2nd, 3rd and 4th elements of an array
indexed as 0, 1, 2 and 3.

I find arguments of the form "This is stupid!" "No, that's stupid!"
intensely boring.
 
Joe Pfeiffer

My first language (not counting BASIC in high school) was Pascal -- you
could pick any integer upper and lower bound you wanted, and it did
range checking. I regard this as doing it Right.
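
Something in that spirit can be approximated in C, though only at run time. The sketch below is not how Pascal implements it; struct ranged_ints and ranged_at are hypothetical names, and the check relies on assert(), so it disappears when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bounded array: valid indices run from lo to hi inclusive. */
struct ranged_ints {
    int  lo, hi;
    int *data;      /* storage for hi - lo + 1 elements */
};

/* Return a pointer to element i, aborting via assert() if i is out of
   range. The subtraction of lo is the offset a Pascal compiler would
   generate for an array declared with explicit bounds. */
static int *ranged_at(struct ranged_ints *r, int i)
{
    assert(i >= r->lo && i <= r->hi);
    return &r->data[i - r->lo];
}
```

For an array with bounds 1..100 backed by int storage[100], *ranged_at(&xs, 1) refers to storage[0] and *ranged_at(&xs, 100) to storage[99].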
 
