What Defines the "C" Language?


Randy Yates

Chris Hills said:
Randy Yates said:
In Harbison and Steele's text (fourth edition, p.111)
it is stated,

The C language does not specify the range of integers that the
integral types will represent, except to say that type int may not
be smaller than short and long may not be smaller than int.

They go on to say,

Many implementations represent characters in 8 bits, type short in
16 bits, and type long in 32 bits, with type int using either 16 or
32 bits depending on the implementation. ISO C requires
implementations to use at least these widths.

If the C language is not defined by ISO C, then what defines it?

In reality the answer is quite simple. The users define the language....

That is to say, various organisations produce compilers. These are used
by the SW community. Some become dominant in certain fields, e.g. PC, Mac,
Unix, 8051, AVR, PIC. These dominant compilers are effectively "the
standard" for that area. Most have extensions and some restrictions to
the ISO language.

You will often see the term "NQC" used here. It stands for "Not Quite C".
It refers to the many implementations of C, often for some of the
smaller 8/16-bit embedded architectures, that are literally Not Quite [ISO]
C but are the "standard" C for that field of work or particular
processor.

When C was originally devised, as commented on in this thread, there
were many machines with word sizes that were not 8/16/32. No one was really
sure where things were going, so the language was not restricted. Besides,
there is no point in gratuitously breaking current code bases.

Thanks, Chris.
As time goes on I expect that 99.999% of new processors will have the
nice 8/16/32/64-bit system, but there are many systems out there with a
good few years of life in them that don't conform to this.

True. TI's latest fixed-point, low-power digital signal processor family,
the TMS320C55xx, has a minimum wordlength of 16 bits, so a "byte" (a
sizeof(char)) is 16 bits.
The ISO C standard is the goal at which the core of all compilers should
aim. It is also the specification against which they should list deviations.
The fact that a compiler does not conform is usually not important
as long as you know that it does not, and of course how it does not. In
many fields portability is not a prime, or even major, consideration.

Yeah, I knew all of this. I had just made the statement the other day
in comp.dsp that an int could be any size when someone stated that
it must be at least 16 bits. Then when I opened my H&S to clarify I
got even more confused. I was pretty sure that in the past (WAY
pre-ISO-C) you could have any damn size int you wanted.
--
% Randy Yates % "Though you ride on the wheels of tomorrow,
%% Fuquay-Varina, NC % you still wander the fields of your
%%% 919-577-9882 % sorrow."
%%%% <[email protected]> % '21st Century Man', *Time*, ELO
http://home.earthlink.net/~yatescr
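
For reference, the "at least 16 bits" guarantee comes from the minimum
magnitudes in <limits.h>: INT_MAX must be at least 32767 and LONG_MAX at
least 2147483647, although the exact sizes are left open. A quick sketch
(assuming nothing beyond a hosted compiler) that prints an
implementation's actual values next to those guaranteed floors:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* Every conforming implementation must meet these floors, but
           may exceed them; nothing here pins down exact sizes. */
        printf("CHAR_BIT = %d (at least 8)\n", CHAR_BIT);
        printf("INT_MAX  = %d (at least 32767)\n", INT_MAX);
        printf("LONG_MAX = %ld (at least 2147483647)\n", LONG_MAX);
        printf("sizeof(int) = %lu byte(s)\n", (unsigned long)sizeof(int));
        return 0;
    }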
 

Chris Barts

Randy said:
Is my point not clear enough?

Our point is that the ISO C standard doesn't exactly specify a lot of
things. It is a contract between the compiler-writer and the compiler-user.

The compiler-writer promises to meet all of the minimums imposed by the
ISO C standard and adhere to all of the definitions for library
functions and control structures. He essentially promises to make an
implementation of the C virtual machine on whatever hardware platform
his compiler targets.

The compiler-user promises to write conformant, sensible code that the
compiler can translate into useful actions (machine code output,
usually). He essentially promises to not exceed the limitations of the C
virtual machine, as described in the ISO C standard.

(Note that the C virtual machine usually isn't implemented as a virtual
machine in the Java sense. It is, rather, a set of implicit assumptions
that are guaranteed to be valid in any C program.)
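
As an illustration of that contract (not taken from the thread itself):
code that stays within the guarantees of the abstract machine is portable
by construction, while code that leans on one implementation's choices is
not. A minimal sketch:

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        int  i = 32767;        /* guaranteed to fit in an int anywhere */
        long l = 2147483647L;  /* guaranteed to fit in a long anywhere */

        /* NOT guaranteed: an int may be only 16 bits wide, so a value
           such as 100000 might not fit in it on a conforming compiler. */

        printf("%d %ld (INT_MAX here is %d)\n", i, l, INT_MAX);
        return 0;
    }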
 

Chris Hills

Keith Thompson said:
The quoted text was actually written by Doug Gwyn in response to
something posted by "Merrill & Michele". It was in comp.std.c in the
"influences to C89" thread (which "Merrill & Michele" started by
quoting without attribution something *I* wrote here in comp.lang.c).
Thanks for the clarification on attributions.
My apologies to Doug for leaving him off.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills Staffs England /\/\/\/\/\
/\/\/ (e-mail address removed) www.phaedsys.org \/\/
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
 

Preben Traerup

True. TI's latest fixed-point, low-power digital signal processor family,
the TMS320C55xx, has a minimum wordlength of 16 bits, so a "byte" (a
sizeof(char)) is 16 bits.
Sure?

I would expect something like
sizeof(char) = 1;
and
CHAR_BIT = 16;
 

Mark L Pappin

Preben Traerup said:
Sure?

I would expect something like
sizeof(char) = 1;
and
CHAR_BIT = 16;

You seem to be under the misapprehension that what you said is at odds
with what you quoted. Granted, you did use terminology within the
language.

mlp
 

Randy Yates

Preben Traerup said:
Sure?

I would expect something like
sizeof(char) = 1;
and
CHAR_BIT = 16;

You are correct. My writing was not clear. I should have written "a 'byte' (a char)...".
It is always true that sizeof(char) == 1 on any platform.
 

Dan Pop

In said:
You seem to be under the misapprehension that what you said is at odds
with what you quoted.

No such misapprehension: sizeof(char) cannot be simultaneously 16 and 1,
so the two statements are at odds.
Granted, you did use terminology within the language.

Not really. His semicolons suggest valid C statements, but they aren't
valid C statements.

Dan
 

Dan Pop

In said:
There have been enough that called non-8-bit units "bytes" that the term
"octet" is preferred in uses where the distinction is critical

Can we have some concrete examples? AFAIK, the term "byte" was made
popular by the IBM 360, an 8-bit byte machine. It wouldn't make much
sense in a word-addressed machine, anyway.
(e.g. networking).

"Byte", being defined as a computer memory unit, wouldn't make much sense
in a networking context in the first place.
8-bit bytes have become so dominant that "byte" and "octet" are often used
interchangeably,

Not in English, as far as I can tell. The term "byte" is still used in a
computer memory context, while "octet" is reserved for networking
contexts. The French have a strong preference for "octet", because of
their rampant NIH syndrome.
but that's obviously not accurate on some systems that most
folks would not consider "mainstream" or "normal".

Systems without a native definition of "byte" in the first place.
It's the definition of byte provided by C for those systems that you're
talking about. The redefinition of "byte" in the C standard was a gross
mistake: the standard doesn't really need it, "character" is good enough
for its purposes and the term "byte" already had a well established
meaning in computing.

Dan
 

Joona I Palaste

No such misapprehension: sizeof(char) cannot be simultaneously 16 and 1,
so the two statements are at odds.
Not really. His semicolons suggest valid C statements, but they aren't
valid C statements.

"Somebody" obviously doesn't understand C syntax, but his point is
easily understood, and correct. sizeof(char) is always 1, no matter the
circumstances. CHAR_BIT is the number of bits in a C byte. A C byte
does not have to have 8 bits; it can have more, but it can't have fewer.
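
A small sketch along those lines (the C55xx behaviour is an assumption
based on the earlier post; on a typical desktop compiler CHAR_BIT prints
8):

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        /* sizeof(char) is 1 by definition; CHAR_BIT says how many bits
           that one byte holds (at least 8, possibly more).  On a
           TMS320C55xx-style target this would report CHAR_BIT as 16. */
        printf("sizeof(char) = %lu\n", (unsigned long)sizeof(char));
        printf("CHAR_BIT     = %d\n", CHAR_BIT);
        return 0;
    }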
 

Stephen Sprunk

Dan Pop said:
Can we have some concrete examples? AFAIK, the term "byte" was made
popular by the IBM 360, an 8-bit byte machine. It wouldn't make much
sense in a word-addressed machine, anyway.


"Byte", being defined as a computer memory unit, wouldn't make much sense
in a networking context in the first place.

IBM's ESCON uses 10-bit bytes to this day; I don't know if that qualifies as
networking (I was referring mainly to TCP/IP) since that's just a connection
between two mainframes or between a mainframe and a FEP...

Also, back when TCP/IP was originally specified, there were many machines
expected to run the protocol that did not natively use 8-bit memory units; I
wasn't around at the time so I don't know first-hand if they used "byte" to
describe 9-bit (or other sized) memory locations, but there was apparently
enough (mis)use of the word to justify inventing another one. Discussions
here and on comp.arch lead me to believe that non-8-bit units were (and
still are) commonly called bytes on some architectures.
Not in English, as far as I can tell. The term "byte" is still used in a
computer memory context, while "octet" is reserved for networking
contexts.

I would think that, prior to widespread networking, there was no need for
"octets" because data residing on one machine was rarely (compared to today)
moved to another machine, particularly one with a different size. If
machines had different byte (or whatever) sizes, it was largely irrelevant.
Now that networks are ubiquitous, nearly all mainstream machines use 8-bit
bytes and so octet has fallen back out of use.

S
 

Keith Thompson

It's the definition of byte provided by C for those systems that you're
talking about. The redefinition of "byte" in the C standard was a gross
mistake: the standard doesn't really need it, "character" is good enough
for its purposes and the term "byte" already had a well established
meaning in computing.

In my opinion, there were two errors: the redefinition of "byte", and
the close association of bytes and characters.

It would have been better to avoid the word "byte" altogether, use
something like "storage unit" to refer to the fundamental unit of
addressable memory, and not require a character to be one byte.

I think it also would have been good to allow arrays of quantities
smaller than a storage unit (such as arrays of 8-bit characters on
systems that can only directly address 64-bit words, and bit arrays on
all systems). This would require some extra work for the compiler,
and some would argue that it wouldn't be in the "spirit of C".

But I'm just daydreaming; none of this is likely to happen in any
future language with the name "C".
 

Keith Thompson

Preben Traerup said:
Sure?

I would expect something like
sizeof(char) = 1;
and
CHAR_BIT = 16;

Strictly speaking, sizeof(char) == 1 by definition, but it's not
entirely unreasonable to say informally that "sizeof(char) is 16
bits". It's a matter of expressing the same quantity in different
units. (My argument is weakened by the fact that the result of the
sizeof operator is always in bytes, but the meaning is clear enough.)
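
One way to make that informal usage precise is to convert the byte count
that sizeof yields into bits, as in this sketch (note that the product
includes any padding bits a type may have):

    #include <limits.h>
    #include <stdio.h>

    /* Width of a type in bits, padding bits included. */
    #define BIT_SIZEOF(type) ((unsigned long)sizeof(type) * CHAR_BIT)

    int main(void)
    {
        printf("char : %lu bits\n", BIT_SIZEOF(char));
        printf("int  : %lu bits\n", BIT_SIZEOF(int));
        printf("long : %lu bits\n", BIT_SIZEOF(long));
        return 0;
    }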
 

Dan Pop

In said:
(e-mail address removed) (Dan Pop) writes:
[...]
It's the definition of byte provided by C for those systems that you're
talking about. The redefinition of "byte" in the C standard was a gross
mistake: the standard doesn't really need it, "character" is good enough
for its purposes and the term "byte" already had a well established
meaning in computing.

In my opinion, there were two errors: the redefinition of "byte", and
the close association of bytes and characters.

My point was that "byte" was not needed at all by the C standard, since
"character" could have been used instead.
It would have been better to avoid the word "byte" altogether, use
something like "storage unit" to refer to the fundamental unit of
addressable memory, and not require a character to be one byte.

You don't need to bother at all with the fundamental unit of addressable
memory in the C standard, if all the types have sizes that are multiples
of char, so sizeof(char) == 1 still holds.

OTOH, in a low-level language like C, it would be unreasonable not to give
the programmer access to the fundamental unit of addressable memory...
After all, C itself was designed to port Unix from a word-addressed
architecture (the PDP-7) to a byte-addressed architecture (the PDP-11).
I think it also would have been good to allow arrays of quantities
smaller than a storage unit (such as arrays of 8-bit characters on
systems that can only directly address 64-bit words,

The implementor can always support that and we both know one concrete
example.
and bit arrays on all systems).

To have that, you need the type "bit". Introducing this type involves
a redesign of the sizeof operator. And we both know it is too late for
that...
This would require some extra work for the compiler,
and some would argue that it wouldn't be in the "spirit of C".

It's no more work than bit-fields require, so the spirit of C can't be
invoked. The nasty problem is the sizeof operator, which was not designed
with bits in mind.

Dan
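
A sketch of the bit-field comparison: the compiler already packs
sub-byte quantities for you inside a struct, but sizeof can only report
whole objects in bytes and may not be applied to a bit-field at all.

    #include <stdio.h>

    struct flags {
        unsigned int ready : 1;
        unsigned int error : 1;
        unsigned int mode  : 3;
    };

    int main(void)
    {
        struct flags f = {1, 0, 5};

        /* sizeof works on the whole struct...                        */
        printf("sizeof(struct flags) = %lu byte(s)\n",
               (unsigned long)sizeof(struct flags));
        /* ...but "sizeof f.ready" would be a constraint violation:
           the operand of sizeof may not be a bit-field.              */
        printf("mode = %u\n", (unsigned)f.mode);
        return 0;
    }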
 

Dan Pop

In said:
Also, back when TCP/IP was originally specified, there were many machines
expected to run the protocol that did not natively use 8-bit memory units; I
wasn't around at the time so I don't know first-hand if they used "byte" to
describe 9-bit (or other sized) memory locations,

There were NO 9-bit memory locations and, hence, no 9-bit "bytes". Those
machines used 36-bit memory locations (they were word-addressed machines,
not byte-addressed machines) and 36-bit general purpose registers. It was
only the data in a GPR that could be interpreted as a sequence of
characters of fixed or variable size.

Other word sizes that were popular among non-byte addressable machines
are 12, 18 and 60 (for supercomputers). Cray-1 (and its successors) was
some kind of a weirdo with 64-bit addressable words, but this proved to
be quite fortunate for C implementors.
but there was apparently
enough (mis)use of the word to justify inventing another one. Discussions
here and on comp.arch lead me to believe that non-8-bit units were (and
still are) commonly called bytes on some architectures.

The point is that there were no non-8-bit units of memory; before the
IBM 360 and the PDP-11 the unit of memory was the word which was
indivisible (atomic) as far as the memory system was concerned.

This is why I asked about concrete examples of machines using non-octet
bytes.
I would think that, prior to widespread networking, there was no need for
"octets" because data residing on one machine was rarely (compared to today)
moved to another machine, particularly one with a different size. If
machines had different byte (or whatever) sizes, it was largely irrelevant.
Now that networks are ubiquitous, nearly all mainstream machines use 8-bit
bytes and so octet has fallen back out of use.

The 8-bit byte was imposed by the success of hardware architectures
like IBM 360, PDP-11 and i8080A long before TCP/IP networking became
widespread. It's more reasonable to assume that TCP/IP networking adopted
octets because most machines that were going to be connected already had
8-bit bytes. The use of the term octet was justified by the fact that
some of the machines didn't have any kind of (well defined) byte, the
most important example back then being the PDP-10 machine (36-bit words
which could be divided, in software, into a sequence of variable-size
characters).

Dan
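
To illustrate what "divided, in software" means in practice, here is a
sketch that packs six 6-bit character codes into one 36-bit word with
shifts and masks, roughly the kind of work PDP-10 software did by hand
(unsigned long long is used only so the host is sure to have enough
bits):

    #include <stdio.h>

    static unsigned get_sixbit(unsigned long long word, int pos)
    {
        /* pos 0 is the leftmost (most significant) character of the six. */
        return (unsigned)((word >> (30 - 6 * pos)) & 0x3Fu);
    }

    int main(void)
    {
        unsigned long long word = 0;
        int i;

        for (i = 0; i < 6; i++)          /* pack codes 1..6       */
            word = (word << 6) | (unsigned)(i + 1);

        for (i = 0; i < 6; i++)          /* unpack them again     */
            printf("character %d: %u\n", i, get_sixbit(word, i));
        return 0;
    }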
 

Richard Tobin

Other word sizes that were popular among non-byte addressable machines
are 12, 18 and 60 (for supercomputers).

And 24. The ICL 1900 series and compatibles used that. It was
typically used with a 6-bit character set, 4 to a word.

-- Richard
 

Keith Thompson

In said:
(e-mail address removed) (Dan Pop) writes:
[...]
It's the definition of byte provided by C for those systems that you're
talking about. The redefinition of "byte" in the C standard was a gross
mistake: the standard doesn't really need it, "character" is good enough
for its purposes and the term "byte" already had a well established
meaning in computing.

In my opinion, there were two errors: the redefinition of "byte", and
the close association of bytes and characters.

My point was that "byte" was not needed at all by the C standard, since
"character" could have been used instead.
It would have been better to avoid the word "byte" altogether, use
something like "storage unit" to refer to the fundamental unit of
addressable memory, and not require a character to be one byte.

You don't need to bother at all with the fundamental unit of addressable
memory in the C standard, if all the types have sizes that are multiples
of char, so sizeof(char) == 1 still holds.

You're still assuming that the fundamental unit of addressable memory
has to be a character. C requires that, of course, and that's not
going to change, but I'm suggesting that it would have been better to
keep them separate.

If I were designing C from scratch, or even if I were working on the
C89 standard (via time machine), I might propose something like this:

1. Drop the word "byte" altogether.

2. Define a type "storage_unit" (or pick a shorter name if you like)
as the fundamental addressable memory unit. On typical systems, a
storage_unit is an 8-bit unsigned type with no padding bits; on
others, it might be bigger, or even smaller. The mem*() functions
work on arrays of storage_unit. The type "storage_unit*" might even
replace "void*"; malloc() could return a "storage_unit*". Add
implicit conversions to taste (or don't).

3. Define the type "char" independently of the "storage_unit" type.
sizeof(char) needn't be 1. Strings are arrays of char. (I'm glossing
over issues of signed vs. unsigned, and of 8-bit vs. 16-bit vs. 32-bit
character sets. We might eventually want "long char" rather than
"wchar_t".)

4. Since the natural size of a storage_unit might be bigger than the
natural size of a character, we need some way to deal with arrays of
sub-storage_unit elements. Perhaps some arrays could be explicitly
packed, as in Pascal and Ada. Packed structs could eliminate the need
for explicit bit fields. The sizeof operator might need to yield the
size in bits rather than in storage units.

I think Doug Gwyn pushed for decoupling type "char" from the
fundamental addressing unit while the original ANSI C standard was
being developed; the rest of the committee didn't go for it.

And yes, it's far too late to do any of this, at least for any
language named "C" (and there are already too many languages named
"D").
 

Lawrence Kirby

OTOH, if you reply: "it shouldn't be smaller than 12x16", do you call this
a "specification" or a "constraint"? The C standard does not specify the
ranges of the integer types, it only imposes certain constraints on them.

A specification rarely if ever specifies everything exactly. A plan might
specify dimensions as particular values but not precise materials,
construction methods etc. A specification is made up of a collection of
constraints, some may be exact others not. Is the C language standard as
a whole a specification?

So the answer to your question is probably "both".

Lawrence
 

Dan Pop

In said:
And 24. The ICL 1900 series and compatibles used that. It was
typically used with a 6-bit character set, 4 to a word.

I was mentioning the popular sizes, rather than attempting an exhaustive
enumeration. Many other sizes have been used by one architecture or
another, especially during the late fifties and the early sixties.

For example, the first electronic computer used at CERN, a Ferranti
machine, had 40-bit words. Later, they got a 48-bit CDC machine.

However, 12- and 18-bit words were the most popular among the early minis,
36 for "full size" computers and 60 for supercomputers. As these numbers
suggest, all these machines were designed with a 6-bit character set in
mind (usually, the BCD character set).

Dan
 

Dan Pop

In said:
[email protected] (Dan Pop) said:
In said:
(e-mail address removed) (Dan Pop) writes:
[...]
It's the definition of byte provided by C for those systems that you're
talking about. The redefinition of "byte" in the C standard was a gross
mistake: the standard doesn't really need it, "character" is good enough
for its purposes and the term "byte" already had a well established
meaning in computing.

In my opinion, there were two errors: the redefinition of "byte", and
the close association of bytes and characters.

My point was that "byte" was not needed at all by the C standard, since
"character" could have been used instead.
It would have been better to avoid the word "byte" altogether, use
something like "storage unit" to refer to the fundamental unit of
addressable memory, and not require a character to be one byte.

You don't need to bother at all with the fundamental unit of addressable
memory in the C standard, if all the types have sizes that are multiples
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
of char, so sizeof(char) == 1 still holds.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You're still assuming that the fundamental unit of addressable memory
has to be a character.

Nope, I don't. I'm merely assuming what I have underlined above.

If you still don't get it, consider an implementation on x86 where
the types char and short are 16-bit, int, long and float 32-bit,
double 64-bit and long double 80-bit.

On this implementation, sizeof(int) == 2 and sizeof(double) == 4 and
any pointer value with the least significant bit set is invalid.

Perfectly usable for hosted applications, maybe not so handy for
freestanding ones (e.g. OS kernels and device drivers).

Dan
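
A sketch that would make the arithmetic visible: compiled by Dan's
hypothetical implementation (CHAR_BIT of 16) it would report sizeof(int)
as 2 and sizeof(double) as 4, while a typical byte-addressed x86
compiler reports 4 and 8 instead.

    #include <limits.h>
    #include <stdio.h>

    int main(void)
    {
        printf("CHAR_BIT       = %d\n", CHAR_BIT);
        printf("sizeof(short)  = %lu\n", (unsigned long)sizeof(short));
        printf("sizeof(int)    = %lu\n", (unsigned long)sizeof(int));
        printf("sizeof(long)   = %lu\n", (unsigned long)sizeof(long));
        printf("sizeof(double) = %lu\n", (unsigned long)sizeof(double));
        return 0;
    }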
 
