Making Fatal Hidden Assumptions

Keith Thompson · Mar 13, 2006

S.Tobias said:
[ F'ups set to c.l.c. - please reset if other groups are interested too. ] [ Followup obeyed. ]

In comp.lang.c Keith Thompson said:

Andrew Reilly said:

If the C Standard guarantees that for any array a, &a [-1]
should be valid, should it also guarantee that &a [-1] != NULL

Probably, since NULL has been given the guarantee that it's unique in some
sense. In an embedded environment, or assembly language, the construct

How exactly do you get from NULL (more precisely, a null pointer
value) being "unique in some sense" to a guarantee that &a[-1], which
doesn't point to any object, is unequal to NULL?

The standard guarantees that a null pointer "is guaranteed to compare
unequal to a pointer to any object or function". &a[-1] is not a
pointer to any object or function, so the standard doesn't guarantee
that &a[-1] != NULL.
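A practical consequence, sketched under the assumption that the goal is walking an array backwards: the tempting loop `for (p = a + n - 1; p >= a; p--)` ends by computing `a - 1`, the same invalid `&a[-1]`. A minimal C sketch (the function name `reverse_copy` is hypothetical) that only ever forms pointers from the start of the array up to one past its end:

```c
#include <assert.h>
#include <stddef.h>

/* Copy src[0..n-1] into dst in reverse order without ever forming
 * the invalid pointer src - 1. The pointer starts one past the end
 * (valid to form, not to dereference) and is decremented only after
 * the loop test has confirmed it is still above src. */
void reverse_copy(int *dst, const int *src, size_t n)
{
    const int *p;

    assert(n == 0 || (dst != NULL && src != NULL));
    p = src + n;                 /* one past the end */
    while (p > src) {
        --p;                     /* now points at a real element */
        *dst++ = *p;
    }
}
```

Usage: with `int a[] = {1, 2, 3, 4}` and `int b[4]`, `reverse_copy(b, a, 4)` leaves `b` holding 4, 3, 2, 1.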

By the same logic, ptr to just past the end of an array (which, of course,
does not point to an object), _can_ compare equal to null ptr? ;-)
- Of course not, semantics for "==" guarantee that both ptrs must be
null (among others) to compare equal.

Interesting. I think you may have found a small flaw in the standard.

I'm reasonably sure that it's *intended* that a one-past-the-end
pointer is unequal to NULL, but I don't think the standard quite
manages to say so.

Consider an implementation in which pointers look like unsigned 32-bit
integers, and a null pointer is represented as 0xffffffff. Suppose a
15-byte object (char foo[15]) happens to be allocated at 0xfffffff0.
Then a pointer just past the end of foo (which is not a pointer to any
object) would have the value 0xffffffff, and would happen to be equal
to NULL.

I don't see anything in this scenario that violates the standard.
Probably the definition of "null pointer" should be tightened up a
little, so that a null pointer compares unequal to a pointer to any
object or function, or just past the end of any object.
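Keith's hypothetical can be checked with plain integer arithmetic. This is a model only: it assumes, as in the scenario, that addresses are 32-bit unsigned integers and that the null pointer is represented as 0xffffffff. `SIM_NULL` and the `sim_` functions are hypothetical names, not anything a real C implementation provides.

```c
#include <assert.h>
#include <stdint.h>

/* Modeled null pointer: all bits set, as in the scenario above. */
#define SIM_NULL UINT32_C(0xFFFFFFFF)

/* Modeled address just past an object of `size` bytes at `base`.
 * Unsigned arithmetic, so the sum wraps modulo 2^32. */
uint32_t sim_one_past_end(uint32_t base, uint32_t size)
{
    assert(size > 0);            /* C objects have nonzero size */
    return base + size;
}

/* Nonzero if the one-past-the-end address equals the modeled null. */
int sim_collides_with_null(uint32_t base, uint32_t size)
{
    return sim_one_past_end(base, size) == SIM_NULL;
}
```

With `char foo[15]` at 0xfffffff0, `sim_collides_with_null(0xFFFFFFF0, 15)` is true, which is exactly the collision described.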

Keith Thompson · Mar 13, 2006

Ed Prochak said:
C is an assembler because

-- It doesn't impose strict data type checking, especially between
integers and pointers.
(While there has been some discussion about cases where conversions
back and forth between them can fail, for most machines it works. Good
thing too or some OS's would be written in some other language.)

Incorrect. Attempting to assign an integer value to a pointer object,
or vice versa, is a constraint violation, requiring a diagnostic.
Integer and pointers can be converted back and forth only using a cast
(an explicit conversion operator). The result of such a conversion is
implementation-defined.
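As a sketch of what the standard does allow (assuming a C99 or later implementation that provides the optional `uintptr_t` type): conversions in either direction need a cast, and only the round trip from `void *` to `uintptr_t` and back is guaranteed to yield a pointer that compares equal to the original. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Without a cast, something like "int *ip = 42;" is a constraint
 * violation and requires a diagnostic. With a cast, the integer value
 * produced is implementation-defined, but for uintptr_t the
 * void * -> uintptr_t -> void * round trip must compare equal. */
int pointer_roundtrips(void *p)
{
    uintptr_t u = (uintptr_t)p;  /* explicit conversion to integer */
    void *q = (void *)u;         /* and back to a pointer */
    return q == p;
}
```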

Even if this were correct, it certainly wouldn't make C an assembler.

-- datatype sizes are dependent on the underlying hardware. While a lot
of modern hardware has formed around the common 8bit char, and
multiples of 16 for int types (and recent C standards have started to
impose these standards), C still supports machines that used 9bit char
and 18bit and 36bit integers. This was the most frustrating thing for
me when I first learned C. It forces precisely some of the hidden
assumptions of this topic.

I don't know what "recent C standards" you're referring to. C
requires CHAR_BIT to be at least 8; it can be larger. short and int
must be at least 16 bits, and long must be at least 32 bits. A
conforming implementation, even with the most current standard, could
have 9-bit char, 18-bit short, 36-bit int, and 72-bit long.
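A minimal way to check those minimums on a given implementation, using only `<limits.h>` (the function name is hypothetical). Ranges are compared rather than `sizeof` times `CHAR_BIT`, since integer types may contain padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Nonzero if this implementation meets the minimum ranges the C
 * standard requires. Any of them may be exceeded: 9-bit char or
 * 36-bit int are both conforming. */
int meets_standard_minimums(void)
{
    return CHAR_BIT >= 8
        && SHRT_MAX >= 32767
        && INT_MAX  >= 32767
        && LONG_MAX >= 2147483647L;
}
```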

But this is a common feature of many high-level languages. Ada, for
example has an implementation-defined set of integer types, similar to
what C provides; I've never heard anyone claim that Ada is an
assembler.

-- C allows for easy "compilation" in that you could do it in one pass
of the source code (well two counting the preprocessor run). The
original C compiler was written in C so that bootstrapping onto a new
machine required only a simple easily written initial compiler to
compile the real compiler.

You're talking about an implementation, not the language.

-- original versions of the C compiler did not have passes like
data-flow optimizers. So optimization was left to the programmer. Hence
things like x++ and register storage became part of the language.
Perhaps they are not needed now, but dropping these features from the
language will nearly make it a different language. I do not know of
any other HLL that has register, but about every assembler allows
access to the registers under programmer control.

Again, you're talking about an implementation, not the language.

C doesn't allow access to specific registers (at least not portably).

Here's what the C standard says about the "register" specifier:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.

And there are a few restrictions; for example, you can't take the
address of a register-qualified object.
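A small sketch of both points, using a hypothetical function: the `register` hints may or may not change the generated code, but uncommenting the line that takes the address of `total` must produce a diagnostic.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array, asking (not requiring) that the accumulator and the
 * index be kept somewhere fast. */
long sum_ints(const int *a, size_t n)
{
    register long total = 0;
    register size_t i;

    for (i = 0; i < n; i++)
        total += a[i];

    /* long *tp = &total;   constraint violation: no & on register */
    return total;
}
```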

So IMHO, C is a nice generic assembler. It fits nicely in the narrow
world between hardware and applications. The fact that it is a decent
application development language is a bonus. I like C, I use it often.
Just realize it is a HLL with an assembler side too.

You've given a few examples that purport to demonstrate that C is an
assembler.

Try giving a definition of the word "assembler". If the definition
applies to C (the language, not any particular implementation), I'd
say it's likely to be a poor definition, but I'm willing to be
surprised.

Andrew Reilly · Mar 13, 2006

On the other hand, for every machine instruction there should be a
construct in the assembler to get that instruction. With that in
mind C doesn't fit either.

Well, since we're talking about a "universal assembler" (a reasonably
commonly used term), that's obviously something different from the usual
machine-specific assembler, which does indeed usually have that property.
(Although I've met assemblers where the only way to get certain
instructions was to insert the data for the op-code in-line. Instruction
coverage sometimes lags behind features added to actual implementations.)

Dik T. Winter · Mar 14, 2006

>
> Over the years, there have been notable cases of "hidden" machine
> instructions -- undocumented instructions, quite possibly with no
> assembler construct (at least not in any publicly available
> assembler.)

Indeed. But even when we look at the published instructions C falls
short of providing a construct for every one. Where is the C construct
to do a multiply step available in quite a few early RISC machines?
Note also that in assembler you can access the special bits indicating
overflow and whatever (if they are available on the machine). How to
do that in C?
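Portable C indeed has no way to read the machine's overflow flag, and signed overflow is undefined behavior, so the usual workaround is to test before adding. A minimal sketch with a hypothetical function name. GCC and Clang later added `__builtin_add_overflow`, and C23 standardized `ckd_add` in `<stdckdint.h>`; the explicit range test here works with any conforming compiler.

```c
#include <assert.h>
#include <limits.h>

/* Add a and b unless the result would overflow int.
 * Returns 1 (and leaves *sum untouched) on overflow, 0 otherwise.
 * Neither INT_MAX - b (b > 0) nor INT_MIN - b (b < 0) can overflow. */
int checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 1;
    *sum = a + b;
    return 0;
}
```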

Dik T. Winter · Mar 14, 2006

> C is an assembler because
>
> -- It doesn't impose strict data type checking, especially between
> integers and pointers.
> (While there has been some discussion about cases where conversions
> back and forth between them can fail, for most machines it works. Good
> thing too or some OS's would be written in some other language.)

It does impose restrictions. You have to put in a cast.

> -- datatype sizes are dependent on the underlying hardware. While a lot
> of modern hardware has formed around the common 8bit char, and
> multiples of 16 for int types (and recent C standards have started to
> impose these standards), C still supports machines that used 9bit char
> and 18bit and 36bit integers. This was the most frustrating thing for
> me when I first learned C. It forces precisely some of the hidden
> assumptions of this topic.

The same is true for Pascal.

> -- C allows for easy "compilation" in that you could do it in one pass
> of the source code (well two counting the preprocessor run). The
> original C compiler was written in C so that bootstrapping onto a new
> machine required only a simple easily written initial compiler to
> compile the real compiler.

What is valid for C is also valid for Pascal. But not all is valid.
Code generation, for instance, is *not* part of the compiler. Without
a back-end that translates the code generated by the compiler to actual
machine instructions, you are still nowhere. So your simple easily written
initial compiler was not an easily written initial compiler. You have
to consider what you use as assembler as backend. In contrast the very
first Pascal compiler was really single-pass and generated machine code
on-the-fly, without backend. Everything ready to run. (BTW, on that
machine the linking stage was almost never pre-performed, it took only
a few milli-seconds.)

> -- original versions of the C compiler did not have passes like
> data-flow optimizers. So optimization was left to the programmer. Hence
> things like x++ and register storage became part of the language.
> Perhaps they are not needed now, but dropping these features from the
> language will nearly make it a different language. I do not know of
> any other HLL that has register, but about every assembler allows
> access to the registers under programmer control.

In C "register" is only a suggestion, it is not necessary to follow it.
On the other hand, the very first Pascal compiler already did optimisation,
but not as a separate pass, but as part of the single pass it had. You
could tweak quite a bit with it when you had access to the source of the
compiler.

Dik T. Winter · Mar 14, 2006

> On Mon, 13 Mar 2006 15:31:35 +0000, Dik T. Winter wrote: ....
>
> Well, since we're talking about a "universal assembler" (a reasonably
> commonly used term), that's obviously something different from the usual
> machine-specific assembler, which does indeed usually have that property.
> (Although I've met assemblers where the only way to get certain
> instructions was to insert the data for the op-code in-line. Instruction
> coverage sometimes lags behind features added to actual implementations.)

I have met one such, but not for the reason you think. In this case the
assembler knew the instruction, and translated it, but completely wrong.
Apparently an instruction never used, but nevertheless published. And I
needed it.

But what is (in your opinion) a universal assembler? What properties should
it have to contrast it with a HLL?

Andrew Reilly · Mar 14, 2006

I have met one such, but not for the reason you think. In this case the
assembler knew the instruction, and translated it, but completely wrong.
Apparently an instruction never used, but nevertheless published. And I
needed it.

But what is (in your opinion) a universal assembler? What properties should
it have to contrast it with a HLL?

I posted a page-long description of what I conceive a universal assembler
to be in a previous message in the thread. Perhaps it didn't get to your
news server? Google has it here:
http://groups.google.com/group/comp.lang.c/msg/a91a898c08457481?hl=en&

The main properties that it would have, compared to C (some other HLLs
do have some of these properties) are:

a) Rigidly defined functionality, without "optimization", except for
instruction scheduling, in support of VLIW or (some) superscalar cores.
(Different ways of expressing a particular algorithm, which perform more
or less efficiently on different architectures should be coded as such,
and selected at compile/configuration time, or built using
meta-programming techniques.) This is opposed to the HLL view which is
something like: express the algorithm in a sufficiently abstract way and
the compiler will figure out an efficient way to code it, perhaps. Yes,
compilers are really quite good at that, now, but that's not really the
point. This aspect is what I meant in the linked post by suggesting
something a bit like the Java spec, but without objects. Tao's
"Intent" VM is perhaps even closer. Not stack based. I would probably
still be happy if limited common-subexpression-elimination (factoring) was
allowed, to paper-over the array index vs pointer/cursor coding style vs
architecture differences.

b) Very little or no "language" level support for control structures or
calling conventions, but made-up-for with powerful compile-time
meta-programming facilities, and a standard "macro" library that provides
most of the expected facilities found in languages like C or Pascal. Much
of what are now thought of as compiler implementation features would wind
up in macro libraries. The advantage of this would be that code could be
written to *rely* on specific transformation performance and existence,
instead of just saying "hope that your compiler is clever enough to
recognize this idiom", in the documentation. It would also make possible
the sorts of small code factorizations that happen all the time in
assembly language, but which single-value-return, unnested function call
conventions in C make close to impossible. Or different coding styles,
like threaded interpreters, reasonable without language extensions.

I imagine something like LLVM (http://llvm.cs.uiuc.edu/), but with a
powerful symbolic compile-time macro language on top (eg scheme...), an
algebraic (infix) operator syntax, and an expression parser.

In the meantime, "C", not as defined by the standard but as implemented
in the half dozen or so compilers that I regularly use, is close enough
to what I want that I haven't been driven to put in the effort to build
my universal assembler myself.

Cheers,

Keith Thompson · Mar 14, 2006

Dik T. Winter said:
What is valid for C is also valid for Pascal. But not all is valid.
Code generation, for instance, is *not* part of the compiler. Without
a back-end that translates the code generated by the compiler to actual
machine instructions, you are still nowhere. So your simple easily written
initial compiler was not an easily written initial compiler. You have
to consider what you use as assembler as backend. In contrast the very
first Pascal compiler was really single-pass and generated machine code
on-the-fly, without backend. Everything ready to run. (BTW, on that
machine the linking stage was almost never pre-performed, it took only
a few milli-seconds.)

This is a matter of terminology, but I'd say that code generation
certainly is part of the compiler. If the compiler generates machine
code directly that's certainly true. If it generates assembly
language and invokes a separate assembler, I'd still say that the
assembler is logically part of the compiler; both the front-end and
the assembler are part of translation phase 7 as described in the C
standard.

Richard Bos · Mar 14, 2006

Sure, if "most machines" excludes load/store architectures, and
machines which cannot operate directly on an object of the size of
whatever x happens to be, and all the cases where "x" is a pointer to
an object of a size other than the machine's addressing granularity...

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar.

One can make a case that in at least one aspect, C is _less_ like
assembler than many other high-level languages: for. Most languages only
support for loops looking like FOR n=1 TO 13 STEP 2: NEXT or for n:=10
downto 1 do. Such loops are often easily caught in one, or a few, simple
machine instructions. Now try the same with for (node=root; node;
node=node->next) or for (i=1, j=10; x && y[j]; i++, j--).
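To make the two example loops concrete, here is a sketch with a hypothetical `struct node` and hypothetical function names. The first loop's "counter" is a pointer tested against null; the second steps two variables with the comma operator under an arbitrary condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int value;
    struct node *next;
};

/* for (node = root; node; node = node->next): no count, no bound,
 * just a pointer chased until it is null. */
size_t list_length(const struct node *root)
{
    size_t count = 0;
    const struct node *node;

    for (node = root; node; node = node->next)
        count++;
    return count;
}

/* for (i = 1, j = 10; x && y[j]; i++, j--): returns i when the loop
 * stops. Assumes y has 11 elements and y[0] == 0 as a sentinel, so j
 * never goes below 0. */
int comma_loop(int x, const int y[11])
{
    int i, j;

    for (i = 1, j = 10; x && y[j]; i++, j--)
        ;                        /* empty body: all work is in the header */
    return i;
}
```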

Richard

Willem · Mar 14, 2006

Richard wrote:
) One can make a case that in at least one aspect, C is _less_ like
) assembler than many other high-level languages: for. Most languages only
) support for loops looking like FOR n=1 TO 13 STEP 2: NEXT or for n:=10
) downto 1 do. Such loops are often easily caught in one, or a few, simple
) machine instructions. Now try the same with for (node=root; node;
) node=node->next) or for (i=1, j=10; x && y[j]; i++, j--).

I disagree.

for (a; b; c) is trivially compiled to:

<compile a>
label-loop:
<compile b>
branch-if-false label-end
<compile c>
branch label-loop
label-end:

Which is a total of two branches and two branch labels.
So, two asm instructions comprise the whole for loop.
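Willem's translation can be written out by hand in C with `goto`, which makes the structure easy to compare. In this sketch (hypothetical function names) the loop body, which the scheme above leaves implicit, sits between the conditional branch and the code for `c`; both functions should return the same value.

```c
#include <assert.h>

/* Sum of 0..n-1 with an ordinary for loop. */
int sum_for(int n)
{
    int total = 0, i;

    for (i = 0; i < n; i++)
        total += i;
    return total;
}

/* The same loop lowered by hand: a, test b, body, c, jump back. */
int sum_goto(int n)
{
    int total = 0, i;

    i = 0;                       /* <compile a> */
loop:
    if (!(i < n))                /* <compile b>, branch-if-false */
        goto end;
    total += i;                  /* loop body */
    i++;                         /* <compile c> */
    goto loop;                   /* branch label-loop */
end:
    return total;
}
```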

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT

Richard Bos · Mar 14, 2006

Willem said:
Richard wrote:
) One can make a case that in at least one aspect, C is _less_ like
) assembler than many other high-level languages: for. Most languages only
) support for loops looking like FOR n=1 TO 13 STEP 2: NEXT or for n:=10
) downto 1 do. Such loops are often easily caught in one, or a few, simple
) machine instructions. Now try the same with for (node=root; node;
) node=node->next) or for (i=1, j=10; x && y[j]; i++, j--).

I disagree.

for (a; b; c) is trivially compiled to:

<compile a>
label-loop:
<compile b>
branch-if-false label-end
<compile c>
branch label-loop
label-end:

Which is a total of two branches and two branch labels.
So, two asm instructions comprise the whole for loop.

_If_ the computation of a, b and c are trivial. The point is that in C,
they can be as complex as you like. For the whole loop, you _could_ need

<200 machine instructions to initialise a>
label-loop:
<600 machine instructions to compute b>
branch-if-false label-end
[whatever is needed for the loop body]
<400 machine instructions to update c>
branch label-loop
label-end:

That's 1202 machine instructions for the loop itself. For the more
straightforward, count-only, Basic-and-Pascal kind, all you'll ever need
on many computers is:

store initial-value loop-register
label-loop:
[whatever is needed for the loop body]
decrease-and-jump-if-not-zero label-loop

That's 2.
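For comparison, a count-only loop written in C. Whether a compiler turns it into a single decrement-and-branch instruction depends on the target and the optimizer; this sketch (hypothetical function name) only shows the loop shape that makes such a mapping possible.

```c
#include <assert.h>

/* Sum n ints by counting a single variable down to zero. */
long sum_counted(const int *a, unsigned n)
{
    long total = 0;

    while (n-- > 0)              /* test, then decrement: runs n times */
        total += *a++;
    return total;
}
```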

Richard

Willem · Mar 14, 2006

)> for (a; b; c) is trivially compiled to:
)>
)> <compile a>
)> label-loop:
)> <compile b>
)> branch-if-false label-end
)> <compile c>
)> branch label-loop
)> label-end:
)>
)> Which is a total of two branches and two branch labels.
)> So, two asm instructions comprise the whole for loop.

Richard wrote:

) _If_ the computation of a, b and c are trivial. The point is that in C,
) they can be as complex as you like. For the whole loop, you _could_ need
)
) <200 machine instructions to initialise a>
) label-loop:
) <600 machine instructions to compute b>
) branch-if-false label-end
) [whatever is needed for the loop body]
) <400 machine instructions to update c>
) branch label-loop
) label-end:
)
) That's 1202 machine instructions for the loop itself. For the more

Those 1200 asm instructions are compiled from explicit code, written
out in full in the source code. The only thing the 'for' does,
is add two jumps in strategic places between those pieces of code.

SaSW, Willem

CBFalconer · Mar 14, 2006

Keith said:
.... snip ...

Interesting. I think you may have found a small flaw in the
standard.

I'm reasonably sure that it's *intended* that a one-past-the-end
pointer is unequal to NULL, but I don't think the standard quite
manages to say so.

I think there is something specific, but even if not, it is
certainly implied. For example, if p points to the last item in an
array, p++ is valid. p-- will then produce a valid dereferenceable
pointer, while p = NULL; p-- will not.
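The point about stepping back from one past the end, as code: a one-past-the-end pointer may be formed and compared, and subtracting one returns to a real element. This sketch uses a hypothetical function name and assumes a non-empty array.

```c
#include <assert.h>
#include <stddef.h>

/* Return the last element of a[0..n-1] by way of the one-past-the-end
 * pointer. Forming a + n is valid; dereferencing it is not, but
 * (a + n) - 1 points back inside the array. Requires n > 0. */
int last_element(const int *a, size_t n)
{
    const int *end = a + n;      /* valid: one past the end */

    assert(n > 0);
    return *(end - 1);           /* back at a real element */
}
```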

Ed Prochak · Mar 14, 2006

I should have phrased this: C is LIKE an assembler.

Incorrect. Attempting to assign an integer value to a pointer object,
or vice versa, is a constraint violation, requiring a diagnostic.

a Warning.

Integer and pointers can be converted back and forth only using a cast
(an explicit conversion operator). The result of such a conversion is
implementation-defined.

Even if this were correct, it certainly wouldn't make C an assembler.

To some degree you are right. It's actually pointer manipulation that
makes it closer to assembler.

I don't know what "recent C standards" you're referring to. C
requires CHAR_BIT to be at least 8; it can be larger. short and int
must be at least 16 bits, and long must be at least 32 bits. A
conforming implementation, even with the most current standard, could
have 9-bit char, 18-bit short, 36-bit int, and 72-bit long.

How about a bit-sliced machine that uses only 6-bit integers?

But this is a common feature of many high-level languages. Ada, for
example has an implementation-defined set of integer types, similar to
what C provides; I've never heard anyone claim that Ada is an
assembler.

Forgive my memory, but is it PL/1 or ADA that lets the programmer define
what integer type he wants? Syntax was something like
INTEGER*12 X
which defined X as a 12-bit integer. (Note that such syntax is portable in
that on two different processors, you still know that the range of X is
-2048 to +2047.)
The point is a 16bit integer in ADA is always a 16bit integer and
writing
x=32768 +10
will always overflow in ADA, but it is dependent on the compiler and
processor in C. It can overflow, or it can succeed.
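The ranges in question follow from the bit width. For a two's-complement integer of b bits the range is -2^(b-1) to 2^(b-1) - 1, so 12 bits gives -2048 to 2047, and 16 bits cannot hold 32768 + 10. A small C sketch of that arithmetic, with hypothetical function names and a width limit chosen so every shift stays within `long long`:

```c
#include <assert.h>

/* Smallest value of a two's-complement integer `bits` wide (1..63). */
long long twos_min(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return -(1LL << (bits - 1));
}

/* Largest value of a two's-complement integer `bits` wide (1..63). */
long long twos_max(int bits)
{
    assert(bits >= 1 && bits <= 63);
    return (1LL << (bits - 1)) - 1;
}

/* Nonzero if v is representable in a `bits`-wide signed integer. */
int fits_in_bits(long long v, int bits)
{
    return v >= twos_min(bits) && v <= twos_max(bits);
}
```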

But my point on this was, you need to know your target processor in C
more than in a language like ADA. This puts a burden on the C
programmer closer to an assembler programmer on the same machine than
to an ADA programmer.

You're talking about an implementation, not the language.

A big characteristic of assembler is that it is a simple language.
C is also a very simple language. Other HLLs are simple too, but the
simplicity combined with other characteristics suggest to me an
assembler feel to the language.

Again, you're talking about an implementation, not the language.

No, I was talking about the original motivation for the design of the
language. It was designed to exploit the register increment on DEC
processors. In the right context (e.g. y=x++), the increment doesn't
even become a separate instruction, as I mentioned in another post.

C doesn't allow access to specific registers (at least not portably).

But other HLLs don't even have register storage.

Here's what the C standard says about the "register" specifier:

A declaration of an identifier for an object with storage-class
specifier register suggests that access to the object be as fast
as possible. The extent to which such suggestions are effective is
implementation-defined.

I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.

And there are a few restrictions; for example, you can't take the
address of a register-qualified object.

Which makes sense to an assembler programmer, but not to a typical HLL
programmer.

You've given a few examples that purport to demonstrate that C is an
assembler.

Try giving a definition of the word "assembler". If the definition
applies to C (the language, not any particular implementation), I'd
say it's likely to be a poor definition, but I'm willing to be
surprised.

Let's put it this way. There is a gradient scale, from pure digits of
machine language at the lowest end (e.g., programming opcodes in binary
is closer to the hardware than using octal or hex), moving up past
assembler to higher and higher levels of abstraction away from the
hardware. On that scale, I put C much closer to assembler than any other
HLL I know. Here are some samples, highest level first:

PERL, BASH, SQL
C++, JAVA
PASCAL, FORTRAN, COBOL
C
assembler
HEX opcodes
binary opcodes
digital voltages in the real hardware.

Boy, you'd think I was insulting C based on the length of this thread.
8^)

But maybe this made my position clearer.
Ed.

Ed Prochak · Mar 14, 2006

Dik said:
Indeed. But even when we look at the published instructions C falls
short of providing a construct for every one. Where is the C construct
to do a multply step available in quite a few early RISC machines?
Note also that in assembler you can access the special bits indicating
overflow and whatever (if they are available on the machine). How to
do that in C?

Well you cannot, but those processors did not even exist when C was
created. So those features didn't make it. To some degree, C is more of
a PDP assembler. But I wonder if there is a way to write it in C that
the compiler can recognize. You would only care IF you are targeting
such a specific RISC processor, in which case, your thinking shades
closer to the approach an assembler programmer takes than an HLL
programmer takes.
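As one illustration (an assumption about what is meant, since no specific machine is named): a multiply-step instruction roughly performs one iteration of shift-and-add multiplication, and the whole loop can be written in portable C. Whether any compiler recognizes it and emits such instructions is not something the language lets you request. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned multiply by shift-and-add; the result wraps modulo 2^32,
 * like ordinary uint32_t multiplication. Each iteration inspects one
 * bit of b and conditionally adds the shifted multiplicand. */
uint32_t mul_shift_add(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    while (b != 0) {
        if (b & 1u)
            result += a;             /* add shifted multiplicand */
        a = (uint32_t)(a << 1);      /* next bit position */
        b >>= 1;
    }
    return result;
}
```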

See the difference? It is not so much that C gives you absolute control
of the hardware, but approaching many programming tasks in C from the
view of an assembly programmer makes your code better. Then when the
need is more abstract, C still works for higher-level programming.

I never said C was not an HLL.
Ed

Great discussion, BTW.

Ed Prochak · Mar 14, 2006

Dik said:
It does impose restrictions. You have to put in a cast.

Not if you can live with a WARNING message.

The same is true for Pascal.

Guess I'm getting forgetful in my old age (haven't touched PASCAL in
over 10 years). I thought PASCAL defined fixed ranges for the datatypes
like integers. I guess I didn't port enough PASCAL applications to see
the difference. (I could have sworn you'd get an error on X=32767+2.)

What is valid for C is also valid for Pascal. But not all is valid.
Code generation, for instance, is *not* part of the compiler. Without
a back-end that translates the code generated by the compiler to actual
machine instructions, you are still nowhere. So your simple easily written
initial compiler was not an easily written initial compiler. You have
to consider what you use as assembler as backend. In contrast the very
first Pascal compiler was really single-pass and generated machine code
on-the-fly, without backend. Everything ready to run. (BTW, on that
machine the linking stage was almost never pre-performed, it took only
a few milli-seconds.)

Yes, PASCAL and P-code, you have a point there, but I'm not sure it is
in your favor. Due to P-code, PASCAL is abstracted even above the
native assembler for the target platform. So we have
C->native assembler->program on native hardware
Pascal->program in p-code-> runs in p-code interpreter
So you have even less reason to think of the native hardware when
programming in PASCAL. This makes it more abstract and a higher HLL
than is C.

In C "register" is only a suggestion, it is not necessary to follow it.

The point is: why even include this feature? It is because when programming
in C you tend to think closer to the hardware than you do in PASCAL. Even
when I was doing some embedded graphics features for a product in
PASCAL, I don't think the CPU architecture ever entered my thoughts.

On the other hand, the very first Pascal compiler already did optimisation,
but not as a separate pass, but as part of the single pass it had. You
could tweak quite a bit with it when you had access to the source of the
compiler.

So the PASCAL compiler was more advanced than the C compiler of the
time. Do you think PASCAL being a more abstract HLL than C might have
had an effect here? (More likely, though, it was because PASCAL
predated C, at least in widespread use.)

The difference is, IMHO, that PASCAL is a more abstract HLL, letting
the programmer think more about the application. While C is a HLL, but
with features that force/allow the programmer to consider the
underlying processor. (in the context of this topic, "force" is the
word.)

Ed

Rod Pemberton · Mar 14, 2006

Ed Prochak said:
I should have phrased this: C is LIKE an assembler.

Or like this: C has low level features similar to an assembler.

How about a bit-sliced machine that uses only 6-bit integers?

I thought those died out. Were any of those CPUs actually used in a computer
advanced enough to compile C? As I recall, they were only used as custom
DSPs in the pre-DSP era, or as custom D/A converters, etc.

Forgive my memory,but is it PL/1 or ADA that lets the programmer define
what integer type he wants. Syntax was something like
INTEGER*12 X

Probably ADA, I don't recall that in PL/1.

A big characteristic of assembler is that it is a simple language.
C is also a very simple language. Other HLLs are simple too, but the
simplicity combined with other characteristics suggest to me an
assembler feel to the language.

Again, C has low level features. I always use structured code when I
program in C. But, C allows coding in many unstructured ways (rumor: to
allow program porting of Fortran to C). But, it is a high level language.
I don't have to keep track of what data is in what register, or stack, or
memory, like when I coded in 6502 or when I code in IA-32. I don't need to
move data around between registers, stack or memory; it's done for me in
C. I just need the name of the data or a named pointer to the data. I
don't need to setup prolog/epilog code. I don't need to calculate offsets
for branching instructions. etc...

No, I was talking about the original motivation for the design of the
language. It was designed to exploit the register increment on DEC
processors. In the right context (e.g. y=x++), the increment doesn't
even become a separate instruction, as I mentioned in another post.

Common C myth, but untrue:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html

"Thompson went a step further by inventing the ++ and -- operators, which
increment or decrement; their prefix or postfix position determines whether
the alteration occurs before or after noting the value of the operand. They
were not in the earliest versions of B, but appeared along the way. People
often guess that they were created to use the auto-increment and
auto-decrement address modes provided by the DEC PDP-11 on which C and Unix
first became popular. This is historically impossible, since there was no
PDP-11 when B was developed. The PDP-7, however, did have a few
`auto-increment' memory cells, with the property that an indirect memory
reference through them incremented the cell. This feature probably suggested
such operators to Thompson;"
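The idiom most often cited in this debate is the copy loop built from postfix `++`, where each `*p++` accesses through `p` and then advances it. A minimal sketch with a hypothetical function name; whether it compiles to auto-increment addressing depends entirely on the target.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the string src, including its terminating '\0', into dst and
 * return dst. The caller must ensure dst has room for all of it. */
char *copy_string(char *dst, const char *src)
{
    char *start = dst;

    while ((*dst++ = *src++) != '\0')
        ;                        /* all the work happens in the condition */
    return start;
}
```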

But other HLLs don't even have register storage.

True. Many HLLs don't have pointers either, and pointers are a key
attraction, for me, in any language.

BTW, I've heard one of the Pascal standards added pointers...

I know that it is just a suggestion. The point is: why was it included
in the language at all? Initially it gave the programmer more control.

Which makes sense to an assembler programmer, but not to a typical HLL
programmer.

True. This wouldn't or shouldn't make any sense to someone who doesn't
understand assembly.

Let's put it this way. There is a gradient scale, from pure digits of
machine language at the lowest end (e.g., programming opcodes in binary
is closer to the hardware than using octal or hex), moving up past
assembler to higher and higher levels of abstraction away from the
hardware. On that scale, I put C much closer to assembler than any other
HLL I know. Here are some samples, highest level first:

PERL, BASH, SQL
C++, JAVA
PASCAL, FORTRAN, COBOL
C
assembler
HEX opcodes
binary opcodes
digital voltages in the real hardware.

Based on my experiences, I'd list like so:

C, PL/1, FORTH
BASIC
PASCAL,FORTRAN
C (lowlevel), FORTH (lowlevel)
IA-32, 6502 assembler
HEX opcodes

My ranking of FORTRAN is highly debatable. It is strong in math, but
seriously primitive in a number of major programming areas, like string
processing. Yes, PASCAL is less useful than BASIC. BASIC had stronger, by
comparison, string processing abilities. Also, I don't see how you can
place Java above C, since it is a stripped down, pointer safe version of C.
PASCAL, (until) they added pointers, was basically a stripped down, pointer
safe version of PL/1.

Rod Pemberton

Ed Prochak · Mar 14, 2006

Richard said:
Sure, if "most machines" excludes load/store architectures, and
machines which cannot operate directly on an object of the size of
whatever x happens to be, and all the cases where "x" is a pointer to
an object of a size other than the machine's addressing granularity...

Click to expand...

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar.


One can make a case that in at least one aspect, C is _less_ like
assembler than many other high-level languages: the for loop. Most
languages only support for loops like FOR n=1 TO 13 STEP 2: NEXT or
for n:=10 downto 1 do. Such loops can often be translated into one, or a
few, simple machine instructions. Now try the same with for (node=root;
node; node=node->next) or for (i=1, j=10; x && y[j]; i++, j--).

Richard
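To make Richard's two loop shapes concrete, here is a minimal sketch. The list type `struct node` and the functions `list_sum` and `two_index_steps` are hypothetical names for illustration; the second loop adds a `j >= 0` bound (not in Richard's version) so the sketch cannot index outside `y`.

```c
#include <stddef.h>

/* Hypothetical singly linked list node, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Sums a list using the loop shape Richard quotes:
   for (node = root; node; node = node->next) */
int list_sum(const struct node *root)
{
    int sum = 0;
    const struct node *node;
    for (node = root; node; node = node->next)
        sum += node->value;
    return sum;
}

/* Two-index loop modelled on for (i=1, j=10; x && y[j]; i++, j--).
   x is a caller-supplied flag and y holds 11 ints. The extra j >= 0
   test is an assumption added here to keep the index in bounds.
   Returns the number of iterations performed. */
int two_index_steps(int x, const int y[11])
{
    int i, j;
    for (i = 1, j = 10; x && j >= 0 && y[j]; i++, j--)
        ;   /* empty body: all the work is in the loop header */
    return i - 1;
}
```

Neither header maps onto a single counted-loop instruction: the first follows a chain of pointers, and the second tests two unrelated conditions while stepping two indices in opposite directions.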

True, but
for( <p1>; <p2>; <p3> )
looks like a MACRO to me. When you program in assembly, it's up to the
programmer to optimize specific instances. That's why there's
also while().

The only difference is the loop block. In other words, in a macro
assembler you still have to code that
goto top_of_for_loop2
at the end of the loop, either literally or in some endfor macro.

Ed
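Ed's "macro" view of for( <p1>; <p2>; <p3> ) can be sketched by writing the same loop three ways: as a for loop, as the equivalent while loop, and with explicit labels and gotos, roughly what a macro assembler would expand to. The function and label names are hypothetical. Note that the while and goto forms match the for loop only when the body contains no continue statement, because continue in a for loop still runs <p3>.

```c
/* Sum of 1..n written three equivalent ways. */
int sum_for(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}

int sum_while(int n)
{
    int total = 0;
    int i = 1;              /* <p1> */
    while (i <= n) {        /* <p2> */
        total += i;         /* loop body */
        i++;                /* <p3> */
    }
    return total;
}

int sum_goto(int n)
{
    int total = 0;
    int i = 1;                      /* <p1> */
top_of_for_loop:
    if (!(i <= n))                  /* <p2> */
        goto end_of_for_loop;
    total += i;                     /* loop body */
    i++;                            /* <p3> */
    goto top_of_for_loop;           /* the jump back Ed describes */
end_of_for_loop:
    return total;
}
```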

Al Balmer · Mar 14, 2006

a Warning.

The difference between "error" and "warning" is usually unimportant,
and for some compilers, seems arbitrary.

Violations should be fixed, no matter how nicely the compiler tells
you about them.

Keith Thompson · Mar 14, 2006

Ed Prochak said:
I should have phrased this: C is LIKE an assembler.

And a raven is like a writing desk.
<http://www.straightdope.com/classics/a5_266.html>

"C is an assembler" and "C is like an assembler" are two *very*
different statements. The latter is obviously true, given a
sufficiently loose interpretation of "like".

a Warning.

The C standard doesn't distinguish between different kinds of
diagnostics, and it doesn't require any program to be rejected by the
compiler (unless it has a "#error" directive). This allows for
language extensions; an implementation is free to interpret an
otherwise illegal construct as it likes, as long as it produces some
kind of diagnostic in conforming mode. It also doesn't require the
diagnostic to have any particular form, or to be clearly associated
with the point at which the error occurred. (Those are
quality-of-implementation issues.)

This looseness of requirements for diagnostics isn't a point of
similarity between C and assemblers; on the contrary, in every
assembler I've seen, misspelling the name of an opcode or using
incorrect punctuation for an addressing mode results in an immediate
error message and failure of the assembler.

To some degree you are right. It's actually pointer manipulation that
makes it closer to assembler.

C provides certain operations on certain types. Pointer arithmetic
happens to be something that can be done in most or all assemblers and
in C, but C places restrictions on pointer arithmetic that you won't
find in any assembler. For example, you can subtract one pointer from
another, but only if they're pointers to the same type; in a typical
assembler, pointer values don't even have types. Pointer arithmetic
is allowed only within the bounds of a single object (though
violations of this needn't be diagnosed; they cause undefined
behavior); pointer arithmetic in an assembler gives you whatever
result makes sense given the underlying address representation. C
says nothing about how pointers are represented, and arithmetic on
pointers is not defined in terms of ordinary integer arithmetic; in an
assembler, the representation of a pointer is exposed, and you'd
probably use the ordinary integer operations to perform pointer
arithmetic.
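A small sketch of two of the points above, assuming nothing beyond standard C: pointer subtraction is measured in elements of the pointed-to type, not in bytes, and it is defined only for pointers into the same array object (including one past its end). The helper names `element_distance` and `byte_distance` are hypothetical.

```c
#include <stddef.h>

/* Distance in elements between two pointers into the same int array.
   The result type ptrdiff_t comes from <stddef.h>. */
ptrdiff_t element_distance(const int *first, const int *last)
{
    return last - first;
}

/* The same two addresses viewed as char pointers: the difference is
   now counted in bytes (chars). */
ptrdiff_t byte_distance(const int *first, const int *last)
{
    return (const char *)last - (const char *)first;
}

/* Forming a - 1 or a + 6 for an int a[5] would be undefined behavior,
   and the compiler is not required to diagnose it. */
```

With `int a[5]`, `element_distance(&a[1], &a[4])` is 3 whatever `sizeof(int)` is, while `byte_distance` gives `3 * sizeof(int)`. In an assembler you would only ever see the second number.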

How about a bit-sliced machine that uses only 6-bit integers?

What about it? A conforming C implementation on such a machine must
have CHAR_BIT>=8, INT_MAX>=32767, LONG_MAX>=2147483647, and so forth.
The compiler may have to do some extra work to implement this. (You
could certainly provide a non-conforming C implementation that
provides a 6-bit type called "int"; the C standard obviously places no
constraints on non-conforming implementations. I'd recommend calling
the resulting language something other than C, to avoid confusion.)
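A minimal sketch of how a C program can state these minimums and have the compiler check them, using only `<limits.h>` and `#error`. On a conforming implementation none of the `#error` lines can fire; on a non-conforming 6-bit "C" they would stop the build. The function name `meets_minimums` is hypothetical.

```c
#include <limits.h>

/* Compile-time checks of minimums the C standard guarantees. */
#if CHAR_BIT < 8
#error "CHAR_BIT must be at least 8"
#endif

#if INT_MAX < 32767
#error "INT_MAX must be at least 32767"
#endif

#if LONG_MAX < 2147483647
#error "LONG_MAX must be at least 2147483647"
#endif

/* Run-time restatement of the same checks; always 1 on a conforming
   implementation. */
int meets_minimums(void)
{
    return CHAR_BIT >= 8 && INT_MAX >= 32767 && LONG_MAX >= 2147483647L;
}
```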

Forgive my memory, but is it PL/1 or ADA that lets the programmer define
what integer type he wants? The syntax was something like
INTEGER*12 X
which defined X as a 12-bit integer. (Note that such syntax is portable in
that on two different processors, you still know that the range of X is
-2048 to +2047.)
The point is that a 16-bit integer in ADA is always a 16-bit integer, and
writing
x=32768 +10
will always overflow in ADA, but in C it depends on the compiler and
processor. It can overflow, or it can succeed.

I'm not familiar with PL/I.

Ada (not ADA) has a predefined type called Integer. It can have other
predefined integer types such as Short_Integer, Long_Integer,
Long_Long_Integer, and so forth. There are specific requirements on
the ranges of these types, quite similar to C's requirements for int,
short, long, etc. There's also a syntax for declaring a user-defined
type with a specified range:
type Integer_32 is range -2**31 .. 2**31-1;
This type will be implemented as one of the predefined integer types,
selected by the compiler to cover the requested range.

C99 has something similar, but not as elaborate: a set of typedefs in

But my point on this was that you need to know your target processor in C
more than in a language like ADA. This puts a burden on the C
programmer that is closer to what an assembler programmer on the same
machine faces than to what an ADA programmer faces.

You can get just as "close to the metal" in Ada as you can in C. Or,
in both languages, you can write portable code that will work properly
regardless of the underlying hardware, as long as there's a conforming
implementation. C is lower-level than Ada, so there's a greater
bias in C to relatively low-level constructs and system dependencies,
but it's only a matter of degree. In this sense, C and Ada are far
more similar to each other than either is to any assembler I've ever
seen.
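As a sketch of the portable style Keith describes, applied to Ed's x=32768 +10 example: C99's `<stdint.h>` provides `int16_t` with `INT16_MIN`/`INT16_MAX`, so a program can check the range itself instead of depending on what the implementation does on overflow. `int16_t` is optional in C99 (it exists only where an exact 16-bit two's complement type does), while `int_least16_t` is always available. The helpers `fits_int16` and `add_int16` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if v is within the range of int16_t. */
bool fits_int16(long v)
{
    return v >= INT16_MIN && v <= INT16_MAX;
}

/* Stores a + b in *result only if the sum fits in int16_t; otherwise
   reports failure and leaves *result unchanged. */
bool add_int16(int16_t a, int16_t b, int16_t *result)
{
    long sum = (long)a + (long)b;   /* cannot overflow: long has at least 32 bits */
    if (!fits_int16(sum))
        return false;
    *result = (int16_t)sum;
    return true;
}
```

Here `fits_int16(32768L + 10)` is false on every conforming implementation that provides `int16_t`, so the answer no longer depends on the processor.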

[...]

A big characteristic of assembler is that it is a simple language.
C is also a very simple language. Other HLLs are simple too, but the
simplicity combined with other characteristics suggests to me an
assembler feel to the language.

If you're just saying there's an "assembler feel", I won't argue with
you -- except to say that, with the right mindset, you can write
portable code in C without thinking much about the underlying
hardware.

[...]

No, I was talking about the original motivation for the design of the
language. It was designed to exploit the register increment on DEC
processors. In the right context (e.g. y=x++), the increment doesn't
even become a separate instruction, as I mentioned in another post.

The PDP-11 has predecrement and postincrement modes; it doesn't have
preincrement or postdecrement. And yet C provides all 4 combinations,
with no implied preference for the ones that happen to be
implementable as PDP-11 addressing modes. In any case, C's ancestry
goes back to the PDP-7, and to earlier languages (B and BCPL) that
predate the PDP-11.
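A short sketch of the four combinations C provides. Each hypothetical helper returns the value of the expression and updates the variable through the pointer, so the prefix and postfix forms can be compared side by side. Of these, `*p++` (post-increment) and `*--p` (pre-decrement) are the shapes that correspond to the PDP-11's autoincrement and autodecrement addressing modes; the other two have no matching mode.

```c
/* Each function returns the value of the expression; *x holds the
   updated variable afterwards. */
int pre_increment(int *x)  { return ++*x; }   /* increment, then yield the new value */
int post_increment(int *x) { return (*x)++; } /* yield the old value, then increment */
int pre_decrement(int *x)  { return --*x; }   /* decrement, then yield the new value */
int post_decrement(int *x) { return (*x)--; } /* yield the old value, then decrement */
```

Starting from x == 5, the four calls yield 6, 5, 4 and 5 respectively, leaving x at 6, 6, 4 and 4.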

[...]

I know that it is just a suggestion. The point is Why was it included
in the language at all? Initially it gave the programmer more control.

Sure, but giving the programmer more control is hardly synonymous with
assembly language.

Which makes sense to an assembler programmer, but not to a typical HLL
programmer.

Sure, it's a low-level feature.

Let's put it this way. There is a gradient scale, with the pure digits of
machine language at the lowest end (e.g., programming opcodes in binary is
closer to the hardware than using octal or hex), moving up past assembler
to higher and higher levels of abstraction away from the hardware. On that
scale, I put C much closer to assembler than any other HLL I know. Here are
some samples:

PERL, BASH, SQL
C++, JAVA
PASCAL, FORTRAN, COBOL
C
assembler
HEX opcodes
binary opcodes
digital voltages in the real hardware.

That seems like a reasonable scale (I might put Forth somewhere below
C). But you don't indicate the relative distances between the levels.
C is certainly closer to assembler than Pascal is, but I'd say that C
and Pascal are much closer to each other than either is to assembler.

You can write system-specific non-portable code in any language. In
assembler, you can *only* write system-specific non-portable code. In
C and everything above it, it's possible to write portable code that
will behave as specified on any system with a conforming
implementation, and a conforming implementation is possible on a very
wide variety of hardware. Based on that distinction, there's a
sizable gap between assembler and C.
