Making Fatal Hidden Assumptions

Paul Keinanen · Mar 10, 2006

Andrew Reilly wrote:

Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

I don't see much room for a "universal" assembler between C and a
traditional assembler, since the instruction sets can vary quite a
lot.

However, there exists a group of intermediate languages that try to
solve some problems in traditional assembler. One problem is the
management of local labels for branch targets, the other is that with
usually one opcode/source line the program can be quite long. Both
these things make it get a general view of what is going on, since
only a small fraction of the operations will fit into the editor
screen.

To handle the label management, some kind of conditional and repeat
blocks can be easily created. Allowing simple assignment statements
with assembly language operands makes it easier to but multiple
instructions on a single line.

Such languages have been implemented either as a preprocessor to an
ordinary assembler or directly as a macro set at least on PDP-11 and
VAX. The source code for PDP-11 might look like this:

IF R5 EQ #7
R3 = Base(R2) + (R0)+ - R1 + #4
ELSE
R3 = #5
SETF ; "Special" machine instruction
END_IF

would generate

CMP R5,#7
BNE 9$ ; Branch on opposite condition to else part
MOV Base(R2),R3
ADD (R0)+,R3
SUB R1,R3
ADD #4,R3
BR 19$ ; Jump over else part
9$: ; Else part
MOV #5,R3
SETF ; Copied directly from source
19$:

The assignment statement was always evaluated left to right or no
parenthesis were available to change the evaluation order. The
conditional expression consisted of one or two addressing mode
expression and a relational operator that translated to a conditional
branch instruction.

There is no point of trying to invent new constructions for each
machine instructions, so there is no problems in inserting any
"Special" machine instructions in the source file (in this case SETF),
which is copied directly to the pure assembler file.

For a different processor, the operands would of course be different,
but the control structures would nearly identical.

Paul

Ed Prochak · Mar 10, 2006

Keith said:
Ed Prochak said:

Andrew Reilly wrote: [...]

Yeah, me to. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick it's neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

Click to expand...

Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

Click to expand...

Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions.

The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
the only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".

Most assembly languages are, of
course, machine-specific, since they directly specify the actual
instructions.

x++; in most machines can map to a single instruction
which is different from the instruction for ++x;

One could imagine a more generic assembler that uses
some kind of pseudo-instructions that can be translated more or less
one-to-one to actual machine instructions. C, though it's closer to
the machine than some languages, is not an assembler in this sense; in
a C program, you specify what you want the machine to do, not what
instructions it should use to do it.

I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.

<OT>Forth might be an interesting data point in this discussion, but
if you're going to go into that, please drop comp.lang.c from the
newsgroups.</OT>

Forth is definitely a contender.

nice discussion.
ed

Richard G. Riley · Mar 10, 2006

"Ed"posted the following on 2006-03-10:

Keith said:
Keith said:

Ed Prochak said:

Andrew Reilly wrote: [...]
Yeah, me to. Still do, regularly, on processors that will never have a C
compiler. C is as close to a universal assembler as we've got at the
moment. It doesn't stick it's neck out too far, although a more
deliberately designed universal assembler would be a really good thing.
(It's on my list of things to do...)

Would a better universal assembler be more like assembler or more like
high level languages? I really think C hit very close to the optimal
balance.

Click to expand...

Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions.

Click to expand...

The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
the only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".

But a mnemonic representing an instruction is still just that. The
macro part is nothing more than rolling up of things for brevity. A
subsequent disassembly will reveal all the hidden gore.

x++; in most machines can map to a single instruction
which is different from the instruction for ++x;

How do you see them being different at the assembler level? They are
not are they? Its just when you do the (pseudo ASM) INC_REGx or ADDL 02,REGx or
whatever that matters isnt it?

e.g if we have

y=++x;

then the pseduo assembler is
INC x
move y,x

where as

y=x++

is
move y,x
INC x

Not taking into account expression return value that are CPU
equivalent. Admittedly I havent dabbled in later instruction sets for
the new post 80386 CPUs so please slap me down or better explain the
above if not right : its interesting.

I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.

Sorry for coming late, but how do you see an assembler? In common
parlance it has always been (in my world) a program for converting
instruction set mnemonics into equivalent opcodes which run natively
on the target CPU.

Al Balmer · Mar 10, 2006

Keith said:
Keith said:

Ed Prochak said:

Andrew Reilly wrote: [...]
Yeah, me to. Still do, regularly, on processors that will never have a C

Click to expand...

Click to expand...

The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
the only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".

Not really. Many assemblers have predefined macros for various things,
and C programmers write macros using preprocessor directives.

x++; in most machines can map to a single instruction
which is different from the instruction for ++x;

Huh? As a standalone statement, if x is an integer type, I'd expect
both to be mapped to the machine's equivalent of INC x. If it's
embedded in a larger statement, or it's a pointer, it's likely that
several instructions will be generated, and a compiler (including a C
compiler) will do things that a macro assembler won't do.
Proposed decades ago, and there has been some implementation.

C, though it's closer to

I would agree that if an assembler must be a one-to-one mapping from
source line to opcode, then C doesn't fit. I just don't agree with that
definition of assembler.

Nor does anyone else, since the invention of macros. However, C
doesn't fit any widely accepted definition of assembler. You can have
your own definition of assembler, as long as you don't expect folks to
know what you're talking about.

Ed Prochak · Mar 10, 2006

Richard said:
"Ed"posted the following on 2006-03-10:
[]

The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
the only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".

Click to expand...

But a mnemonic representing an instruction is still just that. The
macro part is nothing more than rolling up of things for brevity. A
subsequent disassembly will reveal all the hidden gore.

But it will NOT display the original macro. There is no 1 to 1 mapping
from source to code.

How do you see them being different at the assembler level? They are
not are they? Its just when you do the (pseudo ASM) INC_REGx or ADDL 02,REGx or
whatever that matters isnt it?

Actuall I still think in PDP assembler at times (my first
assemblerprogramming).
so y=x++; really does map to a single instruction which both moves the
value to y and increments x (which had to be held in a register IIRC)

e.g if we have

y=++x;

then the pseduo assembler is
INC x
move y,x

MOV R1,x
A: MOV y,R1++
## I may have the syntax wrong, it's been a LONG time

where as

y=x++

is
move y,x
INC x

MOV R1,x
B: MOV y,++R1

The opcodes for those two instructions (lines A and B) are different in
PDP assembler.

Not taking into account expression return value that are CPU
equivalent. Admittedly I havent dabbled in later instruction sets for
the new post 80386 CPUs so please slap me down or better explain the
above if not right : its interesting.

I haven't played much in the intel realm since about the 286, and I
haven't done much assembly at all for about 10years. Even the last
embedded project I worked on with a tiny 8bit micro had a C compiler,
so I did nearly nothing in assembler. C makes it so much easier. I've
had the opinion of C as assembler since I first learned it (about
1983).

Some other languages do so much more for you that you might be scared
to look at the disassembly. e.g. languages that do array bounds
checking for you will generate much more code for a[y]=x; than does C.
You can picture the assembly code for C in your head without much
difficulty. The same doesn't hold true for some other languages.

Sorry for coming late, but how do you see an assembler? In common
parlance it has always been (in my world) a program for converting
instruction set mnemonics into equivalent opcodes which run natively
on the target CPU.

I told you, a macro assembler does not work that way. One macro might
expand not just to multiple mnemonics, but to different mnemonics
depending on parameters. It is not 1 to 1 from source to assembly
mnemonics (let alone opcodes). A macro assembler can abstract just a
little or quite a lot away from the target machine. Depends on how you
use it. So while , an assembler is

[] a program for converting
instruction set mnemonics into equivalent opcodes which run natively
on the target CPU.

there's nothing about that conversion being one-to-one (mnemonic to
opcode)

even without macros, the one-to-one doesn't work if in the instruction
set the opcode for moving registers differs from moving memory, so
MOV R2,R!
differs from
MOV B,A
where R1 and R2 are register identifiers and A and B are memory
location. Yet we talk about the MOVe mnemonic as if both were the same
operation.

C's assignment operator maps about as closely to those opcodes as that
MOV mneumonic does. That's why I say it's a glorified assembler. You
have about as good an idea of what code is generated as you do with a
good assembler (as long as we can ignore the compiler's obtimizer).

Nice quote.
ed

cs_posting · Mar 10, 2006

Al said:
It doesn't have to make sure. It's free to segfault. You write funny
code, you pay the penalty (or your customers do.) Modern hardware does
a lot of speculation. It can preload or even precompute both branches
of a conditional, for example.

Hmm, so if I'm decrementing a divisior, and branching off somewhere
else before the actual divide instruction if the would be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that?

I suspect that exception handling in speculative execution is a problem
that has been looked into.

Michael Wojcik · Mar 10, 2006

x++; in most machines can map to a single instruction

Sure, if "most machines" excludes load/store architectures, and
machines which cannot operate directly on an object of the size of
whatever x happens to be, and all the cases where "x" is a pointer to
an object of a size other than the machine's addressing granularity...

I suppose you could argue that "can" in your claim is intended to be
weak - that, for "most machines" (with a conforming C implementation,
presumably), there exists at least one C program containing the
statement "x++;", and a conforming C implementation which will
translate that statement to a single machine instruction.

But that's a very small claim. All machines "can" map that statement
to multiple instructions as well; many "can" map it to zero
instructions in that sense (taking advantage of auto-increment modes
or the like). What can happen says very little about what will.

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar. (Macros aren't a counterexample
because they're purely lexical constructs; in principle they're
completely separate from code generation.)

C isn't assembler because:

- It doesn't impose a strict mapping between (preprocessed) source
and generated code. The "as if" clause allows the implementation
to have the generated code differ significantly from a strict
interpretation of the source acting on the virtual machine.

- It has generalized constructs (expressions) which can result in
the implementation generating arbitrarily complex code.

Al Balmer · Mar 10, 2006

Hmm, so if I'm decrementing a divisior, and branching off somewhere
else before the actual divide instruction if the would be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that?

Not something to worry about, though you'd have to ask an expert why

I suspect that this stuff is below the level of exception
triggers.

Paul Keinanen · Mar 10, 2006

The presence in C of syntactic sugar for certain simple operations
like "x++" doesn't support the claim that C is somehow akin to
assembler in any case. One distinguishing feature of assembler is
a *lack* of syntactic sugar. (Macros aren't a counterexample
because they're purely lexical constructs; in principle they're
completely separate from code generation.)

Mnemonics and symbolic addresses in assemblers are just syntactic
sugar built on the binary machine code

.

Entering machine codes in hex or octal is also syntactic sugar.

Paul

Walter Roberson · Mar 10, 2006

Hmm, so if I'm decrementing a divisior, and branching off somewhere
else before the actual divide instruction if the would be divisor is
zero, and your precomputation of both branches traps a division by zero
that a literal execution of my program would never perform... whose
fault is that?

I suspect that exception handling in speculative execution is a problem
that has been looked into.

[Getting off-topic for comp.lang.c...]

Yes. For example on the MIPS architecture, an exception state is
inserted into the flow, but the exception itself is not taken
unless the exception "graduates"; the exception is supressed if
the conditional results turn out to be such that it was not needed.

In the MIPS IV instruction set, divide can be done as
"multiply by the reciprical", and it is not uncommon to schedule
the reciprical operation ahead of time, before the code has had
time to check whether the denominator is 0. The non-zeroness
is speculated so as to get a "head start" on the time-consuming
division operation.

If I recall correctly, a fair bit of the multi-instruction pipelining
on MIPS is taken up with controls to handle speculation properly.

Keith Thompson · Mar 10, 2006

Ed Prochak said:
Keith Thompson wrote: [...]

Whether C is a "universal assembler" is an entirely separate question
from whether C is "good", or "better" than something else, or close to
some optimal balance.

As I understand the term, an assembly language is a symbolic language
in which the elements of the language map one-to-one (or nearly so)
onto machine-level instructions.

Click to expand...

The one-to-one mapping is broken for a macro assembler. Often there are
macros defined for subroutine entry and exit, loops and other things.
the only difference is that everyone defines their own macros, while in
C everybody uses the same "macros".

There's a continuum from raw machine language to very high-level
languages. Macro assembler is only a very small step up from
non-macro assembler. C is a *much* bigger step up from that. Some C
constructs may happen to map to single instructions for *some*
compiler/CPU combinations; they might map to multiple instructions, or
even none, for others. An assignment statement might copy a single
scalar value (integer, floating-point, or pointer) -- or it might copy
an entire structure; the C code looks the same, but the machine code
is radically different.

Using entirely arbitrary units of high-level-ness, I'd call machine
language close to 0, assembly language 10, macro assembler 15, and C
about 50. It might be useful to have something around 35 or so.
(This is, of course, mostly meaningless.)

Assembly language is usually untyped; types are specified by which
instruction you use, not by the types of the operands. C, by
contrast, associates types with variables. It often figures out how
to implement an operation based on the types of its operands, and many
operations are disallowed (assigning a floating-point value to a
pointer, for example).

I know the old joke that C combines the power of assembly language
with the flexibility of assembly language. I even think it's funny.
But it's not realistic, at least for C programmers who care about
writing good portable code.

cs_posting · Mar 11, 2006

Randy said:
This is a lot of whining about a specific problem that can
easily be remedied just by changing the loop construction. The
whole debate is pretty pointless in that context, unless you
have some religious reason to insist upon the method in the
original.

Have to remember thought that the C program in question is really a
back translation of approximating an assembly language original. If
the compiler builds the undefined pointer operation in the logical way,
it will be essentially the same as the hand written assembly language
code.

To then claim that speculative execution may cause an exception on the
result is to imply that the assembly language author, who has a pretty
good idea what assumptions he is making, must now add "speculative
loading of something I wasn't going to fetch" to the list of concerns.

Or were you thinking it was the compiler rather than processor logic
which was going to do the speculating?

Some pipelining tricks like the MIPS branch delay slot, are explicitly
part of the programming model, and you do have to manually handle them
when working with low level assembly code. But for the x86,
speculation is not...

Al Balmer · Mar 11, 2006

Have to remember thought that the C program in question is really a
back translation of approximating an assembly language original

Why? We've long since stopped discussing that program.

Mark L Pappin · Mar 11, 2006

Yeah, me to. Still do, regularly, on processors that will never
have a C compiler.

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)

mlp

Albert van der Horst · Mar 11, 2006

Ben Bacarisse said:
I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
example more convincing, because I have seen that in programs that are
intended to be portable -- I am pretty sure I have written such things
myself in my younger days. Have you other more general examples of
dangerous assumptions that can sneak into code? A list of the "top 10
things you might be assuming" would be very interesting.

I find it not an example of implicit assumptions, just of bad coding.

There is no reason not to use the almost as readable, problemless

if ( buffer_end - buffer < space_required )

(It mentally reads as " if the number of elements that still
fit in the buffer is less then the amount of elements we require")

Ben.

Groetjes Albert

Jordan Abel · Mar 11, 2006

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)

"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and, of
course, a decent-sized library]

CBFalconer · Mar 11, 2006

Jordan said:
It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of
storage, Harvard architecture, and lack of programmer-accessible
stack. Funnily enough, these are characteristics possessed by
chips for which I compile C code every day.)

Click to expand...

"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and,
of course, a decent-sized library]

These machines may well have C compilers, just not conforming
ones. The areas of non-conformance are likely to be:

object size available
floating point arithmetic
recursion depth (1 meaning no recursion)
availability of long and long-long.
availability of standard library

Once more, many programs can be written that are valid and portable
C, without requiring these abilities. The thing that needs to be
documented is use of possible non-standard substitutes for standard
features.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

cs_posting · Mar 12, 2006

There is no reason not to use the almost as readable, problemless

if ( buffer_end - buffer < space_required )

If the computation in one version can be reduced to a constant by the
compiler, that would be a reason for using that version.

I can imainge a number of situations in which "bad coding" is the
result of a programmer with a mental idea of how to accomplish
something efficiently, trying to render that approach in C as if it
were assembly language. This is doubly likely on small systems...

The problem of course is that the compiler has it's own ideas about how
to be efficient.

And the standards committee may have very different ideas from the
would-be hand optimizing programmer about how you are supposed to
instruct the compiler in what you want!

Jordan Abel · Mar 12, 2006

Jordan said:
Jordan said:

I spent 25 years writing assembler.

Yeah, me to. Still do, regularly, on processors that will never
have a C compiler.

It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of
storage, Harvard architecture, and lack of programmer-accessible
stack. Funnily enough, these are characteristics possessed by
chips for which I compile C code every day.)

Click to expand...

"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and,
of course, a decent-sized library]

Click to expand...

These machines may well have C compilers, just not conforming
ones. The areas of non-conformance are likely to be:

object size available
floating point arithmetic
recursion depth (1 meaning no recursion)

availability of long and long-long.

Or even the range of int itself. [People have claimed that "c"
implementations exist with 8-bit int]

availability of standard library

An implementation can conform, as a freestanding implementation, with
VERY little of the standard library

I'd question how much of the other stuff can be gone and still
considered "c", though.

Mark L Pappin · Mar 12, 2006

Jordan Abel said:
It's a little OT in c.l.c, but would you mind telling us just what
processors those are, that you can make such a guarantee? What
characteristics do they have that means they'll never have a C
compiler?

(A few I can recall having been proposed are: tiny amounts of storage,
Harvard architecture, and lack of programmer-accessible stack.
Funnily enough, these are characteristics possessed by chips for which
I compile C code every day.)

Click to expand...

"tiny amounts of storage" may preclude a conforming hosted
implementation [which must support an object of 65535 bytes, and, of
course, a decent-sized library]

Got me there. Freestanding only, with maybe 16 bytes of RAM and 256
words of ROM - no 'malloc()' here, and 'printf()' can be problematic.

A freestanding C implementation does, however, still have a C
compiler. I'm curious which processors Andrew Reilly claims "will
never have a C compiler", and why he makes that claim.

mlp

I Need Help with making a function that draws in a canvas using location data.	1	Dec 17, 2021
The Horror of pointers...	4	Jan 11, 2025
C pipe	1	Dec 9, 2021
Fatal error: Uncaught Error: Cannot use object of type WP_Error as array in	0	Dec 23, 2021
I need help making a zooming function	11	Dec 14, 2021
I need help making an html website	2	Aug 2, 2023
A process take input from /proc/<pid>/fd/0, but won't process it	0	Oct 29, 2023
Fibonacci	0	May 13, 2023

Making Fatal Hidden Assumptions

Paul Keinanen

Ed Prochak

Richard G. Riley

Al Balmer

Ed Prochak

cs_posting

Michael Wojcik

Al Balmer

Paul Keinanen

Walter Roberson

Keith Thompson

cs_posting

Al Balmer

Mark L Pappin

Albert van der Horst

Jordan Abel

CBFalconer

cs_posting

Jordan Abel

Mark L Pappin

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads