All quite right, indeed. As you have, quite correctly, written, I also
think there's an important difference between people who use C because
"it's a portable assembler" and people who use C because it makes many
things "portable across different machines".
I think the first group of people do not care about portability at all
and shouldn't really be using the term 'portable' to describe C. What they mean by
'portable' is something closer to 'more human readable than assembly' or
'a nice procedural layer over machine code'.
The second group of people are those who care about struct member
alignment, the nuances that make intptr_t different from a cast of
(void *) to uint64_t and so on.
These two uses of 'portable' are _not_ interchangeable, but seeing
statements like "C is portable assembler" tends to confuse the two.
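As a rough illustration of what the second group worries about, here is a minimal sketch. The struct and both function names are made up for the example, and the exact offset depends on the target ABI, so nothing here should be read as a fixed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the padding between 'tag' and 'value' is chosen
   by the compiler/ABI, not fixed by the C language. */
struct record {
    char   tag;
    double value;
};

/* Offset of 'value' as laid out on the current target. */
size_t value_offset(void)
{
    return offsetof(struct record, value);
}

/* Round-trips a pointer through intptr_t, which (where it exists) is
   defined to hold a converted void* and give back an equal pointer.
   A cast to uint64_t only behaves this way when pointers happen to be
   no wider than 64 bits on the target. */
void *roundtrip(void *p)
{
    intptr_t bits = (intptr_t)p;
    void *back = (void *)bits;
    assert(back == p);
    return back;
}
```

Printing value_offset() on a few different targets is a quick way to see that the layout is an ABI property rather than something the source code pins down.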
a person can basically use it somewhere in between:
C is a "portable assembler" in that it can generally do a lot of the
stuff ASM would normally do, but is high-level enough to not need to be
rewritten for each target machine (but can be used over a range of
machines).
though, there may still be things which C can't do, which may require
occasional use of real ASM code.
one thing that is often a hindrance to using C as a good target language
for compilers is, ironically, its fairly strict adherence to top-level
block-structuring.
some C extensions, such as computed goto, can help to a limited extent
here, but given computed goto can't cross function boundaries (or, by
extension, boundaries between one compilation unit and another), a limit
is placed on its use here (IOW: it doesn't scale very well).
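for reference, a minimal sketch of computed goto as provided by the GNU C "labels as values" extension (GCC and Clang; not standard C). the opcodes and the function name are hypothetical, and there is no checking of invalid opcodes:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a tiny accumulator machine. */
enum { OP_INC, OP_DBL, OP_HALT };

int run_bytecode(const unsigned char *code, int acc)
{
    /* Label addresses (&&label) are only usable inside this function,
       which is the scaling limit described above. */
    static void *const dispatch[] = { &&do_inc, &&do_dbl, &&do_halt };
    const unsigned char *pc = code;

    assert(code != NULL);
    goto *dispatch[*pc++];      /* jump to the first opcode's handler */

do_inc:
    acc += 1;
    goto *dispatch[*pc++];
do_dbl:
    acc *= 2;
    goto *dispatch[*pc++];
do_halt:
    return acc;
}
```

every handler has to live in the one function body, so a compiler targeting C this way ends up emitting one very large function per dispatch table.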
the alternative option is mostly to break things down into smaller units
and emit each as its own function, making heavy use of function-calls
and function pointers instead, but this usually comes at some
performance cost (an HLL function will often be split into a number of C
functions and often involve some use of a "trampoline loop" to call
between them).
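a minimal sketch of the "split into small functions plus trampoline loop" strategy, assuming a made-up HLL loop that sums n down to 1. all names here (vm_state, step, loop_head, loop_body, sum_to) are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

struct vm_state { int n; int acc; };

/* A function cannot directly return its own pointer type in C, so the
   "next function" pointer is wrapped in a struct. */
struct step;
typedef struct step (*step_fn)(struct vm_state *);
struct step { step_fn next; };

static struct step loop_head(struct vm_state *s);
static struct step loop_body(struct vm_state *s);

/* One fragment of the original HLL function: the loop test. */
static struct step loop_head(struct vm_state *s)
{
    struct step r = { s->n > 0 ? loop_body : NULL };
    return r;
}

/* Another fragment: the loop body, which continues at the loop test. */
static struct step loop_body(struct vm_state *s)
{
    struct step r = { loop_head };
    s->acc += s->n;
    s->n -= 1;
    return r;
}

/* The trampoline loop: every transfer between fragments is an indirect
   call followed by a return. */
int sum_to(int n)
{
    struct vm_state s = { n, 0 };
    struct step cur = { loop_head };
    assert(n >= 0);
    while (cur.next != NULL)
        cur = cur.next(&s);
    return s.acc;
}
```

what would be a single loop in direct C becomes two indirect calls per iteration here, which is where the cost comes from.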
note that code compiled using this strategy will often not be
significantly faster IME than using a similar strategy to interpret
threaded code, since in both cases there will be notable slowdown due to
function calls/returns and the use of trampoline loops. code in both
cases will often be several times, or more, slower than "direct" C code
(code lacking excessive function calls or trampoline loops).
(in my own tests, I have usually seen a slowdown of at least 5x-8x from
doing so, typically with most of the running time spent in the "trace
dispatch loop", or, essentially, a "trampoline loop").
it seems to be due mostly to the use of unpredictable indirect calls,
which the CPU seems to really dislike.
an alternative compromise, though, could be if (as a C extension) it were
possible to fetch and call label-pointers as if they were function
pointers (using the same signature as that of the parent function), and
also potentially refer to them from outside the function in question
(more like a struct), but currently I am not aware of any compiler which
supports this.
    foo.label1(3, 4);    //call into foo() via an arbitrary label.
    fcn = &foo.label1;   //get label pointer as function pointer.
possibly with something inside foo like:
"extern label1:"
with extern giving a compiler hint that maybe it should generate any
glue needed to make this callable.
this would not necessarily make calling them any faster (since they
would still need to set up a call-frame / ...), but could make the
general use of label-pointers and computed goto less limited.
though, yes, the big cost is (even if supported) it would still leave
the high-level compiler tied to a specific low-level compiler.
effectively, a lot of this still tends to leave ASM as a "better" target
for "reasonable performance" compiler output (since ASM doesn't really
care how anything is structured).
it can also allow for faster interpreters, although granted,
unpredictable indirect jumps are still a performance killer (in either C
or ASM).
however, making the interpreter work by spitting out a glob of native
code something like (or its CPU/ABI specific analogue):
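    push ebp            ; standard prologue
    mov ebp, esp
    ....
    mov eax, [...]
    mov [esp], eax      ; pass the value as the stack argument
    call VM_OpA         ; result comes back in eax
    mov [esp], eax      ; and becomes the argument to the next op
    call VM_OpB
    mov [esp], eax
    call VM_OpC
    mov [esp], eax
    call VM_OpD
    mov [esp], eax
    ....
    mov esp, ebp        ; standard epilogue
    pop ebp
    ret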
can at least make things a fair bit faster...
or such...
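as a rough C-level picture of the difference, here is a sketch contrasting a table-driven dispatch loop with the straight-line sequence of direct calls that the emitted native code amounts to. the VM_OpA..VM_OpD bodies are placeholders invented for the example (only the names come from the listing above), as are vm_op, run_dispatch_loop and run_direct_calls:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*vm_op)(int);

/* Placeholder operations; what the real ops do is not specified. */
int VM_OpA(int x) { return x + 1; }
int VM_OpB(int x) { return x * 2; }
int VM_OpC(int x) { return x - 3; }
int VM_OpD(int x) { return x * x; }

/* Interpreter style: one indirect call per op, all issued from the same
   call site, so the target is hard for the CPU to predict. */
int run_dispatch_loop(const vm_op *ops, size_t count, int x)
{
    size_t i;
    assert(ops != NULL || count == 0);
    for (i = 0; i < count; i++)
        x = ops[i](x);
    return x;
}

/* Generated-code style: a fixed sequence of direct calls, with each
   result passed on as the argument to the next call. */
int run_direct_calls(int x)
{
    x = VM_OpA(x);
    x = VM_OpB(x);
    x = VM_OpC(x);
    x = VM_OpD(x);
    return x;
}
```

both compute the same result; the second just replaces the indirect calls with direct ones, which is the effect the generated glob of native code is after.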