Steven D'Aprano
> It's like this: there *should* be one-to-one mappings between the
> various high-level constructs to the machine code, varying only between
> different chips (that is the purpose of the compiler after all), yet for
> some operations, in languages like scheme, well... I cannot say what
> happens... hence my challenge.
> Only that you've got a consistent, stable (and therefore, formalizable)
> translation from your language to the machine.
You are mistaken to think that there is a single one-to-one mapping
between high-level code and machine code.
In practice, you will get the same machine code only if you use the
exact same source code, the exact same compiler, the exact same version
of that compiler, with the exact same compiler options, and the exact
same version of the language and all its libraries. And even then it is
not guaranteed.
Take, for example, the single high-level operation:
sort(alist)
What machine code will be executed? Obviously that will depend on the
sort algorithm used. There are *dozens*. Here are just a few:
http://en.wikipedia.org/wiki/Category:Sorting_algorithms
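To make that concrete, here is a sketch of two correct sorting routines.
Both implement the same high-level operation, and both produce the same
answer, yet they step through entirely different sequences of low-level
operations:

```python
def insertion_sort(items):
    """Sort by inserting each item into its place in a growing result."""
    result = []
    for item in items:
        # Scan for the insertion point, then splice the item in.
        i = 0
        while i < len(result) and result[i] <= item:
            i += 1
        result.insert(i, item)
    return result

def selection_sort(items):
    """Sort by repeatedly extracting the smallest remaining item."""
    pool = list(items)
    result = []
    while pool:
        smallest = min(pool)
        pool.remove(smallest)
        result.append(smallest)
    return result

data = [42, 23, 7, 99, 23]
# Same high-level behaviour, very different low-level work:
assert insertion_sort(data) == selection_sort(data) == sorted(data)
```

A compiler (or library author) is free to pick either of these, or any
of the dozens of others, without changing what your program means.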
Now sorting is pretty high level, but the same principle applies to even
simple operations like "multiply two numbers". There are often multiple
algorithms for performing the operation, and even a single algorithm can
often be implemented in slightly different ways. Expecting all compilers
to generate the same machine code is simply naive.
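For instance, here is a sketch of two of the many ways "multiply two
numbers" can be implemented for non-negative integers. Real chips and
runtimes use cleverer algorithms still, but the point stands: one
high-level operation, many possible instruction sequences.

```python
def mul_shift_add(a, b):
    """Multiply by binary shift-and-add, one bit of b at a time."""
    result = 0
    while b:
        if b & 1:          # if the low bit of b is set,
            result += a    # add the current shifted copy of a
        a <<= 1
        b >>= 1
    return result

def mul_repeated_add(a, b):
    """Multiply by the schoolroom method: add a to itself b times."""
    result = 0
    for _ in range(b):
        result += a
    return result

print(mul_shift_add(23, 42), mul_repeated_add(23, 42))  # both 966
```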
We can use Python to discuss this, since Python includes a compiler. It
generates byte code, which just means machine code for a virtual machine
instead of a hardware machine, but the principle is the same. Python even
has a disassembler, for taking those raw byte codes and turning them into
human-readable pseudo-assembly for the Python Virtual Machine.
Here is some "machine code" corresponding to the high-level instructions:
x = 23
y = 42
z = x + y
del x, y
py> code = compile('x = 23; y = 42; z = x + y; del x, y', '', 'exec')
py> code.co_code
'd\x00\x00Z\x00\x00d\x01\x00Z\x01\x00e\x00\x00e\x01\x00\x17Z\x02\x00[\x00\x00[\x01\x00d\x02\x00S'
Translated into "assembly":
py> from dis import dis
py> dis(code)
1 0 LOAD_CONST 0 (23)
3 STORE_NAME 0 (x)
6 LOAD_CONST 1 (42)
9 STORE_NAME 1 (y)
12 LOAD_NAME 0 (x)
15 LOAD_NAME 1 (y)
18 BINARY_ADD
19 STORE_NAME 2 (z)
22 DELETE_NAME 0 (x)
25 DELETE_NAME 1 (y)
28 LOAD_CONST 2 (None)
31 RETURN_VALUE
You should be able to see that there are all sorts of changes that the
compiler could have made which would have led to different "machine
code" but with the exact same behaviour. This particular compiler emits
code for x=23; y=42 in the order they were given, but that's not
actually required in this case. The compiler might have reversed the
order, generating something like:
0 LOAD_CONST 1 (42)
3 STORE_NAME 1 (y)
6 LOAD_CONST 0 (23)
9 STORE_NAME 0 (x)
or even:
0 LOAD_CONST 1 (42)
3 LOAD_CONST 0 (23)
6 STORE_NAME 1 (y)
9 STORE_NAME 0 (x)
without changing the behaviour of the code. Or it might have even
optimized the entire thing to:
0 LOAD_CONST 0 (65)
3 STORE_NAME 0 (z)
since x and y are temporary variables and a smart compiler could perform
the computation at compile time and throw them away. (Nitpicks about
"what if x and y already existed?" aside.) CPython even does this sort of
optimization, although in a more limited fashion:
py> dis(compile('x = 23 + 42', '', 'exec'))
1 0 LOAD_CONST 3 (65)
3 STORE_NAME 0 (x)
6 LOAD_CONST 2 (None)
9 RETURN_VALUE
This is called peephole optimization, because the optimizer only looks
at a few instructions at a time. It's not beyond possibility for a
smarter compiler to look beyond the peephole and make optimizations
based on the whole program.
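You can observe the limits of CPython's peephole optimizer without
reading any disassembly, just by inspecting the constants the compiler
stored in the code object. (A sketch; the exact contents of co_consts
vary a little between CPython versions, but the folding behaviour shown
here does not.)

```python
# Folding within a single expression: 23 + 42 is computed at
# compile time, so 65 appears among the stored constants.
folded = compile('x = 23 + 42', '', 'exec')
print(65 in folded.co_consts)

# But folding across statements is beyond the peephole's view:
# x and y are names, not literals, by the time z = x + y is compiled,
# so 23 and 42 are stored but 65 is not.
unfolded = compile('x = 23\ny = 42\nz = x + y', '', 'exec')
print(65 in unfolded.co_consts)
print(23 in unfolded.co_consts and 42 in unfolded.co_consts)
```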
So you can see that there is no one-to-one correspondence from high-level
source code to low-level machine code, even for a single machine. The
same is even more true for languages such as C, Fortran, Pascal, Lisp,
Scheme, Haskell, Java, etc. where people have spent years or decades
working on compiler technology. Compilers differ in the quality and
efficiency of the machine code they emit and the choices they make about
translating source code to machine code.
> That's all. Everything else is magic. Do you know that the Warren
> Abstraction Engine used to power the predicate logic in Prolog into
> machine code for a VonNeumann machine is so complex, no one has
> understood it?
That's nonsense. In your next post, you even supply a link to a book
describing how the WAM works with detailed step-by-step instructions for
writing your own. For those reading, here it is:
www.cvc.uab.es/shared/teach/a25002/wambook.pdf
Prolog is not some dark magic that nobody understands. There are
multiple implementations of Prolog-like logic languages for Python:
http://pyke.sourceforge.net/
https://github.com/logpy/logpy
https://github.com/enriquepablo/nl/
http://code.google.com/p/fuxi/
https://sites.google.com/site/pydatalog/
to say nothing of similar, more advanced languages like Mercury. Just
because *you personally* don't understand something, don't think that
nobody else does.
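To underline the point that there is no magic here: the heart of any
Prolog engine is unification, and the core algorithm fits in a few
dozen lines of Python. This is only a sketch (no occurs check, no
clause database, no backtracking search), not a full WAM, but it is the
same operation the WAM compiles down to machine code:

```python
class Var:
    """A logic variable, identified by name."""
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return '?' + self.name

def walk(term, subst):
    """Chase variable bindings until reaching a value or an unbound variable."""
    while isinstance(term, Var) and term.name in subst:
        term = subst[term.name]
    return term

def unify(a, b, subst=None):
    """Return a substitution (dict) unifying a and b, or None on failure.
    Compound terms are represented as tuples, e.g. ('father', X, 'tom')."""
    if subst is None:
        subst = {}
    a, b = walk(a, subst), walk(b, subst)
    if isinstance(a, Var):
        return {**subst, a.name: b}
    if isinstance(b, Var):
        return {**subst, b.name: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return subst if a == b else None

X, Y = Var('X'), Var('Y')
# Unify father(X, tom) with father(bob, Y): binds X=bob, Y=tom.
print(unify(('father', X, 'tom'), ('father', 'bob', Y)))
```

Run it and you get the bindings {'X': 'bob', 'Y': 'tom'} back, exactly
as a Prolog query would. Complex, yes; incomprehensible, no.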