I have no problem with your explanation. It's nearly impossible to
program in machine code, which is all 1's and 0's.
Not really; it's "voltage" or "no voltage" at different signal lines
in the processor. The binary system is just one representation you
could choose. More common (and practical) are hexadecimal or octal.
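For illustration (Python notation here is just one convenient way to
write it down), the very same byte value in three bases:

    >>> value = 0x59                      # one byte, given in hexadecimal
    >>> format(value, "08b"), format(value, "o"), value
    ('01011001', '131', 89)

Whichever notation you pick, the stored bit pattern is identical.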
the difference is that machine code can be read directly, whereas
assembler has to be compiled in order to convert the opcodes to
binary data.
As I said before, IMHO this "compilation" is trivial compared to HLL
compilation, since it's just a translation of opcodes to numbers and
of labels to addresses.
HLL compilers do much more; they translate high-level control
structures into a low-level implementation (a mapping that is
ambiguous). Often, optimisation is employed, which may, for example,
cause a loop to be unrolled so that it vanishes from the assembly.
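To make that concrete, here is a hand-written sketch in Python of what
unrolling means; a real HLL compiler performs this on the generated
low-level code, not on the source text:

    def sum_rolled(data):
        total = 0
        for i in range(4):      # the loop as the programmer wrote it
            total += data[i]
        return total

    def sum_unrolled(data):
        # what the optimiser may effectively emit: the loop is gone,
        # only its repeated body remains
        total = 0
        total += data[0]
        total += data[1]
        total += data[2]
        total += data[3]
        return total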
I agree that the code segments, and the data, are all that's
meaningful to the processor. There are a few others, like
interrupts that affect the processor directly.
Interrupts and segments are orthogonal, don't you think?
I understand what you're saying, but I'm referring to an executable
file ready to be loaded into memory.
Obviously not, since I was referring to such a file, too. Try
reading about "real" executable formats like ELF.
It's stored on disk in a series of 1's and 0's.
No, it's stored using a complex chain of magnetic fields. You _can_
interpret it as binary numbers, yes. But it's impractical, and the
choice is up to the viewer.
The actual file on disk is in a certain format that only the
operating system understands. But once the code is read in, it
goes into memory locations which hold individual arrays of bits.
I agree. (Before, you wrote differently:
Both Linux and Windows compile down to binary files, which are
essentially 1's and 0's arranged in codes that are meaningful to
the processor.
The ELF header and the data segments, for example, mean nothing to
the processor itself.)
That's a machine code, since starting at 00000000 to 11111111, you
have 256 different codes available.
I'm afraid it's not that simple. IA-32 opcodes, for example, are
complex bit sequences and don't always have the same length.
Primary opcodes consist of up to three bytes in this architecture.
Some RISC CPUs limit machine instructions to a fixed length of,
e.g., one word, but IA-32 has no such limitation.
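For instance, NOP is the single byte 90 and PUSH EAX the single byte
50, while a CALL with a 32-bit relative displacement occupies five
bytes (E8 followed by the displacement); with prefixes, an IA-32
instruction can grow to as many as fifteen bytes.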
But you _do_ know that pyc files are Python byte code, and that you
can only disassemble them directly to Python byte code?
That's the part I did not understand, so thanks for pointing that
out. What I disassembled did not make sense. I was looking for
assembler code, but I do understand a little bit about how the
interpreter reads them.
For example, from os.py, here's part of the script:
# Note: more names are added to __all__ later.
__all__ = ["altsep", "curdir", "pardir", "sep", "pathsep", "linesep",
           "defpath", "name", "path", "devnull"]
here's the disassembly from os.pyc:
[...] which is completely pointless, because this is not an IA-32
code segment that the processor could execute, but a custom data
file format. I'd rather try this, for example:
    i += 1
    return argument

  2           0 LOAD_FAST                0 (i)
              3 LOAD_CONST               1 (1)
              6 INPLACE_ADD
              7 STORE_FAST               0 (i)

  3          10 LOAD_GLOBAL              0 (argument)
             13 RETURN_VALUE
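Output like this comes from the standard dis module. A minimal way to
reproduce it (the function name here is chosen purely for
illustration, and exact opcodes and offsets vary between Python
versions):

    import dis

    def f(i):            # example function matching the byte code above
        i += 1
        return argument  # 'argument' is a global, hence LOAD_GLOBAL

    dis.dis(f)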
The Python VM, though, is stack-based, not register-based like most
CPUs. That's why the opcodes are quite different.
The script is essentially gone. I'd like to know how to read the
pyc files, but that's getting away from my point that there is a
link between python scripts and assembler. At this point, I admit
the code above is NOT assembler, but sooner or later it will be
converted to machine code by the interpreter and the OS and that
can be disassembled as assembler.
Yes. But the interpreter doesn't convert the entire file to machine
language. It reads one byte code instruction after another and,
amongst other things, executes machine code which "does" what the
byte code instruction intends.
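As a toy sketch (this is not CPython's actual implementation, just
the principle), such a dispatch loop might look like this:

    def run(code, consts, local_vars):
        stack = []
        for op, arg in code:                  # fetch and decode one instruction
            if op == "LOAD_CONST":
                stack.append(consts[arg])
            elif op == "LOAD_FAST":
                stack.append(local_vars[arg])
            elif op == "INPLACE_ADD":
                b, a = stack.pop(), stack.pop()
                stack.append(a + b)
            elif op == "STORE_FAST":
                local_vars[arg] = stack.pop()
            elif op == "RETURN_VALUE":
                return stack.pop()

    # the byte code from the example above, hand-encoded
    program = [("LOAD_FAST", 0), ("LOAD_CONST", 0), ("INPLACE_ADD", None),
               ("STORE_FAST", 0), ("LOAD_FAST", 0), ("RETURN_VALUE", None)]
    print(run(program, consts=[1], local_vars=[41]))      # prints 42

Each high-level branch here stands for a chunk of native machine code
inside the real interpreter.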
Python needs an OS like Windows or Linux to interface it to the
processor.
Not really. The CPython executable contains machine code directly
executable by the host processor. The OS just
* provides routines for accessing peripherals and allocating memory,
* makes it possible for multiple programs to run side by side,
* and loads the executable and sets it up in memory for execution.
Yes, the source is readable like that, but the compiled binary is
not.
For a machine, it is. The translation is 1:1, trivial.
A disassembly shows both the source and the opcodes.
The output I posted was directly from the GNU C compiler (compiled
from an empty "main" function). I got it by using a parameter (GCC's
-S option) that tells the compiler to stop before the last step of
generating machine code from assembly and to save the assembly
source instead.
A "disassembly" is the other way round. The hexadecimal
representation of the source in the leftmost columns is completely
redundant and practically irrelevant for a human being.
The second column are opcodes,
Not only. It's machine code instructions, i. e. opcodes and
operands.
and the third column are mneumonics, English words attached to the
codes to give them meaning.
They're mn_e_monics, and they're not really English (what kind of
English words would RET, JLE or CMP be?).
The second and third column mean the same thing.
Not at all! They're the operands and can be memory addresses,
registers or fixed values.
A single opcode instruction like 59 = pop ecx or 48 = dec eax is
self-explanatory.
It's a machine instruction which consists of the opcode POP and the
operand ECX.
The second instruction, call 1D00122A, is not as straightforward.
It is made up of two parts: E8 = the opcode for CALL, and the rest,
'D1 FF FF FF', is the opcode operator
I'm afraid not -- it's the operand.
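A small worked illustration, using only the bytes quoted above: the
four bytes after E8 are a little-endian signed 32-bit displacement,
and the call target is the address of the *following* instruction
plus that displacement.

    >>> import struct
    >>> struct.unpack("<i", bytes.fromhex("D1FFFFFF"))[0]
    -47

So the target 1D00122A lies 0x2F bytes before the instruction that
follows the CALL.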
I would agree with what you said earlier, that there is a
similarity between machine code and assembler.
Is there, actually?
You can actually write in machine code, but it is often entered in
hexadecimal, requiring a hex to binary interpreter.
IMHO, this makes no sense. For example, the memory contents
represented by binary 10000 and by 0x10 are exactly the same. Thus,
it doesn't matter at all how you enter or view it; it's completely
up to the user. The CPU understands both *exactly* the same way,
since they are the same thing: voltage levels at signal lines.
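In Python notation, for instance:

    >>> 0b10000 == 0x10 == 16
    True

The prefix only tells the reader (or the assembler) which base the
digits are in; the resulting bit pattern is the same.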
if I knew what the intervening numbers meant I could.
(*You* wrote the above. Please don't drop quoting headers if you
quote this deep.)
Regards,
Björn