BartC
Maxwell Bernstein said:
On 2/20/14, 3:51 AM, BartC wrote:
OK, I've just looked at your project. I assume you're referring to this:
typedef union carp_argument {
    unsigned int i;
    carp_register r;
    long long ll;
    char s[CARP_NAME_LENGTH];
} carp_argument;

typedef struct carp_command {
    carp_instruction instr;
    carp_argument args[CARP_NUM_ARGS];
} carp_command;
That's fine if it's what you want to do, provided you realise that the
size of the carp_argument union will be that of its largest member, in
this case the 32-byte name 's'.
And since you have 3 of these in carp_command, each instruction
will be about 100 bytes long.
I know I said that, since originally each command was (I think) packed into a
16-bit value, it wasn't necessary to pack things that tightly and
you could make life a bit easier. But this is going to the other extreme!
I see what you mean; things are rather large.
But also, this is now unlike any real machine. I'm not clear what 's' is
for, but references to strings are usually handled by a pointer to a
region of memory that contains the string. (And if 's' is meant to be
the name of a variable: in a real machine, names aren't used at
all. They are symbols in the source code, of an assembler for example, that are
replaced in the machine code by addresses or offsets of the area in
memory where the variable resides.)
How would you recommend I deal with pointers? How would that be different?
For strings and names, you can just use char *s instead of char s[...]. Then
the pointer will most likely be 4 bytes with a 32-bit compiler, or 8 bytes with
a 64-bit one. The name itself will then be stored elsewhere. But the
initialisation of the string can be the same. (With unions, a brace
initialiser without a designator initialises the first member of the union;
C99 designated initialisers such as {.s = "name"} let you pick another.)
But since it's a VM, it can implement strings and names however it likes,
including using an index into a table of names instead (in reality a table of
char*). Then you don't really need 's' at all and can just use 'i'. A char*
will do just as well, however.
(Since this doesn't appear to relate to a real machine any more, I'll
briefly describe a VM I use, which implements the bytecode of a language.
The bytecode is represented as a linear array of int values, and might look
something like this:
C C X C C X Y C X Y Z ....
Where C is a command (an opcode), and X, Y and Z are the first, second and
third operands. So 'instructions' are variable length, but each part fits
into a 32-bit int value.
Each operand can be one of several kinds: when it is a 32-bit int, it is
stored directly in X, Y or Z. Otherwise it will be, for example, an index into
a table, or sometimes even a pointer directly to a variable.
To deal with the different interpretations of X, Y or Z, C casts are used as
necessary (rather than unions). But if you store pointers this way, you need
to be sure they will fit into an int, which isn't the case with most 64-bit
compilers! So indices are the best bet.
The handler for each opcode C will of course need to know the type of each
operand, and also how many operands there are, so that it can step the PC
correctly to the next instruction. Real processors with variable-length
instructions work the same way.)