Tiny VM in C


Rick C. Hodgin

I also posted my Debi disassembler ported to C/C++ using the CodeLite IDE
by Eran Ifrah. I haven't tested it in a few years, but so far as I can
remember it worked correctly on all instructions I supported. You can see
the opcode map there as well, for the one-byte and two-byte instructions.

See stand-alone DebiX86Disassembler:
https://github.com/RickCHodgin/libsf/tree/master/unsorted

The format it generates is similar to that which you see in the Debi
debugger video (IIRC).

Best regards,
Rick C. Hodgin
 

BartC

IME, for bytecode, stack machine designs are fairly popular and pure
interpreters for them are generally simpler to implement (and generate
code for).
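
(For illustration, a minimal stack-machine dispatch loop in C might look like the following; the opcodes and encoding are made up for the example, not taken from any particular VM.)

#include <stdio.h>

/* Illustrative only: a tiny stack-machine interpreter core.
   Opcode names and encoding are hypothetical. */
enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

static void run(const int *code)
{
    int stack[256];
    int sp = 0;                 /* next free stack slot */
    int pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_PUSH:  stack[sp++] = code[pc++];          break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
        case OP_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
        case OP_PRINT: printf("%d\n", stack[--sp]);       break;
        case OP_HALT:  return;
        }
    }
}

int main(void)
{
    /* computes 2*10 + 3 and prints 23 */
    int prog[] = { OP_PUSH, 2, OP_PUSH, 10, OP_MUL,
                   OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);
    return 0;
}

Code generation for something like this is easy because every expression just appends pushes and operators in evaluation order.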

if performance is more the goal (for a plain C interpreter), or the design
is more focused on JIT compilation (*), a "register machine" design can
offer some advantages, though, granted, it is in many regards a little more
complicated (some added complexity in terms of the ISA design and compiler
logic).


*: a register machine can allow better performance for a simple/naive JIT.

it doesn't matter as much for a more advanced JIT, but then basically one
is climbing great walls of complexity for slight gains in performance.

the "easy case" here is basically to use a statically-typed register
machine as the IR, at which point one can produce output from a naive JIT
roughly on parity with unoptimized C.

IME, a naive JIT for a stack-machine will be about 2x-3x slower than
unoptimized C.


now as for dynamic types... well, this is its own big, complex issue.
simple answer: actually using dynamic types at run-time: very expensive.
so, there tends to be some amount of taking dynamically-typed input
languages and compiling them into a statically-typed form at the IR level.

though, for simple, albeit slow, languages and interpreters, dynamic
types can work well.

I'm developing an interpreter for a stack-based bytecode language, which
uses dynamic types. It's a simple interpreter with no JIT, but manages about
4-5x slower than un-optimised C code. This is for simple numeric benchmarks;
with anything more elaborate, the difference seems to narrow (but I haven't
yet got anything significant to test that can be directly compared).

However, to get this I needed to use optimised C in the interpreter plus
some non-portable features (ASM for example), otherwise it might be nearer
10x slower in standard (but still optimised) C.

Of course for real applications, full of string processing, file-handling
and library calls, the difference between executing this bytecode and even
the optimised C equivalent, is generally not significant (it takes 150 msec
or so to compile the largest source file I have; C could do it in 0 msec,
but I wouldn't really notice the difference!)
 

glen herrmannsfeldt

Rick C. Hodgin said:
Absolutely. I almost suggested this idea, but I really wasn't too keen
on the idea of emulating x86. I've been surprised Intel or AMD hasn't
released a version of their x86 architecture at some point which is only a
flat 32-bit protected mode chip (boots up in protected mode, cannot do
virtual-8086 mode, real mode, SMM, hypervisor support, or anything else),
or something which operates more similarly to the original 80x86 with
segment selectors, allowing for a segmented model which easily addresses
far more memory than 32 bits can address.

Way too late now, but in the days of IA32 processors with 36-bit
physical addressing but only a 32-bit MMU, it would not have taken
much extra to allow good use of the 36 bits.

What was missing from the 80286 and later was a segment descriptor
cache, which would have sped up access to multiple segments.
Large model isn't all that hard to use, especially with 32 bit
offsets, and could have allowed many more years of 32 bit processors.
Even now, with the popularity of 64 bits, systems with more than
64GB of RAM are extremely rare.

But the value of "64" on the outside is much more than the actual
need for 64 bit addressing in selling chips.

-- glen
 

BGB

I'm developing an interpreter for a stack-based bytecode language, which
uses dynamic types. It's a simple interpreter with no JIT, but manages about
4-5x slower than un-optimised C code. This is for simple numeric benchmarks;
with anything more elaborate, the difference seems to narrow (but I haven't
yet got anything significant to test that can be directly compared).

However, to get this I needed to use optimised C in the interpreter plus
some non-portable features (ASM for example), otherwise it might be nearer
10x slower in standard (but still optimised) C.

my faster plain interpreters are typically still around 10x slower than
native, but this is generally with a plain C interpreter (usually, I am
doing an interpreter for the portable backend, with a JIT on certain
targets to make things a little faster).


most of my recent JITs tend to produce a mix of threaded code (function
calls) and short segments of direct machine code (mostly using
prefabricated ASM code globs), hence, "naive".

while fancier JITs are possible, they are a lot more complicated and a
lot more painful to maintain and debug (especially if multiple targets
are involved).


a lot of my benchmarks had been things like running sorting algorithms
and similar (like sorting largish arrays of numbers, ...).

as well as a number of micro-benchmarks, ... (calling functions or
methods in a loop, ...).


for things like array-sorting and similar, it is about 2x-3x slower than
C, though there are other factors which could influence things (32-bit
raw pointers and no bounds-checks vs 64-bit array references and
bounds-checked arrays, ...), in addition to things like extra operations
related to variable load/store and stack-management stuff.

cross-language function calls are also a bit expensive due to various
factors (mostly involving argument marshaling and so on).


but, anyways, consider you have something like: "y=m*x+b;"

in a stack-based IL, this might look like:
LLOAD_I m
LLOAD_I x
MUL_I
LLOAD_I b
ADD_I
LSTORE_I y

whereas a register IR could do:
MUL_I t0, m, x
ADD_I y, t0, b

and, comparatively, a naive JIT could produce fewer instructions for the
register case, ...


for example, naive stack-JIT output:
mov eax, [ebp-12] ;LLOAD_I
mov [esp+4], eax
mov eax, [ebp-16] ;LLOAD_I
mov [esp+0], eax
mov eax, [esp+4] ;MUL_I
mov ecx, [esp+0]
imul eax, ecx
mov [esp+4], eax
mov eax, [ebp-8] ;LLOAD_I
mov [esp+0], eax
mov eax, [esp+4] ;ADD_I
mov ecx, [esp+0]
add eax, ecx
mov [esp+4], eax
mov eax, [esp+4] ;LSTORE_I
mov [ebp-20], eax

vs, naive register-JIT output:
mov eax, [ebp-12] ;MUL_I
mov ecx, [ebp-16]
imul eax, ecx
mov [ebp-24], eax
mov eax, [ebp-24] ;ADD_I
mov ecx, [ebp-8]
add eax, ecx
mov [ebp-20], eax


these differences largely disappear if the JIT is smart enough to use a
register allocator and peephole optimizer, but assumed here is a JIT
that is effectively too stupid to use these.


though, granted, a person can be "clever" and squeeze a little more
speed out of the stack-machine, say, via adding magic like:
LL2_MUL_I m, x ;load and multiply m and x, pushing result
LL_ADD_I b ;add b to top-of-stack
LSTORE_I y ;store into y

which can make things a little faster, mostly at the cost of adding
large numbers of special cases.

this works, but makes things a lot more hairy.
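
(As a rough sketch, in a plain C interpreter such a fused opcode is just one more case in the dispatch switch; the names and encoding below are hypothetical, echoing the example above rather than any actual VM.)

#include <stdio.h>

/* Illustrative only: a fused "load two locals and multiply" opcode next to
   the plain ops it replaces.  Names and encoding are made up. */
enum { OP_LLOAD_I, OP_MUL_I, OP_LL2_MUL_I, OP_LSTORE_I, OP_HALT };

static void run(const int *code, int *locals)
{
    int stack[64];
    int sp = 0, pc = 0;

    for (;;) {
        switch (code[pc++]) {
        case OP_LLOAD_I:   stack[sp++] = locals[code[pc++]];   break;
        case OP_MUL_I:     sp--; stack[sp - 1] *= stack[sp];   break;
        case OP_LL2_MUL_I: {                  /* fused special case */
            int a = locals[code[pc++]];
            int b = locals[code[pc++]];
            stack[sp++] = a * b;              /* one dispatch instead of three */
            break;
        }
        case OP_LSTORE_I:  locals[code[pc++]] = stack[--sp];   break;
        case OP_HALT:      return;
        }
    }
}

int main(void)
{
    int locals[4] = { 0, 2, 10, 0 };          /* locals[1]=m, locals[2]=x */
    int prog[] = { OP_LL2_MUL_I, 1, 2, OP_LSTORE_I, 3, OP_HALT };
    run(prog, locals);
    printf("%d\n", locals[3]);                /* prints 20 */
    return 0;
}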

Of course for real applications, full of string processing, file-handling
and library calls, the difference between executing this bytecode and even
the optimised C equivalent, is generally not significant (it takes 150 msec
or so to compile the largest source file I have; C could do it in 0
msec, but I wouldn't really notice the difference!)

yeah.

it depends on what one is doing.


performance isn't really a huge issue at present for using it mostly for
things like game logic tasks (enemy AIs, ...) and some UI stuff, and
things like 2D and 3D animation tasks, ...

mostly this consists of function calls, messing with objects and arrays,
some amount of vector math, ...

the present form of the language more-or-less resembles ActionScript3
(mostly similar syntax and semantics, ...).


its current backend is a (mostly) statically-typed stack-machine (it
started out dynamically-typed, but static types were retrofitted onto it
afterwards).

a few efforts had been considered to move the main script language/VM
over to a register-based fully-statically-typed backend, but not enough
effort has been made on this front to actually go somewhere with this.



a place where speed started looking to be an issue though was for
possibly using script-code for video processing, where things have to be
pretty fast when manipulating pixels so as not to drive performance into
the ground.

so, there was some effort to try to design/implement a backend that
could hopefully be C-competitive for this use-case (built around a
vector-oriented register IR).

however, this fizzled out some, and most of this sort of code remains as
plain C (generally compiled with optimizations, as well as
micro-optimized, and using SSE intrinsics, ...), mostly as my stuff
still tends to fall a bit behind C here, and even then C has a hard time
keeping up (such as running chroma-key filters and compositing multiple
video-streams and re-encoding to another output stream, where doing this
CPU-side tends to quickly go sub-real-time, ...).

for real-time use, it mostly ends up boiling down to shoving
everything over to the GPU and doing much of the rest of the "real work"
via fragment shaders (IOW: moderately fast codecs and a lot of PBOs and
similar, as per-pixel filtering / image scaling / blending / ... is just
so much faster on the GPU).

it started looking a bit like there was little hope of making the
script-code performance competitive within any sane level of effort.


this basically means a lot of performance-sensitive stuff (image/video
processing, A/V codecs, 3D rendering, ...) mostly still needs to remain
as C (with most real-time per-pixel stuff and effects being done on the
GPU).

never mind being able to write something like:
var fragShader:string="""
uniform sampler2D texFoo;
uniform sampler2D texBar;
...
""";

but this isn't quite the same thing...


or such...
 

BartC

but, anyways, consider you have something like: "y=m*x+b;"

in a stack-based IL, this might look like:
LLOAD_I m
LLOAD_I x
MUL_I
LLOAD_I b
ADD_I
LSTORE_I y

whereas a register IR could do:
MUL_I t0, m, x
ADD_I y, t0, b

In this simple case, yes; with more complex expressions and function calls (where
arguments have to be stacked anyway) the stack model will take everything in
its stride and is very simple to generate code for.

But I have a problem anyway because of the way my execution model works: I
only use dynamic, tagged types, I manipulate all data by moving 16-byte
descriptors around, and don't use a garbage collector. This makes a register
model impractical (where to store the register contents? they will end up
being written to memory anyway, it might as well be a stack).
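
(For anyone following along, a 16-byte tagged descriptor might look roughly like this in C; this is only a guess at the general shape, not Bart's actual layout.)

#include <stdint.h>

/* Rough sketch of the general shape of a 16-byte tagged descriptor;
   the tags and fields here are hypothetical, not BartC's actual design. */
typedef enum { T_INT, T_REAL, T_STRING } value_tag;

typedef struct {
    uint32_t tag;       /* runtime type */
    uint32_t length;    /* e.g. string or array length */
    union {
        int64_t i;      /* integer payload */
        double  r;      /* floating-point payload */
        char   *ptr;    /* pointer to heap data for strings etc. */
    } u;                /* 8-byte payload, 16 bytes in total */
} value_desc;

Descriptors like this are simply copied by value, which matches the "move 16-byte descriptors around" description above.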

Actually, this seems hopelessly inefficient on the face of it (much better
just to manipulate pointers), but I could never make a pointer model work
(without complex GC or reference counting or such). The results however
still largely outperform other dynamic languages that don't use acceleration
or advanced tracing techniques or whatever.

though, granted, a person can be "clever" and squeeze a little more speed
out of the stack-machine, say, via adding magic like:
LL2_MUL_I m, x ;load and multiply m and x, pushing result
LL_ADD_I b ;add b to top-of-stack
LSTORE_I y ;store into y

which can make things a little faster, mostly at the cost of adding large
numbers of special cases.

I've tried lots of those approaches (type-hinted source, some type
inference, a thousand bytecodes, then a few generic bytecodes that are
mapped at load time to whatever specific ones are implemented...). But it
always gets very complex, very hairy, and there seemed to be an upper limit
(always about 2x) on how much faster it made things. I just
accepted that this was a dynamic language and would be a magnitude slower
than native code, for benchmarks anyway. Clean source code and a simple
interpreter is worth something too...

its current backend is a (mostly) statically-typed stack-machine (it
started out dynamically-typed, but static types were retrofitted onto it
afterwards).

a few efforts had been considered to move the main script language/VM over
to a register-based fully-statically-typed backend, but not enough effort
has been made on this front to actually go somewhere with this.

a place where speed started looking to be an issue though was for possibly
using script-code for video processing, where things have to be pretty
fast when manipulating pixels not to drive the performance into the
ground.

so, there was some effort to try to design/implement a backend that could
hopefully be C-competitive for this use-case (built around a
vector-oriented register IR).

I'd like to have a go at creating an interpreter for a static language, just
to see what I can do with it. It will need static types but dynamic
everything else (otherwise C will do). (But today I've just finished my new
front-end to C - writing C with my own syntax - that will have to do for a
while before working on yet another language!)
 

glen herrmannsfeldt

(snip, I wrote)
True, but that would have been the least of the problems. Neither of
the important OS's showed any inclination to support segments. Heck,
MS made Intel adopt AMD64 rather than their own 64-bit extension; can
you imagine for a minute that, with the Intel choice being expanded
segmentation, MS would have picked that instead of AMD64?
Remember that MS *already* had ("real") 64-bit Windows running on the
Alpha (not ever released, but that's where most of the port was done).

I thought there was an NT Alpha, though I never had one.

I did have a trial version of Win2K for IA64.

I even speculated* on whether Intel doing exactly that (enlarging the
logical address space) and shipping the usual Intel proportion of
processors could force MS to adopt the extensions. Of course those
extensions would have been trivial for AMD to copy as well.

Well, MS did support PAE, but that takes a lot of extra
work for the program.

*Probably on RWT, and if they ever get their forum search working
again, one might be able to find the thread.

I'm not sure how much of a specific segment descriptor cache would
have been needed. A reasonably fast load from L1 should have been
plenty fast enough (they were only eight bytes, after all - although
one would expect that to be enlarged with bigger logical addresses).
If you had a specific descriptor cache, you'd have coherency issues to
deal with, presumably similar to those of a TLB. The data in a
descriptor isn't highly encoded. The 286 suffered from both no cache,
and even then an oddly slow descriptor load. It took 17 clocks even
with zero wait state memory (and then would only have had to load
three words to populate the descriptor). The 486 made it relatively
worse - 9 cycles, and then pipeline stalls to get the update to
happen. Clearly Intel was not pushing performance of segment register
loads.
The two extra seg regs would have helped a lot on the 8086/286, and
might have significantly reduced the hatred of segments that was
instilled by the time the 386 came around.

I suppose, but a cache would have helped more. It isn't so
easy for a compiler to figure out which order segments
are loaded, and so assign the segment registers.

Still the OS guys hated the idea. And as Linus Torvalds has pointed
out several times, OS's actually want a couple more address bits than
the physical memory installed on a machine. Basically an OS wants to
simultaneously map: a user process, itself, all of real memory, and
all of the I/O space. Once you get pas about 1GB, the OS starts
having to do a bunch of contortions because those things are no longer
possible. Segments would have added only a few effective bits from
the OS's perspective.

But a few bits would have lasted a long time.

-- glen
 

Maxwell Bernstein

I wrote my first "real" program in this assembly code: sum the numbers from 1 to 100. I'm going to keep writing simple programs to figure out what I might need.
 

Rick C. Hodgin

I wrote my first "real" program in this assembly code: sum the numbers
from 1 to 100. I'm going to keep writing simple programs to figure out
what I might need.

Awesome. Keep up the good work. :)

Best regards,
Rick C. Hodgin
 

glen herrmannsfeldt

(big snip regarding addressing of Intel processors)
I'll have to disagree. The four seg regs of the 286, of which one was
read-only and dedicated to code, and another immovable because it held
the stack, meant you were constantly swapping the values in DS and ES
around in large model programs. And it got worse in things like DLLs
where you could no longer assume your DGROUP was referenced by SS.

Hmm. FS and GS came late enough that I never knew of any program
that used them. I was running OS/2 1.0 in, I believe, 7MB of RAM
in early 1990. That was when most people were running DOS in 640K,
maybe with a virtual disk in extended memory.

The two extra seg regs would have greatly reduced the number of
segment reloads necessary. Which is no reason not to make those
reloads as fast as possible, of course.

For the 80286, with a 16 bit data bus, it had to load the selector,
plus the whole descriptor, for each one. Since you pretty much need
to keep DS on the actual data segment, that meant loading ES for
every access. Still, much better than you could do without it.

The 286 was actually considerably worse than the 8086 - at least in
real mode you could store some common data in your code seg.

But much more fun in protected mode running OS/2.

(snip, regarding more than 32 address bits)
Assuming you'd have gotten someone to use them in the first place.
Had anyone actually used segmentation in 32 bit mode, the larger
linear address space would have been a no-brainer for Intel.

The Watcom compilers would generate large model code with 32 bit
offsets. The only OS that I know even thought about supporting
it was OS/2, but I don't know that they did. At the time I
don't know that any boards supported more than 4GB.
PAE was, of course, introduced long before AMD64 (IIRC, PAE
shipped in 1995, the Opteron in 2003).

-- glen
 

BGB

In this simple case, yes; with more complex expressions and function calls (where
arguments have to be stacked anyway) the stack model will take everything in
its stride and is very simple to generate code for.

But I have a problem anyway because of the way my execution model works: I
only use dynamic, tagged types, I manipulate all data by moving 16-byte
descriptors around, and don't use a garbage collector. This makes a register
model impractical (where to store the register contents? they will end up
being written to memory anyway, it might as well be a stack).

Actually, this seems hopelessly inefficient on the face of it (much better
just to manipulate pointers), but I could never make a pointer model work
(without complex GC or reference counting or such). The results however
still largely outperform other dynamic languages that don't use acceleration
or advanced tracing techniques or whatever.


my case is a little more complex:
there are raw/unboxed types, such as raw integers or floats or doubles;
there are boxed types, typically a pointer to some other type (a value
or structure, ...);
there are raw pointers and box-pointers (a pointer to a structure
representing a bounds-checked pointer);
there are also "variant" types (used for dynamically-typed values),
which, in this VM, are implemented as 64-bit tagged references (*).


*: tagging scheme described here:
http://cr88192.dyndns.org:8080/wiki/index.php/Tagged_references
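
(The actual tagging scheme is the one described at that link; purely to illustrate the general idea of 64-bit tagged references, a simple low-bit tagging scheme in C might look like the following. The tag layout here is made up and is not the scheme from the wiki page.)

#include <stdint.h>
#include <assert.h>

/* Illustration of the general idea only; this tag layout is invented
   and is NOT the scheme described at the link above. */
typedef uint64_t vmref;

#define TAG_BITS   3u
#define TAG_MASK   ((1ull << TAG_BITS) - 1)
#define TAG_OBJPTR 0ull    /* 8-byte-aligned pointer: low bits are zero */
#define TAG_FIXNUM 1ull    /* small integer stored in the upper 61 bits */

static inline vmref   ref_from_int(int64_t v) { return ((uint64_t)v << TAG_BITS) | TAG_FIXNUM; }
static inline int     ref_is_int(vmref r)     { return (r & TAG_MASK) == TAG_FIXNUM; }
static inline int64_t ref_to_int(vmref r)     { return (int64_t)r >> TAG_BITS; }

static inline vmref ref_from_ptr(void *p)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* relies on allocation alignment */
    return (vmref)(uintptr_t)p | TAG_OBJPTR;
}
static inline void *ref_to_ptr(vmref r)       { return (void *)(uintptr_t)(r & ~TAG_MASK); }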


basic type information is encoded in the bytecode, however, the VM
backend is allowed a little freedom as to how they are represented in
terms of more concrete types.

in the current JIT, generally everything still gets a 64-bit spot, and
will be converted to/from tagged references as needed (a 32-bit item
will only use the low 32-bits of the spot).


as for object lifetime:
currently I am mostly using a mix of GC and manual memory-management
(you can leave it up to the GC, or "delete" objects early);
there are also "struct" which have C# like semantics, and "value
classes" (basically, classes with more C++ like behavior, essentially
structs with inheritance and methods).

the JVM also has a few nifty tricks here (local vs global objects, ...),
but I haven't gotten around to ripping them off.

I don't currently use reference counting, as personally I have generally
found it to be "more trouble than it is worth".

I've tried lots of those approaches (type-hinted source, some type
inference, a thousand bytecodes, then a few generic bytecodes that are
mapped at loadtime to whatever specific ones are implemented...). But it
always gets very complex, very hairy, and there seemed an upper limit
(always about 2x it seemed) on how much faster it made things. I just
accepted that this was a dynamic language and would be a magnitude slower
than native code, for benchmarks anyway. Clean source code and a simple
interpreter is worth something too...

I did so, and largely just lived with a lot of this sort of hair.

this lasted until I started making some more significant changes to the
type-system, at which point a lot of these operations either got dropped
or partly repurposed (via prefixes).

but, yeah, there are still a lot of these sorts of special-case
operations in the current stack-based VM.

it is also a bit of a mix as to whether the opcode is type-specific
or encodes the type via a prefix (generally, type-specific opcodes are
used for more common operations, with prefixes for less common
operations and edge-cases).


the cost of the stack can also be partially reduced by internally
mapping stack-elements to fixed-position spots (basically, if one
imposes similar constraints onto the stack as in the JVM (*2), it makes
some things a little more straightforward and allows optimizing away a
lot of internal stack-position adjustments and similar).

this can also be used to map stack-machines onto a register-machine
backend, which has been considered a few times (there has been some
internal debate about, if moving to a fully register machine backend,
whether this would mean also moving to a new bytecode, or just map the
existing stack-based bytecode onto a register-machine as an intermediate
step).


*2: along all control paths leading to a particular label / jump-target,
it is necessary for the stack to have the same layout (same number of
elements with the same types).
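
(A sketch of that idea: given the layout rule in *2, the stack depth at every bytecode position is statically known, so each stack cell can be renamed to a fixed "spot", turning the stack code into the sort of fixed-position code the register-JIT example earlier produced. The code below is illustrative only, with made-up opcode names.)

#include <stdio.h>

/* Illustrative only: renaming stack cells to fixed "spots" by tracking the
   statically-known stack depth.  Straight-line code is assumed; real code
   would also have to merge depths at labels, per the rule in *2. */
enum { LLOAD_I, MUL_I, ADD_I, LSTORE_I, HALT };

static void rename_to_spots(const int *code)
{
    int depth = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case LLOAD_I:
            printf("spot%d = local%d\n", depth, code[pc++]);
            depth++;
            break;
        case MUL_I:
            depth--;
            printf("spot%d = spot%d * spot%d\n", depth - 1, depth - 1, depth);
            break;
        case ADD_I:
            depth--;
            printf("spot%d = spot%d + spot%d\n", depth - 1, depth - 1, depth);
            break;
        case LSTORE_I:
            depth--;
            printf("local%d = spot%d\n", code[pc++], depth);
            break;
        case HALT:
            return;
        }
    }
}

int main(void)
{
    /* y = m*x + b, with locals 0=m, 1=x, 2=b, 3=y */
    int prog[] = { LLOAD_I, 0, LLOAD_I, 1, MUL_I,
                   LLOAD_I, 2, ADD_I, LSTORE_I, 3, HALT };
    rename_to_spots(prog);   /* prints register-like code with fixed spots */
    return 0;
}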

I'd like to have a go at creating an interpreter for a static language, just
to see what I can do with it. It will need static types but dynamic
everything else (otherwise C will do). (But today I've just finished my new
front-end to C - writing C with my own syntax - that will have to do for a
while before working on yet another language!)

as can be noted, the actual high-level language in question here can use
either static or dynamic types.

rather, the static typing is done mostly by a combination of some explicit
types and type-inference; however, dynamic types still remain
available as well.

for example, if a variable is declared without a type given, it is up to
the compiler whether it will exist as a dynamically-typed variable, or
be treated more like the C++ "auto" type (and converted into a
statically-typed variable of a given type).


it is mostly that if you take a language like JavaScript or similar, and
run it purely with dynamic type-checks, the raw performance will be
pretty bad. with type-inference, a lot of the types can be nailed down,
and the performance can be made a lot better (without otherwise
constraining the language semantics).


however there are limits:
cases where the type-inference logic can't figure out the types (and so
would unnecessarily use a dynamically-typed variable);
cases where an otherwise obvious type-error exists, where a
statically-typed language would be like "WTF are you doing here?!" but a
dynamically typed language would just accept it and blow up at runtime
(bad, compile-time type-errors are better here);
....

so, static type declarations can help here.

for example:
var s:string;
....
s=42;
the compiler can then safely blow up with a type error.


well, also the compiler will warn about implicit conversions to smaller
numeric types:
var i:int, s:short;
....
s=i; //warning
s=i as short; //OK


I guess to clarify:
what I meant by "fully statically typed backend" isn't so much about
fully eliminating dynamic types (from the language or VM), but rather
about statically determining all the types (meaning all cases of variant
types are statically known as well).

in the current VM it is a bit more hit or miss, so the VM is like "oh
crap, don't know the types here, better use variant." this kind of sucks
as it leaves a lot of big holes in the type-system.
 

glen herrmannsfeldt

(snip, I wrote)
We ended up with a fair bit of 16-bit assembler code still running on
OS/2 2.0, the two extra segment registers were *quite* handy, for just
that reason - you *weren't* reloading ES every third* instruction. We
also used that to a lesser extent in Win16. While Windows didn't save
FS and GS, it was, of course non-preemptive, so managing FS and GS
manually was largely a non-issue (obviously that code was limited to
running in 386Enh mode).
*Obvious hyperbole, but you did spend a lot of time flogging the
segment registers you did have. The two extras reduced that flogging
*a lot*.

The one I was working on did Successive OverRelaxation (SOR) for
a solution to the Poisson equation in 2D. It does do a lot of load
and store in arrays in large model, so it might be every third.

I did sometimes look at the generated code, but there wasn't so much
that I could do about it.

-- glen
 

Maxwell Bernstein

I am starting to represent commands (instruction+arguments) as unions in
C. That seems at first glance to be a pretty good solution. Is there
anything that I am missing?
 

Ben Bacarisse

Maxwell Bernstein said:
I am starting to represent commands (instruction+arguments) as unions in
C. That seems at first glance to be a pretty good solution. Is there
anything that I am missing?

Sounds fine to me. You might also want to look at bit-fields. They are
really only a space-saving measure, but the thread title suggests the
size might matter.
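
For example, a command packed with bit-fields might look something like this (the field names and widths are made up, not from Maxwell's project):

#include <stdio.h>
#include <stdint.h>

/* Illustration only: packing a command into one 32-bit word with
   bit-fields.  Field names and widths are hypothetical. */
typedef struct {
    uint32_t opcode : 8;   /* up to 256 instructions */
    uint32_t dst    : 8;   /* destination register   */
    uint32_t src    : 8;   /* source register        */
    uint32_t flags  : 8;
} packed_command;

int main(void)
{
    packed_command c = { .opcode = 3, .dst = 1, .src = 2, .flags = 0 };
    printf("%zu bytes, opcode %u\n", sizeof c, (unsigned)c.opcode);
    return 0;
}

Note that the exact layout of bit-fields is implementation-defined.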
 

Maxwell Bernstein

Sounds fine to me. You might also want to look at bit-fields. They are
really only a space-saving measure, but the thread title suggests the
size might matter.

I shall take a look at bit fields. The thread title was more of a
reference to the size of the code than the memory footprint.
 

Ben Bacarisse

Maxwell Bernstein said:
I shall take a look at bit fields. The thread title was more of a
reference to the size of the code than the memory footprint.

Ah, well bit-fields probably just introduce unnecessary costs and
complexity. Worth knowing about them, but probably not helpful here.
 

Maxwell Bernstein

Ah, well bit-fields probably just introduce unnecessary costs and
complexity. Worth knowing about them, but probably not helpful here.

Aha, okay. I'll look into it for reference. As for the VM in general:
any thoughts?
 

BartC

Maxwell Bernstein said:
I am starting to represent commands (instruction+arguments) as unions in
C. That seems at first glance to be a pretty good solution.

What is it a solution for?
 

BartC

BartC said:
What is it a solution for?

OK, I've just looked at your project. I assume you're referring to this:

typedef union carp_argument {
    unsigned int i;
    carp_register r;
    long long ll;
    char s[CARP_NAME_LENGTH];
} carp_argument;

typedef struct carp_command {
    carp_instruction instr;
    carp_argument args[CARP_NUM_ARGS];
} carp_command;

That's fine if it's what you want to do. Provided you realise that the size
of the carp_argument union will be that of the largest item, in this case
the 32-byte name 's'.

And since you have 3 of these in carp_command, each instruction will be
about 100 bytes long.

I know I said, since originally each command I think was packed into a
16-bit value, that it wasn't necessary to pack things that tightly; you
could make life a bit easier. But this is going to the other extreme!

But also, this is now unlike any real machine. I'm not clear what 's' is
for, but references to strings are usually handled by a pointer to a region
of memory that contains the string. (And if 's' is meant to be the name of a
variable, then in a real machine, names aren't used at all, they are symbols
in the source code (of assembler etc) that are replaced in the machine code
by addresses or offsets of the area in memory where the variable resides.)
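
(For instance, a hypothetical sketch of that approach: the loader resolves each name to a slot once, and the run-time argument then holds only the slot index or a pointer, not the 32-byte name. This is not the carp code, just an illustration.)

#include <string.h>

/* Hypothetical sketch only: resolve variable names once at load time,
   so run-time arguments carry just an index (or pointer), not the name.
   No bounds checks; not taken from the carp project. */
#define MAX_VARS 256

typedef struct {
    char      name[32];   /* used only while assembling/loading */
    long long value;
} variable;

static variable vars[MAX_VARS];
static int num_vars;

static int lookup_or_add(const char *name)   /* load-time name resolution */
{
    for (int i = 0; i < num_vars; i++)
        if (strcmp(vars[i].name, name) == 0)
            return i;
    strcpy(vars[num_vars].name, name);       /* assumes name fits in 31 chars */
    return num_vars++;
}

/* at run time an argument is just a slot index or an immediate value */
typedef union {
    unsigned int index;   /* slot in vars[] */
    long long    ll;
} small_argument;         /* 8 bytes instead of 32+ */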

(The rest of the code seems to be nicely written, however - and this is
purely my own preference - I had trouble dealing with the myriad little
headers. Fewer headers, for example one containing all the constants, enums,
and types that described how the virtual machine code is represented, would
have made navigation a bit simpler.)
 

Maxwell Bernstein

What is it a solution for?

It is a solution for not worrying about having multiple fields in the
command struct for arguments. Previously I had args and string_args.
 

Maxwell Bernstein

BartC said:
What is it a solution for?

OK, I've just looked at your project. I assume you're referring to this:

typedef union carp_argument {
    unsigned int i;
    carp_register r;
    long long ll;
    char s[CARP_NAME_LENGTH];
} carp_argument;

typedef struct carp_command {
    carp_instruction instr;
    carp_argument args[CARP_NUM_ARGS];
} carp_command;

That's fine if it's what you want to do. Provided you realise that the
size of the carp_argument union will be that of the largest item, in
this case the 32-byte name 's'.

And since you have 3 of these in carp_command, each instruction
will be about 100 bytes long.

I know I said, since originally each command I think was packed into a
16-bit value, that it wasn't necessary to pack things that tightly;
you could make life a bit easier. But this is going to the other extreme!

I see what you mean; things are rather large.

But also, this is now unlike any real machine. I'm not clear what 's' is
for, but references to strings are usually handled by a pointer to a
region of memory that contains the string. (And if 's' is meant to be
the name of a variable, then in a real machine, names aren't used at
all, they are symbols in the source code (of assembler etc) that are
replaced in the machine code by addresses or offsets of the area in
memory where the variable resides.)

How would you recommend I deal with pointers? How would that be different?

(The rest of the code seems to be nicely written, however - and this is
purely my own preference - I had trouble dealing with the myriad little
headers. Fewer headers, for example one containing all the constants,
enums, and types that described how the virtual machine code is
represented, would have made navigation a bit simpler.)

Ah, thank you. I'll look into the header issue.
 
