OT: custom VM, partly JBC based

BGB / cr88192

well, just seeing if anyone here has comments.


I have a custom VM (at this point, as a personal/hobby project), and to some
extent I am using JBC (Java ByteCode) as one of the available bytecode
formats. several other formats are used too, along with a lot of raw native
code (x86/x86-64 machine code), so at the moment there is no "one true
bytecode" in this VM. the VM presently deals with multiple languages as
well (the main ones in use are C and a language roughly similar to
JavaScript and ActionScript, which largely implements the "ECMA-262 5th
edition" standard).

partial compilers for Java and C# exist, although neither is currently
usably complete (hence Java is currently being compiled via Eclipse / ECJ).
these compilers mostly have the frontend in place, but are incomplete WRT
backend functionality (I was previously using a native-code backend,
originally designed for C, but there are a few issues WRT getting all 3
languages to share the same backend).

oh yes, and the libraries currently implement a subset of the JDK 1.1 spec
(although I left out AWT and a few other things), yes, it sucks, I know...
errm... partly this is because exceptions don't currently work either (never
got around to implementing them for this part of the VM, but hope to
implement them eventually).


so, mostly due to trying to make the bytecode more useful for my uses (IOW:
using it for several different languages), I am using a few
alterations/extensions:

the "wide" opcode/prefix may be used with additional opcodes, mostly to
extend constant-pool indices to 32 bits.
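
a rough sketch of how the overloaded "wide" prefix could look for a constant-pool load. the byte layout is an assumption made for illustration (standard `ldc_w` followed by a 16-bit index, and a hypothetical `wide ldc_w` followed by a 32-bit big-endian index); `read_ldc_index` is a made-up helper name, not something from the VM:

```python
import struct

WIDE = 0xC4   # standard JVM 'wide' prefix byte
LDC_W = 0x13  # standard JVM ldc_w, normally followed by a 16-bit index

def read_ldc_index(code, pc):
    """Return (constant_pool_index, next_pc) for an ldc_w at code[pc].

    Assumption: 'wide ldc_w' carries a 32-bit big-endian index instead
    of the usual 16-bit one. This is not part of the standard JVM.
    """
    if code[pc] == WIDE and code[pc + 1] == LDC_W:
        (index,) = struct.unpack_from(">I", code, pc + 2)
        return index, pc + 6
    if code[pc] == LDC_W:
        (index,) = struct.unpack_from(">H", code, pc + 1)
        return index, pc + 3
    raise ValueError("not an ldc_w instruction")

plain = bytes([LDC_W, 0x12, 0x34])                     # 16-bit index
wide = bytes([WIDE, LDC_W, 0x00, 0x01, 0x00, 0x00])    # 32-bit index
```

the plain form stays byte-compatible with class files, and only code that actually needs an index past 65535 pays for the longer form.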

opcode bytes 224..253 are used as prefixes for escaping longer opcodes
(currently 224..239 are used to extend the opcode space to 4096 opcodes;
this design was implemented around 1 year ago).
most opcodes >= 256 are extensions specific to my VM (mostly for working with
pointers and dynamic-type related stuff).

240..253 may be used to make the space bigger still (~916k opcodes), but at
present I shouldn't need more than the present limit of 4096.
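
a minimal sketch of decoding such escaped opcodes. the exact bit layout is an assumption (the low bits of the prefix byte become the high bits of the opcode number), and `decode_opcode` is a hypothetical name used only for illustration:

```python
def decode_opcode(code, pc):
    """Return (opcode_number, next_pc) for the opcode starting at code[pc].

    Assumed layout, for illustration only:
      0..223, 254, 255 : plain one-byte opcode
      224..239 + 1 byte : 16 * 256   = 4096 opcode numbers
      240..253 + 2 bytes: 14 * 65536 = 917504 opcode numbers
    """
    b0 = code[pc]
    if b0 < 224 or b0 > 253:
        return b0, pc + 1
    if b0 <= 239:
        return ((b0 - 224) << 8) | code[pc + 1], pc + 2
    return ((b0 - 240) << 16) | (code[pc + 1] << 8) | code[pc + 2], pc + 3
```

with this layout the two-byte form `224, n` names the same opcode number as the one-byte form `n`, so the longer forms overlap the shorter ones rather than starting at fresh numbers.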

this was done rather than using the impdep1/impdep2 mechanism because, if my
past experience is any indication, I may easily end up with a largish number
of added opcodes (or, at least > 256 new ones), and the impdep1/2 mechanism
makes instruction-set extensions comparatively more expensive (since it more
quickly leads to 3 or 4 byte opcodes, for what would have been 2/3 bytes for
a different encoding).
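
to put rough numbers on the size argument, here is a small comparison of bytes per opcode. both layouts are assumptions for illustration: extension opcodes are taken to start at opcode number 256 in the prefix scheme, and the impdep scheme is taken to be impdep1 + 1 byte for the first 256 extensions, then impdep2 + 2 bytes:

```python
def escape_len(n):
    """Bytes per opcode for the n-th extension (n = 0, 1, ...) with the
    assumed 224..253 prefix scheme, extensions starting at opcode 256."""
    opcode = 256 + n
    if opcode < 16 * 256:       # prefixes 224..239 + 1 byte
        return 2
    if opcode < 14 * 65536:     # prefixes 240..253 + 2 bytes
        return 3
    raise ValueError("opcode number out of range")

def impdep_len(n):
    """Same, for a hypothetical impdep layout: impdep1 + 1 byte for the
    first 256 extensions, then impdep2 + 2 bytes."""
    if n < 256:
        return 2
    if n < 256 + 65536:
        return 3
    raise ValueError("opcode number out of range")
```

under these assumptions the two schemes cost the same for the first 256 extensions, and the prefix scheme stays at 2 bytes for extensions 256..3839 where the impdep layout already needs 3.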

also impdep1/impdep2 aren't valid in class files anyways, so really it
doesn't matter what exact means is used to extend the bytecode.

currently AFAICT Sun/Oracle doesn't use any opcodes in this range, but if
any are added later (and I am still importing ".class" files at that point)
then they will likely be remapped on import (likely to 2-byte forms of the
same opcode numbers or similar, sort of like the overlong NUL encoding in
Java's modified UTF-8).
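
a sketch of what such an import-time remap could look like for a single opcode byte. it assumes the prefix byte 224 followed by `n` names opcode number `n`; `remap_opcode` is a hypothetical helper, not an existing function in the VM:

```python
def remap_opcode(op):
    """Return the bytes to emit for a one-byte class-file opcode on import.

    Assumption: prefix 224 followed by byte n names opcode number n, so a
    future standard opcode landing in 224..253 can be re-encoded in two
    bytes without changing its number.
    """
    if not 0 <= op <= 255:
        raise ValueError("class-file opcodes are a single byte")
    if 224 <= op <= 253:
        return bytes([224, op])  # would otherwise be read as a prefix
    return bytes([op])
```

a real importer would also have to walk the instruction stream and patch branch offsets and exception-table ranges, since the remapped code gets longer.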

however, all this is not presently "set in stone" or anything...


a new container format is also under design/consideration, which differs
from class files in a few ways:
it is image-based rather than serialized, and may support being embedded in
PE/COFF and ELF and similar (intended for mixed-code images);
the constant pool is represented differently and may be larger than 64k
entries in larger images (hence the overloaded 'wide' prefix);
I am likely to use a different signature notation from class files (my VM
internally uses a different notation for signatures, and currently
transcribes between them);
any number of classes, fields, and methods may exist, as well as namespaces,
and variables and functions may be attached directly to namespaces or to the
toplevel (like in C and C++).
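
on the signature point in the list above: the first half of any transcription is parsing the class-file method descriptor into some structured form. a minimal sketch of that step follows; the VM's internal notation is not shown here since it is not described in this post, and the function names are made up for illustration:

```python
PRIMITIVES = {"B": "byte", "C": "char", "D": "double", "F": "float",
              "I": "int", "J": "long", "S": "short", "Z": "boolean",
              "V": "void"}

def parse_type(desc, i):
    """Parse one type at desc[i]; return (type_name, next_index)."""
    dims = 0
    while desc[i] == "[":          # each '[' adds one array dimension
        dims += 1
        i += 1
    c = desc[i]
    if c == "L":                   # object type: Lpkg/Name;
        end = desc.index(";", i)
        name = desc[i + 1:end].replace("/", ".")
        i = end + 1
    elif c in PRIMITIVES:
        name = PRIMITIVES[c]
        i += 1
    else:
        raise ValueError("bad descriptor character: " + c)
    return name + "[]" * dims, i

def parse_method_descriptor(desc):
    """Split a class-file method descriptor into (param_types, return_type)."""
    if not desc.startswith("("):
        raise ValueError("method descriptor must start with '('")
    i = 1
    params = []
    while desc[i] != ")":
        t, i = parse_type(desc, i)
        params.append(t)
    ret, i = parse_type(desc, i + 1)
    if i != len(desc):
        raise ValueError("trailing characters in descriptor")
    return params, ret
```

the returned list could then be re-emitted in whatever notation the target container uses.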

it may be produced by "linking" a number of class files, and would be
intended as an alternative to JAR (albeit non-portable, and probably limited
mostly to non-Java code and mixed-images or similar, or for core VM
libraries).
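
one piece of such a "linking" step would be merging the per-class constant pools into a single image-wide pool. a simplified sketch, assuming entries are modelled as hashable tuples and ignoring class-file details such as 1-based indices and the two slots taken by long/double entries; `link_pools` is a hypothetical name:

```python
def link_pools(pools):
    """Merge per-class constant pools into one image-wide pool.

    Returns (merged, remaps), where remaps[k][i] is the index in the
    merged pool of entry i of pools[k]. Duplicate entries are shared.
    """
    merged = []
    index_of = {}
    remaps = []
    for pool in pools:
        remap = []
        for entry in pool:
            if entry not in index_of:
                index_of[entry] = len(merged)
                merged.append(entry)
            remap.append(index_of[entry])
        remaps.append(remap)
    return merged, remaps

pool_a = [("utf8", "foo"), ("class", "A")]
pool_b = [("utf8", "foo"), ("class", "B")]
merged, remaps = link_pools([pool_a, pool_b])
```

each method's bytecode would then have its pool indices rewritten through the matching remap table, which is where indices past 64k can show up in a large image.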

when using the new container, the bytecode will be renamed (JXBC is a
possible name), although it will still be (mostly) backwards compatible.
(another alternative would be producing non-standard class files, but I
would prefer to avoid this, as ideally class files are kept portable).

note that most other parts of the VM use COFF-based containers. I had
previously partly designed/implemented something similar for this case as
well (some class-file-like structures shoved into a COFF object), but
figured this was ugly, and changed the idea to using a more self-contained
format.

oh yes, non-standard mechanisms also exist for inter-language interop,
although JNI is still an available option (the alternatives exist because
JNI is awkward...).


dunno if there would be any worry of annoyance from Oracle or similar...
(I don't know, they probably won't go after individuals and hobbyists?...).

not like anything like this would be much competition anyways.


any comments?...
 
