At one point I did a lot of work with ASN.1, which is a specification
for defining protocols but is implemented using a variety of encoding
schemes. It actually is an effective means of serializing data: it can
be very compact and, as I recall, it extended to cover most data
encodings that I ran into, but it never seemed trivial to me. Debugging
a stream of encoded data can lead to premature blindness and
post-traumatic drunkenness. Well, maybe that was just me. To interpret
the data correctly you need the specification for the encoding, BER for
example, and also the ASN.1 specification itself.
ASN.1 BER isn't particularly compact.
it is about the same as my "typical" binary serialization format: both
will typically require around 2 bytes for a marker, and 0 or more bytes
of payload data.
granted, yes, it is much more compact than more naive binary formats or
simpler TLV formats like IFF or RIFF, but, either way...
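as a rough illustration of the sort of "2-byte marker + payload" shape
I mean (this is not BER and not my actual format; the tag/length split
is just an assumption for the sketch):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* illustrative only: a 2-byte marker (tag + short length), followed by
   0 or more payload bytes. */
static size_t emit_item(uint8_t *dst, uint8_t tag,
                        const void *payload, uint8_t len)
{
    dst[0] = tag;        /* what kind of item this is */
    dst[1] = len;        /* how many payload bytes follow; longer
                            payloads would need an extended length form */
    if (len)
        memcpy(dst + 2, payload, len);
    return 2u + len;
}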
ASN.1 PER is a bit more compact.
basically, it maps data to fixed-size bit-fields.
part of the "proof of concept" part of my "BSXRP" protocol was that it
is possible to generate a message stream on-average more compact than
ASN.1 PER, and without needing to make use of a dedicated schema. they
work in different ways though, whereas PER uses fixed bit-packing, my
protocol borrowed more heavily from Deflate and JPEG, basically working
to reduce the data mostly to near-0 integer values (via "prediction"),
which can be relatively efficiently coded via a VLC scheme (a
Huffman-coded value followed by 0 or more "extra bits").
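for example, the "near-0 integer, coded as category + extra bits" part
could look something like this (JPEG/Deflate-style; the flat 6-bit
category field is just a stand-in for what would really be a
Huffman-coded symbol, and none of this is the literal BSXRP code):

#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t *buf; size_t pos_bits; } BitWriter;

/* append 'nbits' bits of 'val' (LSB first) to the output buffer */
static void put_bits(BitWriter *bw, uint32_t val, int nbits)
{
    for (int i = 0; i < nbits; i++) {
        size_t byte = bw->pos_bits >> 3;
        int    bit  = (int)(bw->pos_bits & 7);
        if (bit == 0) bw->buf[byte] = 0;
        bw->buf[byte] |= (uint8_t)(((val >> i) & 1u) << bit);
        bw->pos_bits++;
    }
}

/* code a (usually near-0) signed integer as a small "category" symbol
   followed by that many extra bits; Huffman-coding the category is
   where values near 0 end up with very short codes. */
static void put_vlc_int(BitWriter *bw, int32_t v)
{
    uint32_t u = ((uint32_t)v << 1) ^ (uint32_t)(v >> 31); /* zigzag:
                                          0,-1,1,-2,... -> 0,1,2,3,... */
    int cat = 0;
    for (uint32_t t = u; t != 0; t >>= 1)
        cat++;                        /* number of significant bits */
    put_bits(bw, (uint32_t)cat, 6);   /* stand-in for the Huffman symbol */
    if (cat > 0)
        put_bits(bw, u, cat);         /* the "extra bits" */
}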
say, one can encode one of (a rough sketch of this follows below):
a VLC value indicating which "prediction" is correct (regarding the
next item);
a VLC value indicating an item within an MRU list of "recently seen
items";
a direct representation of the item in question, in the case where it
hasn't been seen before, or was "forgotten" (fell off the end of the
MRU).
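a minimal sketch of how that three-way choice might be structured (the
MRU size, names, and tags here are made up for illustration, not the
actual protocol's):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MRU_SIZE 16

typedef struct {
    uint32_t items[MRU_SIZE];  /* recently seen items, most recent first */
    int      count;
} MruList;

/* move/insert 'item' to the front, pushing older entries back
   (the oldest entry is "forgotten" when the list is full) */
static void mru_touch(MruList *m, uint32_t item)
{
    int i, found = -1;
    for (i = 0; i < m->count; i++)
        if (m->items[i] == item) { found = i; break; }
    if (found < 0) {
        if (m->count < MRU_SIZE) m->count++;
        found = m->count - 1;
    }
    memmove(m->items + 1, m->items, (size_t)found * sizeof(uint32_t));
    m->items[0] = item;
}

enum { ENC_PREDICTED, ENC_MRU, ENC_LITERAL };

/* decide which of the three forms to emit for the next item; the
   returned tag is itself one of the Huffman-coded choice values
   discussed below. */
static int choose_encoding(const MruList *m, uint32_t predicted,
                           uint32_t item, int *mru_index)
{
    if (item == predicted)
        return ENC_PREDICTED;   /* cheapest case: the guess was right */
    for (int i = 0; i < m->count; i++)
        if (m->items[i] == item) { *mru_index = i; return ENC_MRU; }
    return ENC_LITERAL;         /* not seen recently: send it in full */
}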
Huffman coding these choice values will tend to use a near-optimal
number of bits (vs a fixed-field encoding, which will tend to give
every possibility a near-equal weight regardless of the relative
probability of a given choice).
say, there are 6 possible choices.
a fixed-width encoding would use 3 bits here.
but, what if one choice is much more common than the others:
000 = 90%
001 = 8%
others = 2%
then, 3 bits are no longer optimal, and instead we might want something
like:
000 -> 0
001 -> 10
others -> 11xxx
now, 90% of the time, we only need 1 bit.
this is part of the advantage of Huffman coding.
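(taking the example codes as written, the expected length works out to
about 0.90*1 + 0.08*2 + 0.02*5 = 1.16 bits per choice, versus a flat 3
bits for the fixed-width encoding.)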
note that an arithmetic coder can compress better than Huffman coding
(because it can deal with "fractional bits"), but tends to be a little
slower to encode/decode (note, however, that the H.264 video codec is
built around an arithmetic coder).
in my case, things like strings and byte arrays are compressed using an
LZ77 scheme very similar to Deflate (but using a 64kB window rather than
a 32kB one), which also does not need to resend the Huffman table with
each message.
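very roughly, the match-finding side of such an LZ77 scheme looks
something like this (brute force over the window just to show the idea;
a real encoder would use hash chains like Deflate does, and this is not
the actual BSXRP code):

#include <stddef.h>
#include <stdint.h>

#define LZ_WINDOW    65536   /* 64kB window, vs Deflate's 32kB */
#define LZ_MIN_MATCH     3

/* find the longest match for data[pos..] within the previous 64kB;
   returns the match length (0 if below the minimum), *dist gets the
   backwards distance.  a real encoder would also cap the match length
   (Deflate caps it at 258). */
static size_t lz_find_match(const uint8_t *data, size_t pos, size_t len,
                            size_t *dist)
{
    size_t start = (pos > LZ_WINDOW) ? (pos - LZ_WINDOW) : 0;
    size_t best_len = 0, best_dist = 0;

    for (size_t i = start; i < pos; i++) {
        size_t n = 0;
        while (pos + n < len && data[i + n] == data[pos + n])
            n++;
        if (n > best_len) { best_len = n; best_dist = pos - i; }
    }
    if (best_len < LZ_MIN_MATCH)
        return 0;
    *dist = best_dist;
    return best_len;
}

the literal/length/distance values coming out of this then go through
the Huffman/VLC stage; keeping the Huffman tables persistent across
messages is what avoids resending them per message.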
personally though, I don't really like ASN.1, mostly on the grounds
that its design requires a schema in order to work correctly, and I much
prefer formats which will handle "whatever you throw at them" even
without any sort of schema.
Lately I've been embedding the V8 engine in an application. I haven't
looked at how they've implemented the JSON objects but it might be worth
a peek for anyone interested in serialization techniques. I'm very
impressed with the V8 JavaScript engine.
yeah, V8 is probably fairly good...
I looked at it before, but in my case decided to stick with my own
Scripting-VM (the VM is sufficiently tightly integrated with my project
that it isn't really an "easily replaceable" component).
both have it in common that they run ECMAScript / JavaScript variants
though (but have different language extensions...).
but, mine has a fancier FFI, albeit worse performance...
but, it is easier to try to optimize things as needed than to try to
rip out and rewrite around 2/3 of the 3D engine's codebase.
or such...