Serialization library, request for feedback

BGB

I wrote a very basic ASN.1 decoder. It didn't seem that hard...

yeah, not very hard, and not very notable either...


ASN.1 BER is a moderately compact TLV format.
but, then again, so is Matroska MKV.
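
For readers unfamiliar with TLV (tag-length-value), a minimal sketch of a BER-style reader might look like the following. It only handles single-byte tags and definite lengths, and the names `read_tlv` and `walk` are made up for this illustration; it is not a complete ASN.1 decoder and not the decoder mentioned above.

```python
def read_tlv(buf, pos=0):
    """Decode one BER-style TLV at buf[pos:]; return (tag, value, next_pos)."""
    tag = buf[pos]
    if tag & 0x1F == 0x1F:
        raise ValueError("high-tag-number form not handled in this sketch")
    length = buf[pos + 1]
    pos += 2
    if length == 0x80:
        raise ValueError("indefinite length not handled in this sketch")
    if length & 0x80:
        # Long form: low 7 bits give the number of following length bytes.
        n = length & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated value")
    return tag, value, pos + length


def walk(buf):
    """Decode every TLV in buf; constructed tags (bit 0x20) hold nested TLVs."""
    out, pos = [], 0
    while pos < len(buf):
        tag, value, pos = read_tlv(buf, pos)
        out.append((tag, walk(value) if tag & 0x20 else bytes(value)))
    return out


# SEQUENCE { INTEGER 5, OCTET STRING "hi" }
sample = bytes([0x30, 0x07, 0x02, 0x01, 0x05, 0x04, 0x02]) + b"hi"
decoded = walk(sample)
print(decoded)
```

Note that nothing in the byte stream says what the INTEGER or the OCTET STRING mean; that part comes from the schema.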


it can be beaten out by more specialized formats regarding compactness.
many other designs also need not use a schema, which can be an arbitrary
limitation for a file-format.

neither allows stream resynchronization AFAIK (basically, like what is
commonly done in things like MPEG streams).
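
A rough sketch of what resynchronization means in practice: each frame begins with a start code, and a reader that hits damage simply scans forward to the next start code. The frame layout here (3-byte marker, 1-byte length, payload, 1-byte additive checksum) is invented for this example and is not taken from MPEG or from any protocol discussed in this thread.

```python
SYNC = b"\x00\x00\x01"  # hypothetical start code


def frame(payload):
    """Wrap a payload (under 256 bytes) as marker + length + payload + checksum."""
    assert len(payload) < 256
    return SYNC + bytes([len(payload)]) + payload + bytes([sum(payload) & 0xFF])


def read_frames(stream):
    """Return every payload whose frame validates, skipping damaged regions."""
    out, pos = [], 0
    while True:
        pos = stream.find(SYNC, pos)
        if pos < 0:
            return out
        start = pos + len(SYNC)
        if start >= len(stream):
            return out
        n = stream[start]
        body = stream[start + 1:start + 1 + n]
        end = start + 1 + n
        if len(body) == n and end < len(stream) and stream[end] == sum(body) & 0xFF:
            out.append(bytes(body))
            pos = end + 1
        else:
            pos += 1  # false or damaged marker: keep scanning


good = frame(b"one") + frame(b"two") + frame(b"three")
damaged = bytearray(good)
damaged[12] ^= 0xFF  # corrupt one payload byte inside the second frame
print(read_frames(bytes(damaged)))
print(read_frames(good[5:]))  # reader joins mid-stream
```

A plain TLV stream has no such marker, so after a lost or corrupted length field the reader has no reliable way to find the next element boundary.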


a more compact format allowing for resynchronization and not requiring a
schema can be created by borrowing much of its design from
data-compression formats.

but, sadly, some people seem to think of data-compression techniques as
bordering on magic...


a sad thing here is that many protocol designers recently have taken to
a strategy of being like:
we will take a very inefficient representation (like textual XML), and
feed it through Deflate, and it will be "compact".

it works, sort of, but is far from optimal in many areas (which becomes
clear if a person knows what is going on internally). for example, Deflate
isn't actually very good for streaming data (its design is much better
suited to what it was originally made for: file compression).

but, people are lazy, and XML+Deflate is often an "easy" solution.
(and, presumably for them, EXI was created...).


but, I guess in the minds of many people, designing a protocol (or
file-format) based directly on an entropy-coded bit-stream likely seems
like pure evil.

then again, even in the design of my own protocol, there were a few
things I would probably do differently now (mostly small things, like
related to how the escape markers work, ...).


but, I did what was in my case (more or less) the path of least effort.
like, why did I choose a list-based high-level representation rather
than XML?...

mostly because lists are (in my case) considerably higher-performance,
and also easier to work with (composing and processing XML data is
considerably more of a pain than doing ad-hoc "Lisp in C" stuff).

I originally considered using XML, but the awkwardness and relative
inefficiency of using a DOM-like API for composing/processing messages
was discouraging, so I opted for using lists instead (even if, sadly,
they are some sort of "arcane" technology).


granted though, this system is not based on any sort of "data binding",
as in-general, I am not really that big into data-binding (data-binding
has its own sets of drawbacks).

so, basically, explicit code walks the scene-graph and generates the
messages, and explicit code on the other end processes the messages...

(the lists still exist in both single and multiplayer games, but the
actual compression and sockets are usually only used for multiplayer games).


or such...
 
Greg Martin

yeah, not very hard, and not very notable either...


ASN.1 BER is a moderately compact TLV format.
but, then again, so is Matroska MKV.


it can be beaten out by more specialized formats regarding compactness.
many other designs also need not use a schema, which can be an arbitrary
limitation for a file-format.

I found that to be the real issue with ASN.1. The schemas I was working
to were complex and deeply layered and, sometimes, poorly documented.

neither allows stream resynchronization AFAIK (basically, like what is
commonly done in things like MPEG streams).


a more compact format allowing for resynchronization and not requiring a
schema can be created by borrowing much of its design from
data-compression formats.

but, sadly, some people seem to think of data-compression techniques as
bordering on magic...


a sad thing here is that many protocol designers recently have taken to
a strategy of being like:
we will take a very inefficient representation (like textual XML), and
feed it through Deflate, and it will be "compact".

While I think that XML is too verbose, JSON is easy to understand, debug
and work with. Viewing XML as anything other than markup leads to
nastiness like SOAP.

I prefer to see the compression decoupled from the protocol. It makes
debugging and decoding simpler when a stream can be captured and read as
text. YMMV.
 
BGB

I found that to be the real issue with ASN.1. The schemas I was working
to were complex and deeply layered and, sometimes, poorly documented.

yeah, pretty much...

While I think that XML is too verbose, JSON is easy to understand, debug
and work with. Viewing XML as anything other than markup leads to
nastiness like SOAP.

XML is fairly commonly used as a data-storage format though.
hence, things like SOAP are hardly uncommon.

not really that SOAP is "actually good"... (people just use it for
whatever reason).

I prefer to see the compression decoupled from the protocol. It makes
debugging and decoding simpler when a stream can be captured and read as
text. YMMV.

the problem is that this "decoupling" often does considerable damage to
the compression, such as making the compressed data take around 3x to 4x
more space...

granted, this works in many use cases, "well, all the users have fast
internet connections...", but still isn't necessarily ideal (nor
necessarily applicable to domains where it "actually matters").

for example, many "web" protocols just wouldn't work for many types of
online gaming... the latency and performance would be unusably bad...


I had argued about it before: it wasn't that the other person thought
SOAP could handle online gaming tasks, but that the person thought that
the sorts of things commonly done by online gaming were impossible over
the internet... (it can be done, just people can't design their
protocols to work like SOAP).


it is roughly along the same lines of implementing a video codec by
simply taking raw PPM forms of each video frame and running them through
GZip.
it could technically work, but the bit-rate would suck...

(people could then despair as it requires around 100Mbps or so to
watch low-resolution web videos...).
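
As a rough check on the order of magnitude, the cost of uncompressed 24-bit RGB video can be worked out directly. The sketch below assumes 352x288 frames (the size used in the XML example in this post) and 30 frames per second, which is an assumption; the function name is made up, and whatever GZip would save on top of this depends entirely on the content.

```python
def raw_video_mbps(width, height, fps, bytes_per_pixel=3):
    """Bit-rate of uncompressed video in megabits per second (10^6 bits)."""
    return width * height * bytes_per_pixel * 8 * fps / 1_000_000


cif_30 = raw_video_mbps(352, 288, 30)
vga_30 = raw_video_mbps(640, 480, 30)
print(f"352x288 @ 30 fps: {cif_30:.1f} Mbps")
print(f"640x480 @ 30 fps: {vga_30:.1f} Mbps")
```

Under these assumptions the raw stream is already in the tens to hundreds of Mbps before any general-purpose compressor gets involved.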


(then I just had a thought of if someone thought it a good idea to make
something like an XML-based analogue of PPM):
<xppm xmlns="...">
<head>
<image width="352" height="288" bits="8"/>
</head>
<body>
<pixel red="134" green="220" blue="76" alpha="255"/>
...
</body>
</xppm>

( then web-people jump on it, because they too can then have a
cameraphone-quality photographic image require upwards of 200MB, and
maybe deflate to somewhere around 40MB... )


a problem with text+Deflate is that it can't really recognize or
efficiently handle most of the data-types being transmitted (seeing them
as raw text), and often a partial or full state-flush is involved for
sending each message. this hurts because some things have to be sent for
every message (such as the Huffman tables), and (in the full-flush
case), any context from prior messages will be forgotten when sending
the next message (all this may be unavoidable, say, if using UDP, but is
a waste with TCP).
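
The cost of per-message flushing can be measured with Python's standard `zlib` module. This is only a sketch: the XML-ish messages are invented for the example, and the exact numbers will vary with the data and the zlib version. It compares compressing every message on its own, one stream with a full flush after each message (history discarded), and one stream with a sync flush after each message (history kept).

```python
import zlib


def make_messages(count=50):
    # Hypothetical XML-ish entity updates, invented for this example.
    return [
        f'<entity id="{i % 8}"><origin x="{100 + i}" y="{20 + i % 5}" z="5"/></entity>'.encode()
        for i in range(count)
    ]


def independent_size(messages):
    """Each message compressed as its own zlib stream (no shared context)."""
    return sum(len(zlib.compress(m)) for m in messages)


def streamed_size(messages, flush_mode):
    """One zlib stream, flushed after every message so it can be sent at once."""
    comp = zlib.compressobj()
    total = 0
    for m in messages:
        total += len(comp.compress(m)) + len(comp.flush(flush_mode))
    return total + len(comp.flush())


msgs = make_messages()
sizes = {
    "raw": sum(len(m) for m in msgs),
    "independent": independent_size(msgs),
    "full_flush": streamed_size(msgs, zlib.Z_FULL_FLUSH),
    "sync_flush": streamed_size(msgs, zlib.Z_SYNC_FLUSH),
}
print(sizes)
```

Keeping the history across messages (the sync-flush case) only works when delivery is reliable and ordered, which is why it fits TCP and not bare UDP.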


in my case, the contents of the message stream can be dumped textually,
but mostly via printing out the S-Expressions. granted, yes, this isn't
exactly "what goes over the wire".

the protocol then is mostly just a specialized compressed binary
serialization of the data (dynamically-typed / Lisp-style lists).

it could be better, but could also be worse...


side note:
http://en.wikipedia.org/wiki/S-expression

http://en.wikipedia.org/wiki/Lisp_(programming_language)#Conses_and_lists


even lists were a compromise:
I was lazy, so I used lists as an intermediate step.

an alternative would have been having the code for generating deltas
write into the bitstream directly, more like in Quake3 and Doom3, but
this would be ugly IMO (and, not really necessary, as dial-up is dead...).

so, I chose instead to generate lists as the intermediate step, and then
encode these lists.

so, yeah, there is at least a layer of abstraction between the
"high-level protocol" and the serialization (the serialization layer
doesn't really know/care what it is sending, just that it is lists).

technically, it can send other types of data as well, so long as they
are wrapped up in lists (and represented with the appropriate data-types).
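
To illustrate the layering (and only that), here is a minimal sketch of a serializer that knows nothing about the high-level protocol and just encodes nested lists of ints, floats and strings with one-byte type tags. The tag letters, the fixed-width fields and the sample message are all assumptions made for this example; the format described in this post is entropy-coded and is not shown here.

```python
import struct


def encode(obj):
    """Serialize nested lists of int/float/str into tagged bytes."""
    if isinstance(obj, list):
        body = b"".join(encode(x) for x in obj)
        return b"L" + struct.pack(">I", len(obj)) + body
    if isinstance(obj, int):
        return b"I" + struct.pack(">q", obj)
    if isinstance(obj, float):
        return b"F" + struct.pack(">d", obj)
    if isinstance(obj, str):
        raw = obj.encode("utf-8")
        return b"S" + struct.pack(">I", len(raw)) + raw
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def decode(buf, pos=0):
    """Inverse of encode(); returns (object, next_pos)."""
    tag = buf[pos:pos + 1]
    pos += 1
    if tag == b"L":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        items = []
        for _ in range(n):
            item, pos = decode(buf, pos)
            items.append(item)
        return items, pos
    if tag == b"I":
        (v,) = struct.unpack_from(">q", buf, pos)
        return v, pos + 8
    if tag == b"F":
        (v,) = struct.unpack_from(">d", buf, pos)
        return v, pos + 8
    if tag == b"S":
        (n,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        return buf[pos:pos + n].decode("utf-8"), pos + n
    raise ValueError(f"unknown tag {tag!r}")


# Hypothetical message: an entity delta expressed as a Lisp-style list.
msg = ["delta", 7, ["origin", 103.0, 20.0, 5.0]]
wire = encode(msg)
back, used = decode(wire)
print(back, used == len(wire))
```

The code that builds `msg` and the code that consumes `back` never see the wire format, so the encoding could be swapped for something more compact without touching either side.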


or such...
 
