C
cr88192
I am new here, so I apologize if I am trolling.
I get to a point eventually I think.
I guess I am requesting opinions on the general idea more than anything
else.
XML:
for quite a while, I had become complacent with use of textual xml in my
projects, after all:
it is easy to type;
it is easy to read;
it is a rather capable encoding;
....
xml, however, has some limitations:
it is a little bulky in some cases;
it is not well suited to storing large chunks of binary data;
it cannot be accessed randomly (at least not without parsing it first,
unless I am missing something significant);
....
(eg: I wouldn't want to use xml as a container for, say, several GB
of video).
IFF:
and for other things, there are other formats, eg: iff and riff, which:
are fairly simple (and differ only really in number endianness);
work well at dealing with chunks of binary data (eg: riff is used as the
base of both the avi and wav format).
http://www.szonye.com/bradd/iff.html
riff is similar, just the endianness is different, 'FORM' is 'RIFF', '    ' is
'JUNK', ...
for some things however, riff and iff were showing limitations:
they waste space with small items;
there is an inherent 4GB filesize limit;
they are not very expressive;
tags have a fixed size of 4 chars, which is lame imo;
....
I had previously designed a kludged-over variant of riff, which didn't offer
much that was new and was kind of ugly.
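to make the chunk layout concrete, here is a minimal sketch in C of reading one chunk header (a 4-char tag followed by a 32-bit size), with a flag choosing between iff and riff byte order. the struct and function names are made up for illustration and don't come from any existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct for illustration; not from any IFF/RIFF library. */
typedef struct {
    char     tag[5];   /* four-character tag plus a terminating NUL */
    uint32_t size;     /* payload size in bytes, excluding the 8-byte header */
} chunk_header;

/* Parse an 8-byte chunk header from buf.
   big_endian != 0 selects IFF byte order, 0 selects RIFF byte order.
   Returns 0 on success, -1 if fewer than 8 bytes are available. */
static int read_chunk_header(const unsigned char *buf, size_t len,
                             int big_endian, chunk_header *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 8)
        return -1;  /* not enough bytes for tag + size */

    memcpy(out->tag, buf, 4);
    out->tag[4] = '\0';

    const unsigned char *p = buf + 4;
    if (big_endian)
        out->size = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                    ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    else
        out->size = ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
                    ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
    return 0;
}

/* Bytes a chunk occupies in the file: header + payload + pad to even length. */
static uint64_t chunk_total_size(const chunk_header *h)
{
    return 8u + (uint64_t)h->size + (h->size & 1u);
}
```

both limits show up directly here: the size field is 32 bits, so no chunk (and so no file wrapped in a single outer chunk) can exceed 4GB, and a 1-byte payload still costs 10 bytes once the header and pad byte are counted.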
EBML:
http://www.matroska.org/technical/specs/index.html
ebml is used as the basis of the matroska format (mka, mkv). on the site
describing it, it compares itself to xml (but is in most ways similar to
riff).
its tags and sizes are variable length, sort of fixing some issues with
riff.
it is, however, not that much like xml.
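as a rough illustration of the variable-length numbers, here is a sketch in C of decoding an ebml-style size field: the position of the first set bit in the first byte gives the total width (1 to 8 bytes), and the remaining bits hold the value, big-endian. this follows my understanding of the scheme and ignores special cases such as reserved "unknown size" values; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one EBML-style variable-length integer from buf.
   Returns the number of bytes consumed (1..8), or 0 on error. */
static size_t vint_decode(const unsigned char *buf, size_t len, uint64_t *value)
{
    assert(buf != NULL && value != NULL);
    if (len == 0 || buf[0] == 0)
        return 0;               /* empty input, or a width above 8 bytes */

    /* The position of the first set bit in the first byte gives the width. */
    size_t width = 1;
    unsigned char mask = 0x80;
    while (!(buf[0] & mask)) {
        mask >>= 1;
        width++;
    }
    if (width > len)
        return 0;               /* truncated input */

    /* Drop the marker bit, then append the remaining bytes big-endian. */
    uint64_t v = buf[0] & (unsigned char)(mask - 1);
    for (size_t i = 1; i < width; i++)
        v = (v << 8) | buf[i];

    *value = v;
    return width;
}
```

a small size fits in a single byte instead of riff's fixed four, and the 8-byte form carries 56 bits of value, which is well past the 4GB limit.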
Binary XML:
this has manifested itself in a few forms, one of the most popular is wbxml:
http://www.w3.org/TR/wbxml/
which demonstrates the possibility of binary xml as a hacked together mess
and has been used in both arguments for and against binary xml.
imo, wbxml is an example of a bad approach.
other ideas I have heard stated involve use of ASN.1 and schemas as a basis
for binary xml. however, I will argue that this line of approach likely has
a limited application domain.
there are possible good points, however, to binary xml encodings:
large datasets in a single file;
random access;
possible uses of binary xml in domains where textual xml is currently not
very suitable;
....
ok, so I have gathered some ideas, and came up with something that sort of
borrows pieces from iff, ebml (rough structure and variable length numbers),
xml (namespaces, attributes, ...), and wbxml (use of dictionaries, albeit
mine may be dynamically constructed and don't have an arbitrary size limit,
....).
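to show what a dynamically constructed dictionary could look like, here is a hypothetical sketch in C: each tag name gets the next free index the first time it is seen, so an encoder would write the full string once and only the index afterwards. this is not the draft spec, just one way the idea might work; all names are invented, and the linear search is only there to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical dictionary mapping tag names to small integer indices.
   It grows on demand, so there is no fixed upper bound on entries. */
typedef struct {
    char  **names;
    size_t  count;
    size_t  cap;
} tag_dict;

static void dict_init(tag_dict *d)
{
    d->names = NULL;
    d->count = 0;
    d->cap = 0;
}

/* Return the index for name, adding it on first use.
   *is_new is set to 1 when the name was just added (an encoder would then
   write the full string once), 0 when an index alone is enough.
   Returns (size_t)-1 if memory runs out. */
static size_t dict_intern(tag_dict *d, const char *name, int *is_new)
{
    assert(d != NULL && name != NULL && is_new != NULL);
    for (size_t i = 0; i < d->count; i++) {
        if (strcmp(d->names[i], name) == 0) {
            *is_new = 0;
            return i;
        }
    }
    if (d->count == d->cap) {
        /* Double the table when it is full. */
        size_t ncap = d->cap ? d->cap * 2 : 8;
        char **n = realloc(d->names, ncap * sizeof *n);
        if (!n)
            return (size_t)-1;
        d->names = n;
        d->cap = ncap;
    }
    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (!copy)
        return (size_t)-1;
    memcpy(copy, name, len);
    d->names[d->count] = copy;
    *is_new = 1;
    return d->count++;
}

static void dict_free(tag_dict *d)
{
    for (size_t i = 0; i < d->count; i++)
        free(d->names[i]);
    free(d->names);
    dict_init(d);
}
```

a decoder would build the same table in the same order while reading, so the dictionary itself never needs to be stored up front.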
it is being designed such that it can be used both in the manner of formats like
riff/ebml, and can also represent xml (a subset, eg, the basic syntax and
namespaces). namespaces are an important feature in dealing with binary data
types and mixing xml and data, or mixing different kinds of data.
at present I don't have either a version of the spec online or any working
code for that matter.
I can follow up with the draft spec if anyone cares. for now I am regarding
it more as a "proof of concept" (if that).
or such...