new here, xml, ebml, iff, and binary xml

cr88192

I am new here, so I apologize if I am trolling.
I get to a point eventually, I think.

I guess I am requesting opinions on the general idea more than anything
else.


XML:
for quite a while, I had become complacent with use of textual xml in my
projects, after all:
it is easy to type;
it is easy to read;
it is a rather capable encoding;
....

xml, however, has some limitations:
it is a little bulky in some cases;
it is not well suited to storing large chunks of binary data;
it cannot be accessed randomly (or at least not without parsing it first,
unless I am missing something significant);
....

(eg: I wouldn't want to use xml as a container for, say, several GB of
video).
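
to put a rough number on the binary-data point: a common way to embed binary in textual xml is base64, which turns every 3 bytes into 4 characters. a minimal sketch (the helper name `base64_overhead` is made up for illustration, and it ignores any line breaks or markup a real document would add on top):

```python
import base64
import os

def base64_overhead(n_bytes: int) -> float:
    """Return encoded_size / raw_size for n_bytes of random binary data."""
    raw = os.urandom(n_bytes)
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)

ratio = base64_overhead(300_000)
print(f"base64 text is {ratio:.3f}x the raw size")  # prints 1.333x
```

so several GB of video would grow by about a third before any tags are counted, and a reader still has to decode it all to get at one frame.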


IFF:
and for other things, there are other formats, eg: iff and riff, which:
are fairly simple (and really only differ in number endianness);
work well at dealing with chunks of binary data (eg: riff is used as the
base of both the avi and wav format).

http://www.szonye.com/bradd/iff.html
riff is similar, just the endianness is different, 'FORM' is 'RIFF', '    ' is
'JUNK', ...

for some things however, riff and iff were showing limitations:
they waste space with small items;
there is an inherent 4GB filesize limit;
they are not very expressive;
tags have a fixed size of 4 chars, which is lame imo;
....
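
as a rough sketch of the chunk layout described above (4-char tag, 32-bit size, payload, one pad byte if the payload length is odd), assuming nothing beyond the basic iff/riff rules. the function names `pack_chunk` and `unpack_chunk` and the `NAME` tag are made up for illustration:

```python
import struct

def pack_chunk(tag: bytes, payload: bytes, big_endian: bool) -> bytes:
    """Pack one chunk: 4-byte tag, 4-byte size, payload, pad byte if odd."""
    if len(tag) != 4:
        raise ValueError("tag must be exactly 4 bytes")
    fmt = ">I" if big_endian else "<I"  # iff is big-endian, riff little-endian
    pad = b"\x00" if len(payload) % 2 else b""
    # struct.pack raises struct.error if the size does not fit in 32 bits
    return tag + struct.pack(fmt, len(payload)) + payload + pad

def unpack_chunk(data: bytes, offset: int, big_endian: bool):
    """Return (tag, payload, next_offset) for the chunk starting at offset."""
    tag = data[offset:offset + 4]
    fmt = ">I" if big_endian else "<I"
    (size,) = struct.unpack_from(fmt, data, offset + 4)
    start = offset + 8
    payload = data[start:start + size]
    next_offset = start + size + (size % 2)  # skip the pad byte, if any
    return tag, payload, next_offset

iff = pack_chunk(b"NAME", b"abc", big_endian=True)
riff = pack_chunk(b"NAME", b"abc", big_endian=False)
print(iff.hex())   # size field is 00000003
print(riff.hex())  # size field is 03000000
print(len(iff))    # 12 bytes on disk for a 3-byte payload
```

a 3-byte payload costs 12 bytes, which is the small-item waste, and the fixed 32-bit size field is where the 4GB ceiling comes from.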

I had before designed a kludged over variant of riff, which didn't offer
that much new and was kind of ugly.


EBML:
http://www.matroska.org/technical/specs/index.html

ebml is used as the basis of the matroska format (mka, mkv). on the site
describing it, it compares itself to xml (but is in most ways similar to
riff).
its tags and sizes are variable length, sort of fixing some issues with
riff.
it is, however, not that much like xml.
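
the variable-length numbers work a bit like utf-8: the count of leading zero bits in the first byte gives the total length, then a marker bit, then the value bits. below is a minimal sketch of the size encoding as I understand the ebml spec (it only covers sizes, where the marker bit is stripped; the function names are hypothetical, not from any library):

```python
def encode_vint(value: int) -> bytes:
    """Encode a non-negative integer as an EBML-style size (1 to 8 bytes)."""
    if value < 0:
        raise ValueError("value must be non-negative")
    for length in range(1, 9):
        # length bytes carry 7*length value bits; the all-ones pattern is
        # reserved (it means "unknown size"), so it is excluded here.
        if value < (1 << (7 * length)) - 1:
            marked = value | (1 << (7 * length))  # set the length marker bit
            return marked.to_bytes(length, "big")
    raise ValueError("value too large for an 8-byte vint")

def decode_vint(data: bytes, offset: int = 0):
    """Return (value, next_offset) for the vint starting at offset."""
    first = data[offset]
    if first == 0:
        raise ValueError("vints longer than 8 bytes are not supported")
    length = 9 - first.bit_length()  # leading zero bits + 1
    if offset + length > len(data):
        raise ValueError("truncated vint")
    raw = int.from_bytes(data[offset:offset + length], "big")
    value = raw & ((1 << (7 * length)) - 1)  # strip the marker bit
    return value, offset + length

print(encode_vint(5).hex())    # 85
print(encode_vint(500).hex())  # 41f4
print(decode_vint(bytes.fromhex("41f4")))  # (500, 2)
```

small sizes cost one byte instead of riff's fixed four, and an 8-byte vint carries 56 value bits, so the 4GB limit goes away.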


Binary XML:
this has manifested itself in a few forms, one of the most popular is wbxml:
http://www.w3.org/TR/wbxml/
which demonstrates the possibility of binary xml as a hacked together mess
and has been used in both arguments for and against binary xml.
imo, wbxml is an example of a bad approach.

other ideas I have heard stated involve use of ASN.1 and schemas as a basis
for binary xml. however, I will argue that this line of approach likely has
a limited application domain.

there are possible good points, however, to binary xml encodings:
large datasets in a single file;
random access;
possible uses of binary xml in domains where textual xml is currently not
very suitable;
....


ok, so I have gathered some ideas and come up with something that sort of
borrows pieces from iff, ebml (rough structure and variable length numbers),
xml (namespaces, attributes, ...), and wbxml (use of dictionaries, albeit
mine may be dynamically constructed and don't have an arbitrary size limit,
....).
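
to make the "dynamically constructed dictionary" idea a bit more concrete, here is one possible way it could work. this is purely a guess at the shape of the thing, not the draft spec: a name is written out in full the first time it appears and gets the next free index, and every later use is just that index. the class names and the ('def', ...)/('ref', ...) tokens are invented for the sketch:

```python
class DictionaryWriter:
    """Hypothetical writer-side string table: names get an index on first use."""

    def __init__(self):
        self.index_of = {}

    def emit(self, name: str):
        """Return ('def', name) on first sight, ('ref', index) afterwards."""
        if name in self.index_of:
            return ("ref", self.index_of[name])
        self.index_of[name] = len(self.index_of)
        return ("def", name)

class DictionaryReader:
    """Reader side: rebuilds the same table from the token stream."""

    def __init__(self):
        self.names = []

    def resolve(self, token):
        kind, arg = token
        if kind == "def":
            self.names.append(arg)  # index is implied by arrival order
            return arg
        return self.names[arg]

writer = DictionaryWriter()
tokens = [writer.emit(n) for n in ["frame", "pixel", "frame", "frame", "pixel"]]
reader = DictionaryReader()
decoded = [reader.resolve(t) for t in tokens]
print(tokens)
print(decoded)
```

because the table is just a growing list, nothing caps its size, and repeated tag names shrink to a small integer each.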

it is being designed so that it can be used like riff/ebml and can also
represent xml (a subset, eg, the basic syntax and namespaces). namespaces
are an important feature for dealing with binary data types and for mixing
xml and data, or mixing different kinds of data.


at present I don't have a version of the spec online, or any working code
for that matter.
I can follow up with the draft spec if anyone cares. for now I am regarding
it more as a "proof of concept" (if that).

or such...
 
Olivier Dubuisson

cr88192 said:
Binary XML:
this has manifested itself in a few forms, one of the most popular is wbxml:
http://www.w3.org/TR/wbxml/
which demonstrates the possibility of binary xml as a hacked together mess
and has been used in both arguments for and against binary xml.
imo, wbxml is an example of a bad approach.

other ideas I have heard stated involve use of ASN.1 and schemas as a basis
for binary xml. however, I will argue that this line of approach likely has
a limited application domain.

More information on the "ASN.1 for XML" issue can be found here:
http://asn1.elibel.tm.fr/xml/

O. Dubuisson
 
cr88192

Olivier Dubuisson said:
More information on the "ASN.1 for XML" issue can be found there:
http://asn1.elibel.tm.fr/xml/

yes, same as usual: it doesn't look that compelling or interesting from my
point of view (I am imagining something similar to xml being adapted for
something different).

within the core xml areas, I don't care as much (this is where the core
people live). fringe stuff seems interesting, and such a format could
possibly offer some advantages over formats like riff or ebml for things
like video, stored images, 3d models, ...

the idea may well be overkill, and the design is becoming complicated. it
may be necessary to narrow the scope and reduce the amount of "variable"
stuff (eg: fixed dictionary rules, plat-only structure, ...).


I guess to each their own format.
 
cr88192

to try to summarize some.

my guess is that:
it would be cool to have a data container format with flexibility comparable
to xml (most typically fall short here);
I was likely overemphasizing binary xml originally, and I am now thinking
it is more a distraction than anything practical.

the description is thus more like "xml-like binary container format" than
"binary xml", oh well...


thoughts now drift to other possible schemes, eg:
dropping the tag/attribute distinction and making all tags effectively both
compound and primitive globs (or either compound or primitive).
another mystery is whether to allow compound attributes.

however, with this the impulse then becomes to take the most
straightforward approach, leading to a simple inflexible plain tree again
(eg: the attributes would become the nodes and no longer a form of metadata,
and there is no longer a really safe and general way to insert metadata
without possibly affecting content).

I guess an aspect of an xml like structure is that it is slightly awkward
and limited in such a way as to promote flexible design over a more direct
and elegant but less flexible design (good examples seem elusive though).

I think it may be similar to something I have noticed with programming
languages and similar. a resistance to being overly clean can lead to a
flexible design, but too much leads to an inflexible mess. often, with the
inflexible mess it seems like, at the core, someone was kludging over some
central design.
(freedom is gained on the basis of adherence to rules).

an xml-like approach seems most sensible.
ntlalv: namespace, tag, length, attributes, length, value.
better than tlv?...
what about just ntlv?...
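
to show what ntlalv might look like on the wire, here is a rough sketch under my own assumptions, not a spec: namespace and tag are small integers (eg dictionary indices), attributes and value are opaque byte strings, and each number is a plain 7-bits-per-byte varint standing in for whatever number coding is finally picked. all function names are hypothetical:

```python
def put_uint(n: int) -> bytes:
    """LEB128-style unsigned varint: high bit set on all but the last byte."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def get_uint(data: bytes, pos: int):
    """Return (value, next_pos) for the varint starting at pos."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return result, pos

def encode_node(ns: int, tag: int, attrs: bytes, value: bytes) -> bytes:
    """n, t, l, a, l, v: namespace, tag, attr length, attrs, value length, value."""
    return (put_uint(ns) + put_uint(tag)
            + put_uint(len(attrs)) + attrs
            + put_uint(len(value)) + value)

def decode_node(data: bytes, pos: int = 0):
    """Return ((ns, tag, attrs, value), next_pos)."""
    ns, pos = get_uint(data, pos)
    tag, pos = get_uint(data, pos)
    alen, pos = get_uint(data, pos)
    attrs = data[pos:pos + alen]
    pos += alen  # a reader that ignores metadata can just skip these bytes
    vlen, pos = get_uint(data, pos)
    value = data[pos:pos + vlen]
    return (ns, tag, attrs, value), pos + vlen

node = encode_node(1, 2, b"w=640", b"\x00\x01\x02")
print(node.hex())
print(decode_node(node))
```

the point of the second length is that attributes stay a separately skippable block of metadata next to the value; plain ntlv would need them folded into the value as ordinary child nodes.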

it seems fragile, and tweaking the nature of attributes too much causes
things to collapse.
(semantics and structure seem connected in a weird way...).

the same goes for the coding and the api, why am I taking so long?...
the nature of the api is odd: apart from a context, I am not using much
other data. it forms itself as an odd state machine...


ok, my point once again seems lost.
 
