Binary XML


Default User

I work in software research and development and we're going to be doing
some investigations into message traffic. This is for embedded systems.
What we're looking at right now is XML-encoded messages, and we want to
look into binary or compressed XML, using network services (probably
IP). As such, we'd like to find some libraries that would aid our
investigation.

I'm busily doing web and newsgroup searches, including some of the
WBXML stuff, but thought I'd throw it out here in case someone had
suggestions on some candidates.

Features we'd like:

1. Open Source. We aren't creating any deliverables, so we aren't too
worried about license restrictions at this point. We're purely
interested in sending messages and gathering data.

2. We're working with Linux-based components, so something compatible
with that.

3. A Python API. Actually the tech lead is interested in that, I'm more
interested in a C API, but I'll try to keep him happy. Something
usable from either would be ideal. Other scripting APIs would be
considered as well.


If anyone has libraries they've worked with, worked on, or reviewed
that look like they might be candidates, I'd be grateful for
suggestions. If this isn't the best newsgroup (seemed the most
promising general-interest group) redirection to a more appropriate
place would also be welcome.



Brian
 

Jürgen Kahrs

Default said:
I work in software research and development and we're going to be doing
some investigations into message traffic. This is for embedded systems.
What we're looking at right now is XML-encoded messages, and we want to
look into binary or compressed XML, using network services (probably
IP). As such, we'd like to find some libraries that would aid our
investigation.

I have read your posting twice and still haven't
understood what you want to do. Do you want to
send XML data across a network?

You mention embedded systems. Such systems are
usually constrained in available memory. For
such systems with limited resources, I can
recommend an extension of the AWK scripting
language with a focus on XML processing:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file
 

Bjoern Hoehrmann

* Default User wrote in comp.text.xml:
If anyone has libraries they've worked with, worked on, or reviewed
that look like they might be candidates, I'd be grateful for
suggestions. If this isn't the best newsgroup (seemed the most
promising general-interest group) redirection to a more appropriate
place would also be welcome.

The XML-Dev mailing list and http://www.w3.org/XML/EXI/ fora might
be a better place.
 

Default User

Jürgen Kahrs said:
I have read your posting twice and still haven't
understood what you want to do. Do you want to
send XML data across a network?

You mention embedded systems. Such systems are
usually constrained in available memory.

These would be the sorts of processors typically used in some embedded
systems, such as flight control, connected via ethernet, and passing
data via network services.

Our hardware tends not to be as restricted in some aspects as other
embedded applications. We're not doing microwave ovens :)

The embedded part is not the most important part, think Linux boxes
connected in LAN configurations. The speed of message transmission and
the ability of test applications to receive said messages is what we
are investigating. We would not want to trade off speed for size or
memory usage.
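Linux boxes on a LAN exchanging messages over IP usually means a stream transport such as TCP, which delivers bytes without message boundaries, so the receiver needs some framing to know where one XML message ends. A minimal sketch, assuming a hypothetical convention of a 4-byte length prefix ahead of each zlib-compressed message; `socket.socketpair()` stands in for a real TCP connection so the sketch runs on a single machine.

```python
import socket
import struct
import zlib

# Hypothetical framing: 4-byte unsigned length in network byte order,
# followed by that many bytes of zlib-compressed XML.
HEADER = struct.Struct("!I")


def send_message(sock: socket.socket, xml_bytes: bytes) -> None:
    """Compress one XML message and send it with its length prefix."""
    payload = zlib.compress(xml_bytes)
    sock.sendall(HEADER.pack(len(payload)) + payload)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return fewer per recv()."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_message(sock: socket.socket) -> bytes:
    """Read one length-prefixed message and return the decompressed XML."""
    (length,) = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return zlib.decompress(_recv_exact(sock, length))


if __name__ == "__main__":
    # Two connected local sockets, standing in for two boxes on the LAN.
    a, b = socket.socketpair()
    with a, b:
        send_message(a, b'<ping seq="1"/>')
        send_message(a, b'<ping seq="2"/>')
        print(recv_message(b))
        print(recv_message(b))
```

Swapping `socketpair()` for a `socket.create_connection()` client and a listening server socket would move the same functions onto a real network link.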

For such systems with limited resources, I can
recommend an extension of the AWK scripting
language with a focus on XML processing:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

I will take a look, thanks.




Brian
 

Joe Kesselman

Default said:
look into binary or compressed XML

This has been proposed many times, and the debate is still in progress.
The W3C has a Working Group in progress which is continuing to debate
whether there is in fact any possibility of reaching a consensus on what
"binary XML" actually means, or whether in fact it's a catch-all term
for a bunch of solutions which are best considered local/custom/internal
representations rather than something to be standardized.

See http://www.w3.org/XML/Binary/

Historically, after folks have investigated it, they've generally
reached the conclusion that simply running ordinary text-based XML
through a compressor such as the ZIP algorithm yields more compact
messages than an attempt at a binary-XML representation would (since it
compresses on larger scales, and compresses text content) -- and unlike
the binary XML proposals, it retains all the advantages of being easily
human-accessible/debuggable which are part of what makes XML attractive.
(And decompressing zip and the like is surprisingly cheap; compression
is the hard part.)
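The "compress ordinary text XML" approach can be tried with Python's standard `zlib` module, which implements Deflate, the compression method most ZIP archives use. The same zlib library is callable directly from C (for example `compress2()` and `uncompress()`), which suits the C side of the investigation. A minimal sketch; the sample message and its element names are hypothetical.

```python
import zlib

# Hypothetical sample message; element names are made up for illustration.
SAMPLE_XML = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<telemetry>\n"
    + b"".join(
        b'  <sample channel="%d"><altitude>%d</altitude>'
        b"<airspeed>%d</airspeed></sample>\n" % (i, 1000 + i, 250 + i)
        for i in range(20)
    )
    + b"</telemetry>\n"
)


def compress_message(xml_bytes: bytes, level: int = 6) -> bytes:
    """Deflate-compress a serialized XML message (zlib container format)."""
    return zlib.compress(xml_bytes, level)


def decompress_message(payload: bytes) -> bytes:
    """Inverse of compress_message; returns the original XML bytes."""
    return zlib.decompress(payload)


if __name__ == "__main__":
    packed = compress_message(SAMPLE_XML)
    assert decompress_message(packed) == SAMPLE_XML
    print(f"XML: {len(SAMPLE_XML)} bytes, deflated: {len(packed)} bytes")
```

The `level` argument (0 to 9) trades compression time against output size; since speed matters more than size here, measuring a low level such as 1 alongside the default is probably worthwhile.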

There are certainly uses for non-text representations of the XML infoset
-- but I don't think I've ever seen a good justification for using one
of these outside a specific application. In fact, trying to make it
general-purpose tends to fight against the compression you're hoping to
achieve.

Interesting question. I may yet be proven wrong. But I'm betting that
the characterization working group comes back to the same old
conclusion: standardized binary XML is an oxymoron.
 

Boris Kolpackov

Hi,

I am working on an open-source XML data binding implementation for
C++[1]. Some of our users expressed interest[2] in sending data over
the network in compact, binary form. As a result, I am working on
binary serialization/deserialization of the in-memory representation.
I already have serialization implemented, and on a sample document
of about 1 KB we get a binary representation of about 400 bytes using
CDR streams (the data representation used in CORBA). For comparison,
zip-compressing the same XML file results in about 600 bytes.

Also note that this is not binary XML, in the sense that no markup
information (e.g., element/attribute names) is stored; it is
rather pure data whose format the receiver is assumed to know.
This saves additional space.
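The idea of sending only values, with the receiver already knowing the layout, can be illustrated with Python's standard `struct` module. This is a stand-in, not CORBA CDR (which the implementation above uses), and the `Reading` fields and the `!IddH` layout are hypothetical.

```python
import struct
from dataclasses import dataclass

# Hypothetical fixed layout agreed on in advance by sender and receiver
# (it plays the role of the schema). "!" = network byte order, no padding:
#   uint32 message id, float64 altitude, float64 airspeed, uint16 status
WIRE_FORMAT = struct.Struct("!IddH")


@dataclass
class Reading:
    message_id: int
    altitude: float
    airspeed: float
    status: int


def to_xml(r: Reading) -> bytes:
    """Self-describing text form: element names travel with the data."""
    return (
        f'<reading id="{r.message_id}"><altitude>{r.altitude}</altitude>'
        f"<airspeed>{r.airspeed}</airspeed><status>{r.status}</status></reading>"
    ).encode("utf-8")


def to_binary(r: Reading) -> bytes:
    """Schema-dependent form: only the values are sent, no names or tags."""
    return WIRE_FORMAT.pack(r.message_id, r.altitude, r.airspeed, r.status)


def from_binary(payload: bytes) -> Reading:
    """Decoding works only because the receiver already knows WIRE_FORMAT."""
    return Reading(*WIRE_FORMAT.unpack(payload))


if __name__ == "__main__":
    r = Reading(42, 10500.5, 251.25, 3)
    print(len(to_xml(r)), "bytes as XML,", len(to_binary(r)), "bytes packed")
    assert from_binary(to_binary(r)) == r
```

The packed form is always 22 bytes for this layout, but a receiver without `WIRE_FORMAT` cannot interpret it at all; that loss of self-description is the price of the extra savings.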


Default User said:
Features we'd like:

1. Open Source. We aren't creating any deliverables, so we aren't too
worried about license restrictions at this point. We're purely
interested in sending messages and gathering data.
Check.


2. We're working with Linux-based components, so something compatible
with that.
Check.


3. A Python API. Actually the tech lead is interested in that, I'm more
interested in a C API, but I'll try to keep him happy. Something
usable from either would be ideal. Other scripting APIs would be
considered as well.

It provides neither at the moment, but I would imagine it shouldn't
be too difficult to interface with the code from either C or Python.


[1] http://codesynthesis.com/products/xsd/

[2] http://codesynthesis.com/pipermail/xsd-users/2006-May/000338.html
http://codesynthesis.com/pipermail/xsd-users/2006-May/000343.html

hth,
-boris
 

Joe Kesselman

Actually, that group is no longer in existence. The new group that was
formed to take on this task is the 'Efficient XML Interchange Working
Group' (EXI for short).

Thanks; I'd missed that switch-over.

It will be interesting to see whether they can reach a consensus... and,
then, whether the rest of the XML community agrees to accept that
consensus.
 

Default User

Actually, that group is no longer in existence. The new group that
was formed to take on this task is the 'Efficient XML Interchange
Working Group' (EXI for short). The URL for it is as follows:

http://www.w3.org/XML/EXI/


I was looking at some stuff on the W3C site, but I'm not sure I saw
that. Thanks for the link, I'll review it.




Brian
 

Default User

Joe said:
This has been proposed many times, and the debate is still in
progress. The W3C has a Working Group in progress which is
continuing to debate whether there is in fact any possibility of
reaching a consensus on what "binary XML" actually means, or whether
in fact it's a catch-all term for a bunch of solutions which are best
considered local/custom/internal representations rather than
something to be standardized.

At this time, we're still in the investigation stage of, "can
XML-encoded messages give the kind of throughput needed for our
purposes?"

We aren't looking to create anything new in that arena, just take a
look at the state of the art, I guess you could say.
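For the throughput question, a rough first number is how many messages per second a single process can build and parse, with and without compression in the loop. A minimal sketch using Python's standard ElementTree and zlib; the message layout and the iteration count are hypothetical, and it measures CPU cost only, not network latency.

```python
import time
import zlib
import xml.etree.ElementTree as ET


def make_message(seq: int) -> bytes:
    """Build one small XML message (hypothetical layout for the benchmark)."""
    root = ET.Element("status", seq=str(seq))
    for name, value in (("altitude", "10500"), ("airspeed", "251"), ("heading", "87")):
        ET.SubElement(root, name).text = value
    return ET.tostring(root)  # bytes, since no text encoding is requested


def messages_per_second(n: int = 10_000, compress: bool = False) -> float:
    """Time build, optional zlib round trip, and parse for n messages."""
    start = time.perf_counter()
    for seq in range(n):
        wire = make_message(seq)
        if compress:
            wire = zlib.decompress(zlib.compress(wire, 1))
        root = ET.fromstring(wire)
        if root.get("seq") != str(seq):
            raise ValueError("message corrupted in round trip")
    elapsed = time.perf_counter() - start
    return n / elapsed


if __name__ == "__main__":
    print(f"plain: {messages_per_second():.0f} msg/s")
    print(f"zlib:  {messages_per_second(compress=True):.0f} msg/s")
```

Numbers from a desktop will not carry over to the target boards, so the harness is mainly useful when run on the actual hardware with messages shaped like the real traffic.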
Historically, after folks have investigated it, they've generally
reached the conclusion that simply running ordinary text-based XML
through a compressor such as the ZIP algorithm yields more compact
messages than an attempt at a binary-XML representation would (since
it compresses on larger scales, and compresses text content) -- and
unlike the binary XML proposals, it retains all the advantages of
being easily human-accessible/debuggable which are part of what makes
XML attractive. (And decompressing zip and the like is surprisingly
cheap; compression is the hard part.)

That's one of the things I've run across, and I feel should be
addressed. Anybody know of some open-source libraries that have
incorporated that compression?


Thanks for the information. I'm building up my understanding of the
subject. When you work in R&D, you get new things to do just about every
year. I have little prior exposure to XML, but this is an opportunity
for me to learn as well as aid the investigation.



Brian
 

Joe Kesselman

Default said:
That's one of the things I've run across, and I feel should be
addressed. Anybody know of some open-source libraries that have
incorporated that compression?

Java's standard libraries come with support for accessing zipfiles.

InfoZip's source code is available and can be incorporated into other
products under a BSD-like freeware license. Their website
(http://www.info-zip.org) also points to some related resources.
 

Default User

You might consider using Berkeley DB XML. It's open source and has a
Python API. It doesn't support binary XML encoding, but you certainly
could add that without too much work.


I'll take a look at it, thanks.



Brian
 
