Time to standardize the XML library for C/C++

Kong Bhat

With XML becoming the de facto data description standard, I am
extremely surprised that there is no movement towards standardizing an
xml library API for use with C and C++. Personally I have been
working with libxml2 (www.gnome.org) for a while now, and I am quite
comfortable with it. I believe that libxml2 is a good start, but I
think a slimmer version of libxml2 should be standardized.


Any thoughts?

Regards,
Kong Posh
 
Claudio Puviani

[cross-posts removed]

Kong Bhat said:
With XML becoming the de facto data description
standard, I am extremely surprised that there is no
movement towards standardizing an xml library API
for use with C and C++.

Firstly, this is the wrong forum to propose changes to the standard. You
want comp.std.c++ for that. This newsgroup deals with C++ as it's specified,
not as random individuals would see it changed.
Personally I have been working with libxml2
(www.gnome.org) for a while now, and I am quite
comfortable with it.

So what's your problem? If it does what you need, just keep using it.
I believe that libxml2 is a good start, but I think a
slimmer version of libxml2 should be standardized.

There's no need for it. C++ has no ties to XML and doesn't prevent you from
using a library of your choice. You don't arbitrarily add libraries to a
language standard on the flimsy basis that a lot of people use a particular
feature. TCP/IP is far more prevalent than XML, yet it would be absurd to
add sockets to the standard C++ library. Database access is even more
prevalent. Would you have some ODBC-like library also be added to the
standard? The C++ standard committee has enough on their hands without
tracking changes to unrelated standards. Let whoever is responsible for the
XML standard provide standard bindings for other languages if they have free
time on their hands.

Claudio Puviani
 
E. Robert Tisdale

Kong said:
With XML becoming the de facto data description standard,
I am extremely surprised that there is no movement
toward standardizing an xml library API for use with C and C++.
Personally, I have been working with libxml2 (www.gnome.org)
for a while now and I am quite comfortable with it.
I believe that libxml2 is a good start
but I think a slimmer version of libxml2 should be standardized.

How is this on-topic in comp.std.c, comp.lang.c or comp.lang.c++?
Do you want to make this library part of the standard library?
If so, is there a compelling reason why this library must be implemented
by the compiler developer and not a third party vendor?
If the library can be implemented by third party vendors, then
a standard separate from the C/C++ standards may be a better option.
 
Richard Tobin

Kong Bhat said:
With XML becoming the de facto data description standard, I am
extremely surprised that there is no movement towards standardizing an
xml library API for use with C and C++.

It would not be appropriate to make this part of the C standard.
There are a million things that should be standardized first:
we don't even have lists or hash tables!

-- Richard
 
Steven T. Hatton

Claudio said:
[cross-posts removed]

Kong Bhat said:
With XML becoming the de facto data description
standard, I am extremely surprised that there is no
movement towards standardizing an xml library API
for use with C and C++.

Firstly, this is the wrong forum to propose changes to the standard.

This really sounds more like a w3c issue. http://www.w3.org/DOM/ . I will
suggest there are two possible areas where a standard C++ API would be
worth pursuing. DOM, and SAX. I don't use SAX directly, so I have little
to say about it. As regards the DOM, there is an abstract IDL binding which
may, for all intents and purposes already define a C++ binding. I'm not an
expert in IDL, but I'm pretty sure it originated in the C++ world.

http://www.w3.org/TR/2002/WD-DOM-Level-3-Core-20020409/idl-definitions.html

Apache has this proposal out:

http://xml.apache.org/xerces-c/ApacheDOMC++BindingL3.html

There's no need for it. C++ has no ties to XML and doesn't prevent you
from using a library of your choice. You don't arbitrarily add libraries
to a language standard on the flimsy basis that a lot of people use a
particular feature. TCP/IP is far more prevalent than XML, yet it would be
absurd to add sockets to the standard C++ library. Database access is even
more prevalent. Would you have some ODBC-like library also be added to the
standard? The C++ standard committee has enough on their hands without
tracking changes to unrelated standards. Let whoever is responsible for
the XML standard provide standard bindings for other languages if they
have free time on their hands.

I believe the goal is worthwhile. It is simply not a core language issue.
It's a w3c issue.
 
Steven T. Hatton

Steven said:
Claudio said:
[cross-posts removed]

Kong Bhat said:
With XML becoming the de facto data description
standard, I am extremely surprised that there is no
movement towards standardizing an xml library API
for use with C and C++.

Firstly, this is the wrong forum to propose changes to the standard.

This really sounds more like a w3c issue. http://www.w3.org/DOM/ . I will
suggest there are two possible areas where a standard C++ API would be
worth pursuing. DOM, and SAX. I don't use SAX directly, so I have little
to say about it. As regards the DOM, there is an abstract IDL binding which
may, for all intents and purposes, already define a C++ binding. I'm not
an expert in IDL, but I'm pretty sure it originated in the C++ world.

I'm not sure what this really means, but I just found the following in the
CORBA specification, v3.0.3:

http://www.omg.org/cgi-bin/doc?formal/04-03-12

"OMG IDL is preprocessed according to the specification of the preprocessor
in International Organization for Standardization. 1998. ISO/IEC 14882
Standard for the C++ Programming Language. Geneva: International
Organization for Standardization. The preprocessor may be implemented as a
separate process or built into the IDL compiler."
 
Dietmar Kuehl

Steven T. Hatton said:
As regards the DOM, there is an abstract IDL binding which
may, for all intents and purposes already define a C++ binding. I'm not an
expert in IDL, but I'm pretty sure it originated in the C++ world.

No, it did not: it originated in the C world. One stated goal of the C++
binding for CORBA was some form of compatibility with C. Although I accept
that the intentions were good, the resulting C++ binding is a pain
in the ass (using a much weaker term than I would have used in my native
language). This view can, of course, be attributed to ignorance about the
finer points of the C++ binding on my side.

That said, any realization of W3C's DOM in C++ which uses any approach
other than a liberal interpretation of their model will be useless from the start.
Bolting the CORBA C++ binding on top of this will definitely not improve the
situation - unless, of course, your goal is the creation of the slowest
and hardest to use [correctly] XML processor so far.
 
Steven T. Hatton

Dietmar said:
No, it did not: it originated in the C world. One stated goal of the C++
binding for CORBA was some form of compatibility with C. Although I accept
that the intentions were good, the resulting C++ binding is a pain
in the ass (using a much weaker term than I would have used in my native
language). This view can, of course, be attributed to ignorance about the
finer points of the C++ binding on my side.

I think this means you don't like the way CORBA (as opposed to the DOM) is
bound to/in C++. Is that correct? If that is what you are intending, I
would like to know more. From my limited understanding of these issues,
this is an area where J2EE has really taken the market share from C++. At
least this is how it looked to me from pretty close to the frontlines.
People were talking CORBA/IIOP and IDL in 1996, and by 2000 the buzz was
all Java Servlets.
 
Kong Bhat

It would not be appropriate to make this part of the C standard.
There are a million things that should be standardized first:
we don't even have lists or hash tables!

-- Richard

Kindly note that I am only in favor of standardizing the API. There
could be multiple implementations that conform to that API (including
Richard's very own "newRXP" parser), in much the same way that we have
standard APIs for I/O handling, string manipulation, mathematical
functions etc. The big advantage of that would be that code written
to handle XML processing would become extremely portable.
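
As a rough illustration of what "one API, several implementations" could look like in C++, here is a minimal sketch. Every name in it (XmlReader, NaiveReader, rootElementName, describe) is hypothetical and is not taken from libxml2, RXP, or any standard, and the toy reader is only a stand-in, not a conforming XML parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical interface: the "standard API" that application code would target.
class XmlReader {
public:
    virtual ~XmlReader() = default;
    // Returns the name of the root element, or an empty string if none is found.
    virtual std::string rootElementName(const std::string& xml) const = 0;
};

// One possible implementation. It only scans for the first start tag and
// skips declarations (<?...), comments/doctype (<!...) and end tags (</...).
// A real implementation would wrap an actual parser instead.
class NaiveReader : public XmlReader {
public:
    std::string rootElementName(const std::string& xml) const override {
        std::size_t pos = 0;
        while ((pos = xml.find('<', pos)) != std::string::npos) {
            ++pos;  // step past '<'
            if (pos < xml.size() && xml[pos] != '?' && xml[pos] != '!' && xml[pos] != '/') {
                const std::size_t end = xml.find_first_of(" \t\r\n/>", pos);
                if (end == std::string::npos) return std::string();
                return xml.substr(pos, end - pos);
            }
        }
        return std::string();
    }
};

// Application code depends only on the interface, so any conforming
// implementation could be substituted without changing this function.
std::string describe(const XmlReader& reader, const std::string& xml) {
    const std::string name = reader.rootElementName(xml);
    return name.empty() ? std::string("no root element")
                        : std::string("root element: ") + name;
}

// Small self-check showing the intended use.
void selfCheck() {
    NaiveReader reader;
    assert(describe(reader, "<doc/>") == "root element: doc");
}
```

The portability argument rests on describe(): it would compile unchanged against any other class derived from the same interface.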

I will put my money on the prediction that XML processing within C/C++
applications will take off in a very very big way in the next few
years, especially as the use of WebServices becomes more widespread.
That is why I strongly feel that the time is ripe to move towards
standardizing this API.

Regards,
Kong Posh
 
James Kuyper

Kong said:
With XML becoming the de facto data description standard, I am
extremely surprised that there is no movement towards standardizing an
xml library API for use with C and C++. Personally I have been
working with libxml2 (www.gnome.org) for a while now, and I am quite
comfortable with it. I believe that libxml2 is a good start, but I
think a slimmer version of libxml2 should be standardized.

Any thoughts?

That's an excellent idea - whoever is responsible for XML should
establish a standard library for generating/parsing it. Such a library
would, of course, be too specialized to have any proper place in the
C/C++ standard libraries.
 
Dietmar Kuehl

Steven said:
I think this means you don't like the way CORBA (as opposed to the DOM) is
bound to/in C++.

Well, I don't like the CORBA binding to C++. As it is, I also don't like
the DOM binding to C++. A combination of both would be entirely horrible.
Neither of the two things was done by people knowing C++, at least not
the C++ it has been for something like at least ten years.
If that is what you are intending, I would like to know more.

What do you want to know more about? Why the CORBA/C++ binding sucks?
Well, that is pretty simple: it is extremely error prone to use.
Actually, I'm pretty sure that there were even cases where it is
impossible to use the C++ binding without creating memory leaks, but I
don't remember the details (it is something like four years since I
really used CORBA).

At any rate, RAII is not used, which makes resource handling very hard.
You have to keep very close track of how the parameters are declared,
and the details of how to use them in a function differ widely between
in, out, and inout parameters, each having its own rules. Sure, there
are tables telling you how to do it (which cover 90% of the real cases),
but if you change a declaration from e.g. out to inout you have to
apply major changes. Of course, the code would compile unchanged - it
would just misbehave (I think it would create a resource leak).
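
The general point about RAII can be sketched without any CORBA code at all. The example below is an assumption-laden illustration: remote_get_name and remote_free are hypothetical stand-ins for an API that hands out caller-owned memory, and they are not part of the real OMG IDL/C++ mapping. It contrasts manual release with a std::unique_ptr wrapper that releases on every exit path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Counts outstanding allocations so that leaks are observable in this sketch.
int g_live_allocations = 0;

// Hypothetical API: returns a heap-allocated string that the caller must release.
char* remote_get_name() {
    const char text[] = "node";
    char* copy = static_cast<char*>(std::malloc(sizeof text));
    if (copy == nullptr) return nullptr;
    std::memcpy(copy, text, sizeof text);  // copies the terminating '\0' too
    ++g_live_allocations;
    return copy;
}

// Hypothetical API: releases a string obtained from remote_get_name.
void remote_free(char* p) {
    if (p != nullptr) {
        std::free(p);
        --g_live_allocations;
    }
}

// Manual style: every exit path has to remember to call remote_free.
std::size_t nameLengthManual() {
    char* name = remote_get_name();
    if (name == nullptr) return 0;
    const std::size_t n = std::strlen(name);
    remote_free(name);  // forgetting this line still compiles, but leaks
    return n;
}

// RAII style: the deleter runs automatically when the owner goes out of scope.
struct RemoteDeleter {
    void operator()(char* p) const { remote_free(p); }
};
using RemoteString = std::unique_ptr<char, RemoteDeleter>;

std::size_t nameLengthRaii() {
    RemoteString name(remote_get_name());
    return name ? std::strlen(name.get()) : 0;
}

// Small self-check: both styles agree and nothing is left allocated.
void selfCheck() {
    assert(nameLengthManual() == nameLengthRaii());
    assert(g_live_allocations == 0);
}
```

In the manual version correctness depends on programmer discipline; in the RAII version the ownership rule is encoded once in the type.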

Bindings to other languages were *much* easier to use. I used CORBA
with Java and this is very easy to use. Even easier is the Python
binding (although I didn't use it much; I just used it during
evaluation for a project which was not really started): there the
CORBA stuff is rather transparent. You just use the objects painlessly.

I'm pretty sure that a much easier to use C++ binding would be
possible, probably exhibiting the same performance characteristics
as the current one. There seems to be not much interest in CORBA,
however, and I have no real stake in CORBA to go forward anymore.
From my limited understanding of these issues,
this is an area where J2EE has really taken the market share from C++.

CORBA has several problems which are independent from the C++ binding.
E.g. the protocol is not the most efficient one I have seen. On the
other hand, people also use SOAP, which shares a similar problem. Also,
people were abusing CORBA grossly. Actually, I participated in a
project where people thought it would be a brilliant idea to have
distributed getter and setter functions: each attribute access was a
remote call. It is hard to come up with even worse performance short
of calling sleep rather frequently. I'm sure this was not the only
project doing something stupid like this.
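
The cost of per-attribute remote calls can be made concrete with a small simulation. This is only a sketch under stated assumptions: RemoteCustomer and CustomerData are hypothetical, no networking is involved, and each "remote" method simply increments a counter standing in for one network round trip.

```cpp
#include <cassert>
#include <string>

// Plain value type that can be shipped in a single message.
struct CustomerData {
    std::string name;
    std::string city;
    int age;
};

// Hypothetical stand-in for a remote object. Every public call below is
// counted as one simulated network round trip.
class RemoteCustomer {
public:
    // Fine-grained accessors: one round trip per attribute.
    std::string name() const { ++roundTrips_; return data_.name; }
    std::string city() const { ++roundTrips_; return data_.city; }
    int age() const { ++roundTrips_; return data_.age; }

    // Coarse-grained accessor: all attributes in one round trip.
    CustomerData snapshot() const { ++roundTrips_; return data_; }

    // Local bookkeeping only; not counted as a remote call.
    int roundTrips() const { return roundTrips_; }

private:
    CustomerData data_{"Ada", "London", 36};
    mutable int roundTrips_ = 0;
};

// Builds a label using distributed getters.
int tripsFineGrained() {
    RemoteCustomer c;
    const std::string label = c.name() + ", " + c.city() + ", " + std::to_string(c.age());
    assert(!label.empty());
    return c.roundTrips();
}

// Builds the same label from one snapshot.
int tripsCoarseGrained() {
    RemoteCustomer c;
    const CustomerData d = c.snapshot();
    const std::string label = d.name + ", " + d.city + ", " + std::to_string(d.age);
    assert(!label.empty());
    return c.roundTrips();
}
```

With three attributes the fine-grained version makes three simulated round trips and the snapshot version makes one; with real network latency per call, that ratio dominates everything else.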

When J2EE appeared, CORBA had already earned a reputation for being
slow and the media didn't consider any non-Java solution at all. This
helped J2EE big time, especially as C++ had and actually still has no good
alternative: the CORBA binding still sucks. On the other hand, there
are few applications which really need much distribution beyond what
a web server does, at least as far as I can tell. People moved away
from fat clients to browser-based applications, normally due to the
much reduced maintenance cost at the client side: it is hard to argue
why money needs to be spent on something which can be obtained for
free.
At least this is how it looked to me from pretty close to the frontlines.
People were talking CORBA/IIOP and IDL in 1996, and by 2000 the buzz was
all Java Servlets.

It's quite a while since Java Servlets were the buzz, either. At least,
I haven't heard much about them lately. On the other hand, I'm not much
listening for buzzes... What would be the buzz now? I think people
returned to good old three-tier applications: a thin client, i.e. a web
browser, accesses some server which in turn accesses a relational
database. That is, the only difference which effectively remains is
that we don't use a text terminal and that a standardized protocol
(HTTP) is used between client and server.
 
Steven T. Hatton

Dietmar said:
Well, I don't like the CORBA binding to C++. As it is, I also don't like
the DOM binding to C++.

I'm not sure there is a formal DOM binding for C++, unless one is to assume
a C++ DOM binding is the de facto binding specified by the IDL Definitions:

http://www.w3.org/TR/2004/REC-DOM-Level-3-Core-20040407/idl-definitions.html
A combination of both would be entirely horrible.
Neither of the two things was done by people knowing C++, at least not
the C++ it has been for something like at least ten years.

The CORBA standard has (ostensibly) been maintained. Is it /possible/ some
of the faults you encountered have subsequently been addressed?
What do you want to know more about? Why the CORBA/C++ binding sucks?

Yes. That, and in particular what, if anything, is wrong with IDL.
Well, that is pretty simple: it is extremely error prone to use.
Actually, I'm pretty sure that there were even cases where it is
impossible to use the C++ binding without creating memory leaks, but I
don't remember the details (it is something like four years since I
really used CORBA).

I'm aware of some very powerful tools created with CORBA-related
middleware.
At any rate, RAII is not used, which makes resource handling very hard.
You have to keep very close track of how the parameters are declared,
and the details of how to use them in a function differ widely between
in, out, and inout parameters, each having its own rules. Sure, there
are tables telling you how to do it (which cover 90% of the real cases),
but if you change a declaration from e.g. out to inout you have to
apply major changes. Of course, the code would compile unchanged - it
would just misbehave (I think it would create a resource leak).

I'm inclined to suggest these problems are a result of expecting too much
from the tool. But I also suspect the tool/specification promises too
much.
Bindings to other languages were *much* easier to use. I used CORBA
with Java and this is very easy to use.

That doesn't surprise me. I wonder if there are limitations on the Java
bindings which prevent certain functionality C++ offers. I have a strong
sense that C++ could be as easy to use as any of the other languages simply
by a philosophical realignment. There are analogies that keep running
through my head regarding how C++ compares to Java. One of these is the
notion of building blocks.

Imagine two factories produce building blocks. Factory J produces a very
limited variety of shapes. But these shapes are well suited to almost all
of the tasks builders assume. Factory C++ also produces these same kinds
of block, but it also produces blocks in several different shapes, and will
even custom produce blocks according to your specification. As it turns
out, the contractors who buy only factory J blocks tend to get most jobs
done faster, and with fewer defects than the contractors who use C++
blocks. The irony is, there is a subset of the C++ factory's blocks
virtually identical to factory J's. And they are actually /better/ in many
ways.

So why is there a problem? The C++ builders fail to limit themselves to that
regular subset, even when they should. But the really good C++ builders
know how to do it right, so why should /they/ care? Factory J is getting
the mainstream of revenue. Factory C++ may arguably provide a superior set
of building blocks, but they are harder to do business with (all their
procedures are still paper-based) and people don't know how to use their
products effectively.
I'm pretty sure that a much easier to use C++ binding would be
possible, probably exhibiting the same performance characteristics
as the current one. There seems to be not much interest in CORBA,
however, and I have no real stake in CORBA to go forward anymore.

Well, before Albert Einstein drafted Marcel Grossmann to do some
mathematical research for him, there was not a great deal of general
interest in tensor calculus. Who knows? I suspect part of the problem
with CORBA is spelled COM, (pronounced Microsoft).
CORBA has several problems which are independent from the C++ binding.
E.g. the protocol is not the most efficient one I have seen. On the
other hand, people also use SOAP, which shares a similar problem.

SOAP isn't even advertised as efficient. But SOAP is pronounced the same
way COM is.
Also, people were abusing CORBA grossly. Actually, I participated in a
project where people thought it would be a brilliant idea to have
distributed getter and setter functions: each attribute access was a
remote call. It is hard to come up with even worse performance short
of calling sleep rather frequently. I'm sure this was not the only
project doing something stupid like this.

Those are the kinds of things you do in a lab to convince yourself they
really are stupid. ;-)
It's quite a while since Java Servlets were the buzz, either.

They are simply folded into Apache's Tomcat. Servlets are a core feature of
the J2EE and JWSDP. It's kind of like saying you don't hear a lot about
TCP/IP these days.
On the other hand, I'm not much listening for buzzes... What would be the
buzz now? I think people

The projects I'm thinking of are still going in the same direction. J2EE and
JWSDP. Much of the core technology is programming language agnostic. It's
just that the Java community latched onto ideas such as XML while a lot of
C++ developers scoffed at the whole field.
returned to good old three-tier applications: a thin client, i.e. a web
browser, accesses some server which in turn accesses a relational
database.

The projects I'm talking about accessed dozens of disparate databases to
answer customized queries generated at will by the user.
That is, the only difference which effectively remains is
that we don't use a text terminal and that a standardized protocol
(HTTP) is used between client and server.

But that is a transport protocol. Ironically what SOAP does is ride on the
back of HTTP to achieve what CORBA was intended to do natively.

But the part I'm really interested in right now is IDL. Especially as it
relates to C++ interface definitions.

This is something that /seems/ to have worked reasonably well with XPIDL:

http://www.mozilla.org/scriptable/xpidl/
 
Steven T. Hatton

Dietmar said:
CORBA has several problems which are independent from the C++ binding.

I don't know if this is off topic or not. But it's marked OT, so I guess I'm
safe. Especially because it does involve C++.

Do you know how to translate the w3c DOM IDL Definitions into C++ header
files? I'm sure RTFM is an option. But I am somewhat concerned I could
read for a month and never find the answer that takes 5 minutes to learn.

BTW, are you the guy who "meticulously reviewed and edited the whole [C++
Templates] book"?
 
Steven T. Hatton

Dietmar said:
I haven't claimed that it isn't doable (I did so myself) but it is
definitely unnecessarily hard: you need people who have read quite a
lot of stuff and even then mistakes are made easily and only revealed
by thorough code reviews, at least when using C++ with CORBA.

I don't know how much I'm going to be putting into it myself. I was very
quickly getting up to speed with Java and all the web services stuff when
the Trolls put out the new Qt Book. I'd been wanting to learn C++ for
several years. I'm particularly interested in KDE related stuff. I just
dropped everything Java and put all my time into C++. It's _much_ harder
than I expected it to be.
Nope. This is not the problem with OMG IDL/C++ binding. The problem with
this binding is partially due to the entirely stupid approach used for
resource maintenance which effectively makes any automated approach
impossible:

Something tells me you weren't using Rational Rose, or a similar product. Is
that correct?

I tend to believe the IDL model for defining interfaces is potentially very
useful to C++. The IDL interfaces are almost identical to C++ headers.
I'm even considering using sed rather than an IDL tool to translate them
into C++.
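
To make the "IDL looks almost like a C++ header" observation concrete, here is one possible hand translation of a small fragment of the DOM Node interface. This is a sketch only: the IDL lines in the comment are an abridged fragment that should be checked against the W3C IDL definitions, the mapping choices (std::string for DOMString, std::unique_ptr for ownership, a getter for a readonly attribute) are assumptions made here and are not the OMG IDL/C++ mapping, and SimpleNode is a hypothetical class added only so the interface can be exercised.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Abridged IDL fragment being translated:
//
//   interface Node {
//     readonly attribute DOMString nodeName;
//     Node appendChild(in Node newChild) raises(DOMException);
//   };

// Assumption: DOMString is mapped to std::string for brevity. The DOM
// specification defines DOMString in terms of UTF-16 code units.
using DOMString = std::string;

class Node {
public:
    virtual ~Node() = default;

    // "readonly attribute" becomes a const getter with no setter.
    virtual DOMString nodeName() const = 0;

    // "in Node newChild" becomes an owning pointer passed by value, so the
    // transfer of ownership is visible in the signature. The returned pointer
    // is non-owning. "raises(DOMException)" would map to a thrown exception;
    // this sketch does not model it.
    virtual Node* appendChild(std::unique_ptr<Node> newChild) = 0;
};

// Hypothetical minimal implementation used to exercise the interface.
class SimpleNode : public Node {
public:
    explicit SimpleNode(DOMString name) : name_(std::move(name)) {}

    DOMString nodeName() const override { return name_; }

    Node* appendChild(std::unique_ptr<Node> newChild) override {
        assert(newChild != nullptr);  // precondition of this sketch
        children_.push_back(std::move(newChild));
        return children_.back().get();
    }

    std::size_t childCount() const { return children_.size(); }

private:
    DOMString name_;
    std::vector<std::unique_ptr<Node>> children_;
};
```

The signatures carry over almost mechanically, but the IDL says nothing about ownership or string representation, so those decisions still have to be made by whoever writes the C++ side.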
There is lots of necessary manual interaction when using the
OMG IDL/C++ binding. And worse, the details differ depending on whether
the argument is passed as an in, out, or inout argument. As stated before:
give it a try! It is plain horrible.

This sounds like the traditional marshalling/unmarshalling argument.
The fact that nothing happened afterwards is mainly due to the realization that
there is no big market interested in CORBA

IIRC, they used to charge money just to look at the documentation. That was
probably a bad move when put up against the openness of Java.
and that the person who
was put in charge of doing the work left the committee shortly
afterwards due to entirely unrelated reasons.

That's a shame. But it's also a significant statement. There may be parts
of CORBA which are worth resurrecting on their own. I also wonder what, if
any, C++ technology could take the place of CORBA-based application servers.
 
Steven T. Hatton

Dietmar said:
Actually, I don't think that W3C specifies language bindings for their
IDL stuff. I'm also not sure whether they refer to OMG's IDL and its
binding although I seem to recall that they state that this binding is
an option but not required.
I'm pretty sure they are testing conformance against their Java and
ECMAScript (JavaScript) binding specifications. These are almost identical
to the IDL abstract binding.

http://www.w3.org/DOM/Test/
Personally, I would use a rather different
interpretation of the DOM standard than the implementations e.g. in
Xerces. It is a while since I looked more closely into this stuff.

This is a list of 3rd party recommendations for DOM bindings. Apache's
Xerces is the only one addressing all three levels. I believe it's a
question of how closely the implementations come to resembling the IDL.
The ECMAScript and Java bindings use slightly different naming conventions,
but they are mappable to the IDL.

http://www.w3.org/DOM/Bindings
I haven't seen another person by that name in the C++ community.

That must mean you are also this person:

http://www.josuttis.com/libbook/index.html

"First, I'd like to thank Dietmar Kühl. Dietmar is an expert on C++,
especially on input/output streams and internationalization (he implemented
an I/O stream library just for fun). He not only translated major parts of
this book from German to English, he also wrote sections of this book using
his expertise. In addition, he provided me with invaluable feedback over
the years."

I'm impressed. :)
 
Dietmar Kuehl

Steven said:
Do you know how to translate the w3c DOM IDL Definitions into C++ header
files?

Actually, I don't think that W3C specifies language bindings for their
IDL stuff. I'm also not sure whether they refer to OMG's IDL and its
binding although I seem to recall that they state that this binding is
an option but not required. Personally, I would use a rather different
interpretation of the DOM standard than the implementations e.g. in
Xerces. It is a while since I looked more closely into this stuff.
BTW, are you the guy who "meticulously reviewed and edited the whole [C++
Templates] book"?

Currently, I don't have the book with me to check the quote, but if
they attached my name to this statement, I was surely this person (and
I haven't read this statement, since I reviewed the book but did not
read the final result :) Actually, although German phone books reveal
several other persons with the same name, I haven't
seen another person by that name in the C++ community.
 
Dietmar Kuehl

Steven said:
I'm aware of some very powerful tools created with CORBA-related
middleware.

I haven't claimed that it isn't doable (I did so myself) but it is
definitely unnecessarily hard: you need people who have read quite a
lot of stuff and even then mistakes are made easily and only revealed
by thorough code reviews, at least when using C++ with CORBA.
I'm inclined to suggest these problems are a result of expecting too much
from the tool. But I also suspect the tool/specification promises too
much.

Nope. Have you tried to use the CORBA C++ binding? I have. I also have
used CORBA with other languages (Java and Python) and it is *much* easier
to use CORBA with these languages.
That doesn't surprise me. I wonder if there are limitations on the Java
bindings which prevent certain functionality C++ offers.

None I'm aware of. In fact, I think there are certain corner-case things
you can do in Java but not in C++ (but I don't remember the example; if
I remember correctly it had to do with certain value types and reference
counting but I'm not at all sure; however, I seem to remember that some
things couldn't be done [correctly] in C++).
I have a strong sense that C++ could be as easy to use as any of the other
languages simply by a philosophical realignment.

It would certainly be doable to create such a language binding but the
OMG IDL/C++ binding makes things really hard when used from C++: Give
it a try but be prepared that it sucks rocks.
There are analogies that keep running
through my head regarding how C++ compares to Java. One of these is the
notion of building blocks.

They apply in general but not to the OMG IDL/C++ binding.
Imagine two factories produce building blocks. Factory J produces a very
limited variety of shapes. But these shapes are well suited to almost all
of the tasks builders assume. Factory C++ also produces these same kinds
of block, but it also produces blocks in several different shapes, and will
even custom produce blocks according to your specification. As it turns
out, the contractors who buy only factory J blocks tend to get most jobs
done faster, and with fewer defects than the contractors who use C++
blocks. The irony is, there is a subset of the C++ factory's blocks
virtually identical to factory J's. And they are actually /better/ in
many ways.

I agree that this view generally holds when comparing Java and C++.
It does not hold with the OMG IDL/C++ binding. The C++ binding is
unnecessarily complex and limiting. Other language bindings don't share
this problem. Why the C++ binding is so bad I don't really know. My
personal impression is that the C++ binding was created by C programmers
(that is, people having the C mindset, even if they have been using a C++
compiler for ages).
So why is there a problem? The C++ builders fail to limit themselves to
that regular subset, even when they should.

Nope. This is not the problem with OMG IDL/C++ binding. The problem with
this binding is partially due to the entirely stupid approach used for
resource maintenance which effectively makes any automated approach
impossible: There is lots of necessary manual interaction when using the
OMG IDL/C++ binding. And worse, the details differ depending on whether
the argument is passed as an in, out, or inout argument. As stated before:
give it a try! It is plain horrible.
But the really good C++ builders
know how to do it right, so why should /they/ care?

The good C++ programmers tend to know what they are doing. This does
not really help if they don't absorb the details of this stupid binding.
If they absorb these details, the good people are able to adhere to
stupid programming guidelines causing them to write horrendous amounts
of stupid code - and still be able to maintain this. Everybody who was
forced to use this stuff for more than trivial use *CARES*! There was an
unofficial session on a CORBA binding at the Dublin C++ committee
meeting. Roughly 90% of the people on the committee showed up at this
session and agreed that there is need for a better binding. The fact
that nothing happened afterwards is mainly due to the realization that
there is no big market interested in CORBA and that the person who
was put in charge of doing the work left the committee shortly
afterwards due to entirely unrelated reasons.
Factory J is getting the mainstream of revenue. Factory C++ may arguably
provide a superior set
of building blocks, but they are harder to do business with (all their
procedures are still paper-based) and people don't know how to use their
products effectively.

In the case of the OMG IDL binding the C++ binding is inferior! It is
due to stupid decisions made by the people who created this binding. It
is not inherent to C++ at all. As stated above: give it a try! Try to
use the C++ binding. ... and then try to use the Python or Java binding
for comparison! For the latter two, there is nearly nothing to be done
specifically in the CORBA functions. This is very different from the
requirements on functions using the C++ binding!
 
