DTD to C structure

GSA

Hi,
I need to create C structures from a DTD, but I do not know what sort of
mapping to use for the following DTD element declarations (whether to
structures, unions, or enums).

<!ELEMENT note (#PCDATA|to|from|header|message)*>

<!ELEMENT note (to,from,header,(message|body))>

<!ELEMENT note (message?)>

<!ELEMENT note (message+)>
(taken from http://www.w3schools.com/dtd/dtd_elements.asp)

I do not know what C construct to use for these different notations. I
browsed the net but did not find much specific information. Please help.

Regards,
GSA
 
luserXtrog

What's the purpose?

hth
looza
 
Ian Collins

Mapping a DTD structure to C types isn't trivial. Not only do you have
lists of child elements, but you have compulsory and optional children
as well as single and multiple instances.

A struct with lists of various types of children would be a good start.
 
Nobody

GSA said:
I need to create structures from DTD.

SGML or XML?

XML is more restrictive, so there are fewer cases to consider. In terms of
the children specification from <!ELEMENT ...> declarations:

(A,B,C) => struct { A a; B b; C c; };
(A|B|C) => struct { int which; union { A a; B b; C c; } value; };
A? => struct { int valid; A a; };
A* => struct { int count; A *a; };
A+ => struct { int count; A *a; }; /* count will be >= 1 */
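
As a rough sketch of how these rules might apply to the second declaration from the question, (to,from,header,(message|body)), the sequence becomes a struct and the choice becomes a tagged union. It assumes every leaf element holds only #PCDATA, so each is stored as a string; the names note_choice and note_to_xml are made up for illustration, and no escaping of <, > or & is attempted.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* <!ELEMENT note (to,from,header,(message|body))>
   Assumption: leaf elements contain only #PCDATA. */
enum note_choice { NOTE_MESSAGE, NOTE_BODY };

struct note {
    const char *to;
    const char *from;
    const char *header;
    enum note_choice which;      /* selects the live union member */
    union {
        const char *message;
        const char *body;
    } value;
};

/* Write the note as XML into buf. Returns what snprintf returns: the
   length the full output needs, so a result >= size means truncation. */
int note_to_xml(const struct note *n, char *buf, size_t size)
{
    const char *tag, *text;

    assert(n != NULL && buf != NULL);
    if (n->which == NOTE_MESSAGE) {
        tag = "message";
        text = n->value.message;
    } else {
        tag = "body";
        text = n->value.body;
    }
    return snprintf(buf, size,
                    "<note><to>%s</to><from>%s</from><header>%s</header>"
                    "<%s>%s</%s></note>",
                    n->to, n->from, n->header, tag, text, tag);
}
```

Going the other way (XML to structure) fills the same fields from whatever parser is in use; the `which` member is what lets the writer know which branch of the choice was present.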

SGML also has (A&B), which is equivalent to ((A,B)|(B,A)), but it starts
getting messy when you have many alternatives (i.e. you get N-factorial
permutations). If the order doesn't matter, treat it as (A,B,C). If
it matters, either treat it as (A|B|C)*, or as (A,B,C) but with a
separate field for the positions:

(A&B&C) => struct { int position[3]; A a; B b; C c; };

SGML also has A-(B), which is really just "A, except that B may not occur
within A", and may as well be represented as A (the exclusion only affects
parsing, and doesn't need to affect the representation).
 
BGB / cr88192

use DOM-like nodes for XML.
use DOM-like nodes for DTDs as well.

the DTD case just uses modified parsing/printing logic.
as for evaluating DTDs, this is another matter.


for example (not quite DOM, but from my compiler, which uses XML
internally):
typedef struct BCCX_Node_s BCCX_Node;
typedef struct BCCX_Attr_s BCCX_Attr;

struct BCCX_Node_s {
    BCCX_Node *next;      //next node in current list (same parent)
    BCCX_Node *prev;      //prior node in current list (same parent)
    BCCX_Node *up;        //parent node
    BCCX_Node *down;      //first child node
    BCCX_Node *down_end;  //last child node (insert optimization)

    BCCX_Attr *attr;      //attributes

    char *ns;             //namespace (optional)
    char *tag;            //tagname (excludes text for normal nodes)
    char *text;           //textual data (excludes ns/tag/attr for normal nodes)

    BCCX_Node *hnext;     //hash next (lookup optimization, exact use depends on context)
    int type;             //node type
};

struct BCCX_Attr_s {
    BCCX_Attr *next;      //next attribute (same node)
    char *ns;             //namespace
    char *var;            //attribute name
    char *val;            //attribute value
};

comments were expanded here.

side notes:
I usually intern strings so that the '==' operator can be used to check for
string equivalence (faster than strcmp for longer lookups, although a lookup
will require interning the tags/values to be looked-up which is not free,
and to do so cheaply would require keeping track of any commonly-used
strings in variables);
the 'Document' context is usually implicit, and not usually referenced by
its contained nodes;
I don't optimize for integer/real attributes, which is an issue as
integer/real attributes are common, but there is no "good" way to handle
them (apart from adding another field to 'Attr' to hold them), so I mostly
ignore the issue;
....
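
The interning idea from the side notes could look roughly like the sketch below. It is not taken from the compiler mentioned above: the name intern and the linked-list table are assumptions chosen to keep the example short, and a real implementation would more likely use a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct intern_entry {
    struct intern_entry *next;
    char *str;
};

static struct intern_entry *intern_table;   /* unsorted list, newest first */

/* Return the canonical copy of s. Equal strings always yield the same
   pointer, so callers can compare interned strings with '=='.
   Returns NULL if memory runs out. */
const char *intern(const char *s)
{
    struct intern_entry *e;
    size_t len;

    assert(s != NULL);
    for (e = intern_table; e != NULL; e = e->next)
        if (strcmp(e->str, s) == 0)
            return e->str;          /* already interned */

    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    len = strlen(s) + 1;
    e->str = malloc(len);
    if (e->str == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->str, s, len);
    e->next = intern_table;
    intern_table = e;
    return e->str;
}
```

The lookup cost mentioned above shows up here as the strcmp loop: it is paid once per intern() call, after which tag comparisons are plain pointer comparisons.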

the same nodes can be used for both purposes, despite DTDs having a different
syntax from other XML (they can be mapped).

not sure what proper DOM does, not looked into this...


or such...
 
GSA

Thanks a million for the precise solution. However, I did not
understand the use of the count variable. What if I did not use it?
My input is XML. I need to do two operations: XML to structure and,
given a structure, create XML. I have the DTD for the XML.
Also, please suggest the construct for attributes.

Kind regards,
GSA
 
GSA

I think this is similar to the libxml2 xmlNodePtr structure. I could not
relate this to my question, as I am parsing and then putting the values
into a structure. Am I missing something here?

Regards,
GSA
 
Nobody

GSA said:
Thanks a million for the precise solution. However, I did not
understand the use of the count variable. What if I did not use it?

If a node can have any number of children of a given type (i.e. A* or A+
constructs), then you need to store an array of children; the usual way of
storing variable-sized arrays in C is as a count and a pointer to the
array, although there are other solutions (e.g. a linked list).
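
A small sketch of the count-plus-pointer approach for <!ELEMENT note (message+)>, assuming each message is plain #PCDATA. The helper names note_add_message and note_is_valid are invented for this example.

```c
#include <assert.h>
#include <stdlib.h>

/* <!ELEMENT note (message+)>
   Assumption: each message is #PCDATA, stored as a borrowed string. */
struct note {
    size_t count;            /* number of valid entries in message[] */
    const char **message;    /* heap array of count strings, or NULL */
};

/* Append one message. Returns 0 on success, or -1 if realloc fails,
   in which case the note is left unchanged. */
int note_add_message(struct note *n, const char *text)
{
    const char **grown;

    assert(n != NULL);
    grown = realloc(n->message, (n->count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    grown[n->count++] = text;
    n->message = grown;
    return 0;
}

/* message+ needs at least one child; for message* zero is also fine. */
int note_is_valid(const struct note *n)
{
    assert(n != NULL);
    return n->count >= 1;
}
```

Without the count there is no way to tell how many elements the pointer refers to, which is why the two travel together (a NULL-terminated array or a linked list would be the usual alternatives).
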

GSA said:
Also please suggest the construct for attributes.

Each element type will have a set of attributes just like it has children.
Except each attribute only occurs once, and its value isn't an element.

There is a fixed set of "types" for attributes: CDATA, ID, IDREF, etc. How
you store these really depends upon the processing you're likely to be
doing. If you aren't doing any processing, you can just store the raw data
as a char*. If you are doing processing, you probably want to store
multiple-valued attributes (IDREFS, NMTOKENS, ENTITIES) as arrays,
enumerated values as integers (with the values taken from an "enum" type).
An IDREF might be the name or it might be a pointer to the node.
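
To make this concrete, here is a sketch for a made-up attribute list (it is not part of the DTD in the question): an id of type ID that is #REQUIRED, an enumerated priority (low|normal|high) defaulting to "normal", and an optional CDATA attribute lang. All names are illustrative, and the optional attribute uses NULL to mean "not provided".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical declaration used for this sketch:
   <!ATTLIST note id       ID                #REQUIRED
                  priority (low|normal|high) "normal"
                  lang     CDATA             #IMPLIED> */
enum note_priority { PRIORITY_LOW, PRIORITY_NORMAL, PRIORITY_HIGH };

struct note_attrs {
    const char *id;               /* ID, required; NULL until set */
    enum note_priority priority;  /* enumerated value stored as an enum */
    const char *lang;             /* CDATA, optional; NULL = not provided */
};

/* Apply the defaults the attribute list declares. */
void note_attrs_init(struct note_attrs *a)
{
    assert(a != NULL);
    a->id = NULL;
    a->priority = PRIORITY_NORMAL;
    a->lang = NULL;
}

/* Store one name/value pair as delivered by a parser.
   Returns 0 on success, -1 for an unknown name or enumerated value. */
int note_attrs_set(struct note_attrs *a, const char *name, const char *value)
{
    static const char *const levels[] = { "low", "normal", "high" };
    size_t i;

    assert(a != NULL && name != NULL && value != NULL);
    if (strcmp(name, "id") == 0) {
        a->id = value;
        return 0;
    }
    if (strcmp(name, "lang") == 0) {
        a->lang = value;
        return 0;
    }
    if (strcmp(name, "priority") == 0) {
        for (i = 0; i < sizeof levels / sizeof levels[0]; i++) {
            if (strcmp(value, levels[i]) == 0) {
                a->priority = (enum note_priority)i;
                return 0;
            }
        }
    }
    return -1;
}
```

The strings are borrowed from the caller here; code that outlives the parser's buffers would need to copy them.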

Also, an #IMPLIED attribute either needs an associated flag to indicate
whether it was provided or a distinct sentinel value to indicate
"not provided".
 
BGB / cr88192

<--
I think this is similar to the libXml2 xmlPtr structure. I could not
relate this to my doubt as I am parsing and then putting values onto
structure. Am I missing something here?
-->

typically, if one is doing something DOM-like, then it is common to parse
all of the XML into a structure like the above, and then perform all
operations, usually via API calls, which manipulate the above structures
(getting/setting attributes, searching for child-nodes or inserting
children, ...).


however, it sounds like you might be wanting to map some particular
(pre-existing) XML directly to C structures (where one sees attributes as
struct fields, ... rather than needing to get/set them via calls).

this is a very different technology (data-binding), and is usually handled
partly via things like SAX (these read XML in a stream and send the results
to user-supplied callbacks or similar, where one is then responsible for
sticking the attribute values into struct fields, ...).
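
a rough sketch of that callback style, under several assumptions: the handler names and the bind_state struct are invented, the note has only 'to' and 'from' children, and nothing here calls a real parser. an actual SAX-style parser (expat is one C option) would invoke handlers of broadly this shape while reading the stream.

```c
#include <assert.h>
#include <string.h>

struct note {
    char to[32];
    char from[32];
};

struct bind_state {
    struct note *note;    /* struct being filled in */
    char *field;          /* buffer for the current element's text, or NULL */
    size_t field_size;    /* capacity of that buffer */
};

/* Element opened: choose which struct field receives its text. */
void on_start(void *user, const char *name)
{
    struct bind_state *st = user;

    assert(st != NULL && name != NULL);
    st->field = NULL;
    if (strcmp(name, "to") == 0) {
        st->field = st->note->to;
        st->field_size = sizeof st->note->to;
    } else if (strcmp(name, "from") == 0) {
        st->field = st->note->from;
        st->field_size = sizeof st->note->from;
    }
    if (st->field != NULL)
        st->field[0] = '\0';
}

/* Character data: not NUL-terminated and may arrive in several pieces,
   so append to the current field and truncate if it would overflow. */
void on_text(void *user, const char *text, int len)
{
    struct bind_state *st = user;
    size_t used, room;

    if (st->field == NULL || len <= 0)
        return;                      /* text outside a bound element */
    used = strlen(st->field);
    room = st->field_size - 1 - used;
    if ((size_t)len > room)
        len = (int)room;
    memcpy(st->field + used, text, (size_t)len);
    st->field[used + (size_t)len] = '\0';
}

/* Element closed: stop collecting text. */
void on_end(void *user, const char *name)
{
    struct bind_state *st = user;

    (void)name;
    st->field = NULL;
}
```

the point is that no tree is ever built: the values go straight from the event stream into the struct, and the DTD only decides which fields and handlers exist.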

however, my personal experience with SAX-style technologies is limited (my
stuff tends to involve manipulating XML trees, which is better done via a
more DOM-style implementation).


or such...
 
