parsing XML with 'expat'


Bjoern Hoehrmann

* Roman Mashak wrote in comp.text.xml:
I hope this might be the right group to ask. I need to parse, in C,
XML of the following structure:

<BERTEST>
<NODE1>
<FREQ>666000000</FREQ>
<POWER>-82</POWER>
</NODE1>
<NODE1>
<FREQ>484000000</FREQ>
<POWER>-80</POWER>
</NODE2>
</BERTEST>

So I took the 'expat' library to do that (I've never dealt with XML before
though), and tried to customize the example they ship with the library
(outline.c). What I can't quite understand is:
1) Can my XML really be called XML, or is it somehow invalid?
According to the Wikipedia page on XML, a valid document should look like
this:

<name attribute="value">content</name>

while mine is a bit different

Your second <NODE1> should probably be <NODE2> (otherwise the start- and
end-tags do not match up), but other than that it certainly is XML. You
are free to choose (when designing a new XML format) whether you use an
attribute or element to encode some information.

2) If my XML document is correct anyway, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

So, I think I need to register a callback function for start tags and try to
do what I want in there. But how can I get the values of the tags, and which
'expat' functions should I use? Or is there another, simpler way?

Expat reports the text through the `characters` callback (in the C API,
the handler registered with `XML_SetCharacterDataHandler`). You have to
set up a handler for it and accumulate the text reported to it; then
process the text, e.g. in the end-element handler. There is no more
direct way to get to the text when using Expat.
 

Jürgen Kahrs

Roman said:
2) If my XML document is correct anyway, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

What do you do with the extracted text? Do you put it into a text file
for further processing? Then you can use any scripting language that is
easier to use than Expat at the C level.

For example, you can try this one:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/xmlgawk.html#Printing-an-outline-of-an-XML-file

@load xml
XMLSTARTELEM {
    printf("%*s%s", 2*XMLDEPTH-2, "", XMLSTARTELEM)
    for (i=1; i<=NF; i++)
        printf(" %s='%s'", $i, XMLATTR[$i])
    print ""
}

This script does exactly what the outline.c example
from Expat does.
 

Roman Mashak

Hello,

I hope this might be the right group to ask. I need to parse, in C,
XML of the following structure:

<BERTEST>
<NODE1>
<FREQ>666000000</FREQ>
<POWER>-82</POWER>
</NODE1>
<NODE1>
<FREQ>484000000</FREQ>
<POWER>-80</POWER>
</NODE2>
</BERTEST>

So I took the 'expat' library to do that (I've never dealt with XML before
though), and tried to customize the example they ship with the library
(outline.c). What I can't quite understand is:
1) Can my XML really be called XML, or is it somehow invalid?
According to the Wikipedia page on XML, a valid document should look like
this:

<name attribute="value">content</name>

while mine is a bit different

2) If my XML document is correct anyway, how can I parse it with expat?
What I need is, upon occurrences of the FREQ and POWER tags, to extract their
values (i.e. 666000000 for FREQ or -82 for POWER in the above example).

So, I think I need to register a callback function for start tags and try to
do what I want in there. But how can I get the values of the tags, and which
'expat' functions should I use? Or is there another, simpler way?

Thanks in advance
 
