Expat problems

  • Thread starter Jakob Møbjerg Nielsen

Jakob Møbjerg Nielsen

Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)



-----Source code-----
#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
FILE *fp;
char *buffer;
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);

buffer = (char *)malloc(fsize+1);

if (buffer == NULL)
exit(2);

fread(buffer, 1, fsize, fp);

buffer[fsize] = '\0';

printf("%s\n", buffer);

fclose(fp);

parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
done = fsize < sizeof(buffer);
if (!XML_Parse(parser, buffer, fsize, 0)) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);

XML_ParserFree(parser);

}

return 0;
}
-------------------

-----XML input-----
<?xml version="1.0" ?>
<a>
</a>
 

Thomas Matthews

Jakob said:
Expat keeps telling me that there is "junk after document element". I've tried different encodings, and I'm quite sure that the
buffer is nul-terminated. I really have no idea what the problem might be. Any ideas?

X-POST: comp.lang.c, comp.text.xml (I don't know which group is the right one)



-----Source code-----
#include <stdio.h>
#include <expat.h>
Not a standard header. What is in here?

void startElement(void *userData, const char *name, const char **atts)
{
printf("Got element: %S\nwith userData:\n%s\n", name, (char *)userData);
My understanding is that the printf() format specifiers are case
sensitive, although I'm sure somebody here will correct me if I'm
wrong.

}

void endElement(void *userData, const char *name)
{
}

int main(int argc, char *argv[])
{
FILE *fp;
char *buffer;
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
fseek(fp, 0, SEEK_END);
fsize = ftell(fp);
rewind(fp);

There is no guarantee that the ending position of a file is the
same as the size of the file. Character translations and other
stuff may obscure the size. The only method to know the actual
size of the file is to open the file in binary mode and count
all the characters.

buffer = (char *)malloc(fsize+1);
In the times when memory was small and precious, input data
was read in by "chunks" instead of reading the whole file into memory.
Granted, reading it into memory is the most efficient method,
but there is no guarantee that your platform, or the platform that
this program will run on, will have enough memory for the largest
file. Hard disks are becoming larger these days.

I say read in the data in chunks.

if (buffer == NULL)
exit(2);
You might want to be nice to the user and print a reason why
the program is aborting.

fread(buffer, 1, fsize, fp);
See above about reading in chunks.
buffer[fsize] = '\0';

printf("%s\n", buffer);
You are printing the entire file here. Could take a while.
Is this necessary?

fclose(fp);

parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
done = fsize < sizeof(buffer);
The expression "sizeof(buffer)" returns the size of the pointer,
not the buffer. By the way, if you look up a few lines, you
will note that the buffer was allocated with a size of
"fsize + 1". So, what is this statement supposed to do?

if (!XML_Parse(parser, buffer, fsize, 0)) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
} while (!done);
See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?
XML_ParserFree(parser);

}

return 0;
}

I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.
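
The file-size and sizeof(buffer) points above can be sketched as follows. This is only a minimal sketch, not the original poster's code: read_whole_file is a hypothetical helper that opens the file in binary mode and reports the number of bytes fread() actually delivered, so no ftell() value and no sizeof on a pointer is involved. Assuming Expat's usual XML_Parse(parser, s, len, isFinal) signature, that length would be passed in one call with isFinal set to 1.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Expat): read a whole file into a
 * malloc'd buffer and report how many bytes were actually read.
 * Returns NULL on failure; the caller frees the result. */
char *read_whole_file(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no CRLF translation */
    char *buf = NULL;
    size_t cap = 0, len = 0;

    if (fp == NULL)
        return NULL;

    for (;;) {
        size_t want, got;

        if (len == cap) {           /* buffer full (or not yet allocated) */
            size_t new_cap = cap ? cap * 2 : 4096;
            char *tmp = realloc(buf, new_cap + 1);  /* +1 for the NUL */
            if (tmp == NULL) {
                free(buf);
                fclose(fp);
                return NULL;
            }
            buf = tmp;
            cap = new_cap;
        }
        want = cap - len;
        got = fread(buf + len, 1, want, fp);
        len += got;                 /* count what fread() really returned */
        if (got < want)
            break;                  /* end of file or read error */
    }

    if (ferror(fp)) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);

    buf[len] = '\0';                /* terminator, handy for debugging prints */
    *len_out = len;
    return buf;
}
```

A caller would check for NULL, hand buf and (int)len to a single parse call with the final flag set, and free(buf) afterwards.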


--
Thomas Matthews

C++ newsgroup welcome message:
http://www.slack.net/~shiva/welcome.txt
C++ Faq: http://www.parashift.com/c++-faq-lite
C Faq: http://www.eskimo.com/~scs/c-faq/top.html
alt.comp.lang.learn.c-c++ faq:
http://www.raos.demon.uk/acllc-c++/faq.html
Other sites:
http://www.josuttis.com -- C++ STL Library book
 

Jakob Møbjerg Nielsen

Thomas said:
Not a standard header. What is in here?

Expat - the XML parser.
I say read in the data in chunks.

Well, this is just for testing with small XML files (probably not above
1M).
You are printing the entire file here. Could take a while.
Is this necessary?

Debugging :)
I didn't want to start gdb just for looking at the contents of buffer.
See my question about the assignment to "done" above.
Why do you bother processing the data in chunks when
you have read the entire file into memory?

Because, later on, the data will be streamed in from a socket.
I cannot comment on the correctness of the XML_*()
function calls since I don't have that header file
and you haven't supplied those declarations.

There are quite a few:
http://guinness.cs.stevens-tech.edu/packages/expat/reference.html

Anyway, I've tried cleaning up a bit and played around with
feeding the parser in a "stream-like" manner, but I still
get that pesky "junk after document element" message. If I
use UTF-8 I get a "not well-formed (invalid token)".

#include <stdio.h>
#include <expat.h>

void startElement(void *userData, const char *name, const char **atts)
{
printf("Got start-element: %s\n", name);
}

void endElement(void *userData, const char *name)
{
printf("Got end-element: %s\n", name);
}

int main(int argc, char *argv[])
{
FILE *fp;
char buffer[1];
char *prog = argv[0];
long fsize;
XML_Parser parser;
int userData = 0;
int done;

if(argc == 1) return 0;

if ((fp = fopen(*++argv, "r")) == NULL) {
fprintf(stderr, "%s: Can't open %s", prog, *argv);
exit(1);
} else {
parser = XML_ParserCreate((XML_Char *)"ISO-8859-1");
XML_SetUserData(parser, &userData);
XML_SetElementHandler(parser, startElement, endElement);
do {
if (!feof(fp)) {
buffer[0] = fgetc(fp);
if (!XML_Parse(parser, buffer, strlen(buffer), feof(fp))) {
fprintf(stderr,
"%s at line %d\n",
XML_ErrorString(XML_GetErrorCode(parser)),
XML_GetCurrentLineNumber(parser));
return 1;
}
}
} while (!feof(fp));
XML_ParserFree(parser);
}
return 0;
}
 

Toni Uusitalo

Examine, for example, the elements.c Expat example file more carefully, and
copy the parsing loop (the do loop) from there.

Replace only stdin with your FILE*. You might also want to open the file in "rb"
(binary mode) to avoid CRLF translations.

It seems you're trying something funny with strlen() in your code.
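
For reference, the shape of such a chunked loop can be sketched like this. It is a rough sketch written from memory of the pattern, not a copy of elements.c. The chunk length comes from fread()'s return value rather than strlen(), since a buffer filled by fread() or fgetc() is not NUL-terminated, and the final flag is set on the first short read. So that it compiles without Expat installed, the parser call is hidden behind a hypothetical feed_fn callback; with Expat, the callback would simply forward to XML_Parse(parser, data, len, is_final). The names feed_fn, feed_in_chunks, feed_stats and count_feed are all made up for this illustration.

```c
#include <stdio.h>

/* Hypothetical callback type. With Expat, an implementation would forward
 * to XML_Parse(parser, data, len, is_final). Returns nonzero on success. */
typedef int (*feed_fn)(void *ctx, const char *data, int len, int is_final);

/* Read fp in fixed-size chunks and hand each chunk to feed().
 * Returns 0 on success, -1 on a read error or if feed() reports failure. */
int feed_in_chunks(FILE *fp, feed_fn feed, void *ctx)
{
    char buf[4096];
    int done;

    do {
        /* buf is an array here, so sizeof gives its real size */
        size_t len = fread(buf, 1, sizeof buf, fp);
        if (ferror(fp))
            return -1;
        done = len < sizeof buf;    /* short read: this is the last chunk */
        if (!feed(ctx, buf, (int)len, done))
            return -1;
    } while (!done);

    return 0;
}

/* Stand-in consumer, for demonstration only: counts what it is given. */
struct feed_stats {
    long bytes;
    int calls;
    int finals;
};

int count_feed(void *ctx, const char *data, int len, int is_final)
{
    struct feed_stats *s = ctx;

    (void)data;                     /* contents are ignored in this demo */
    s->bytes += len;
    s->calls += 1;
    s->finals += is_final ? 1 : 0;
    return 1;
}
```

Each chunk is fed exactly once and only the last call carries the final flag. The loop in the first post, by contrast, always passes 0 and can feed the same buffer again.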

with respect,
Toni Uusitalo
 

Patrick TJ McPhee

% Expat keeps telling me that there is "junk after document element".

% if ((fp = fopen(*++argv, "r")) == NULL) {
% fprintf(stderr, "%s: Can't open %s", prog, *argv);
% exit(1);
% } else {
% fseek(fp, 0, SEEK_END);
% fsize = ftell(fp);
% rewind(fp);
%
% buffer = (char *)malloc(fsize+1);
%
% if (buffer == NULL)
% exit(2);
%
% fread(buffer, 1, fsize, fp);

If you're not on a Unix system, ftell() might give you a larger value than
fread() returns. You might want to check the return value of fread().

% printf("%s\n", buffer);

You might want to do a hex dump rather than just printing up to the first
NUL. If there are trailing NULs after the last >, expat will give you
an error message.
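
A hex dump along those lines could look roughly like this. hex_dump is a hypothetical helper, not something Expat provides. It would be called with the buffer and the same length that is passed to the parser, so that any bytes after the closing tag, including NULs that printf("%s") would hide, become visible.

```c
#include <stdio.h>
#include <ctype.h>

/* Print len bytes of data as offset, hex values and printable ASCII,
 * 16 bytes per line. Non-printable bytes are shown as '.'. */
void hex_dump(FILE *out, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i, j;

    for (i = 0; i < len; i += 16) {
        fprintf(out, "%08lx ", (unsigned long)i);

        /* hex column, padded so the ASCII column always lines up */
        for (j = 0; j < 16; j++) {
            if (i + j < len)
                fprintf(out, " %02x", p[i + j]);
            else
                fputs("   ", out);
        }

        /* ASCII column */
        fputs("  |", out);
        for (j = 0; j < 16 && i + j < len; j++)
            fputc(isprint(p[i + j]) ? p[i + j] : '.', out);
        fputs("|\n", out);
    }
}
```

For example, hex_dump(stderr, buffer, fsize) just before the parse call would show trailing NUL bytes as 00 in the hex column.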
 
