Milo Thurston
I have some XML that looks like the following, except that the real files
are very much larger (some are up to 2GB):
<?xml version="1.0" encoding="UTF-8"?>
<server_url>http://myserver.edu/data/</server_url>
<server_name>myserver.edu</server_name>
<uploads>
<result>
<dir>/storage/data/results/</dir>
<result_name>hadcm3l_00012_00000118_0</result_name>
<file_info>
<name>hadcm3l_00012_00000118_0_6.zip</name>
<nbytes>5154055</nbytes>
<md5_checksum>485600296bb601ab4a3d1d49a9fb1c86</md5_checksum>
</file_info>
<file_info>
<name>hadcm3l_00012_00000118_0_7.zip</name>
<nbytes>5153055</nbytes>
<md5_checksum>36a600296cb60229a3d1d49a9fb1a10</md5_checksum>
</file_info>
</result>
</uploads>
</xml>
I've tried a few XML parsers such as xml-simple, libxml and quixml, but
all reject this data as badly formed. One answer would, of course, be
for the data to be regenerated as well-formed XML. Meanwhile, is there
anything that can be done with the existing files? Do I have to write
regexps to parse this sort of thing?
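
Judging from the sample alone, two things make it ill-formed: there are
several top-level elements (server_url, server_name, uploads) where XML
allows only one root, and the final </xml> closes a tag that was never
opened. A possible workaround, short of regenerating the data, is to
repair the stream line by line before handing it to a parser. The sketch
below is only that: repair_xml, the upload_log root name and the file
names in the usage comment are hypothetical, and it assumes the
declaration and the stray </xml> each sit on a line of their own, as they
do in the sample.

```ruby
require "stringio"

# Hypothetical helper (not from any library): copies `input` to `output`,
# wrapping the content in one synthetic root element and dropping the stray
# closing tag. It reads one line at a time, so a 2GB file is never held in
# memory. Assumes the declaration and the stray tag are on their own lines.
def repair_xml(input, output, root: "upload_log", stray: "</xml>")
  output.puts '<?xml version="1.0" encoding="UTF-8"?>'
  output.puts "<#{root}>"
  input.each_line do |line|
    stripped = line.strip
    next if stripped.start_with?("<?xml") # original declaration, re-emitted above
    next if stripped == stray             # closing tag with no matching opener
    output.write(line)                    # everything else is copied unchanged
  end
  output.puts "</#{root}>"
end

# Usage with hypothetical file names:
#   File.open("uploads.xml") do |src|
#     File.open("uploads_fixed.xml", "w") { |dst| repair_xml(src, dst) }
#   end
```

The repaired file should then be acceptable to a streaming (SAX-style)
parser, which is probably the better fit for files of this size than one
that builds the whole tree in memory.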