python and parsing an xml file

M

Matt Funk

Hi,
I was wondering if someone had some advice:
I want to create a set of xml input files to my code that look as follows:
<?xml version="1.0" encoding="UTF-8"?>

<!-- Settings for the algorithm to be performed
-->
<Algorithm>

<!-- The algorithm type.
-->
<!-- The supported options are:
-->
<!-- - Alg0
-->
<!-- - Alg1
-->
<Type>Alg1</Type>

<!-- the location/path of the input file for this algorithm
-->
<path>./Alg1.in</path>

</Algorithm>


<!-- Relevant information during the processing will be written to a
logfile -->
<Logfile>

<!-- the location/path of the logfile (i.e. where to put the
logfile) -->
<path>c:\tmp</path>

<!-- verbosity level (i.e. how much to print)
-->
<!-- The supported options are:
-->
<!-- - 0 (nothing printed)
-->
<!-- - 1 (print on error)
-->
<verbosity>1</verbosity>

</Logfile>


So there are comments, whitespace etc ... in it.
I would like to be able to put everything into some sort of structure
such that i can access it as:
structure['Algorithm']['Type'] == Alg1
I was wondering if there is something out there that does this.
I found and tried a few things:
1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
It simply doesn't work. I get the following error:
raise exception
xml.sax._exceptions.SAXParseException: <unknown>:1:2: not well-formed
(invalid token)
But i removed everything from the file except: <?xml version="1.0"
encoding="UTF-8"?>
and i still got the error.

Anyway, i looked at ElementTree, but that error out with:
xml.parsers.expat.ExpatError: junk after document element: line 19, column 0


Anyway, if anyone can give me advice of point me somewhere i'd greatly
appreciate it.

thanks
matt
 
P

Paul Anton Letnes

Den 21.02.11 18.30, skrev Matt Funk:
Hi,
I was wondering if someone had some advice:
I want to create a set of xml input files to my code that look as follows:
<?xml version="1.0" encoding="UTF-8"?>

<!-- Settings for the algorithm to be performed
-->
<Algorithm>

<!-- The algorithm type.
-->
<!-- The supported options are:
-->
<!-- - Alg0
-->
<!-- - Alg1
-->
<Type>Alg1</Type>

<!-- the location/path of the input file for this algorithm
-->
<path>./Alg1.in</path>

</Algorithm>


<!-- Relevant information during the processing will be written to a
logfile -->
<Logfile>

<!-- the location/path of the logfile (i.e. where to put the
logfile) -->
<path>c:\tmp</path>

<!-- verbosity level (i.e. how much to print)
-->
<!-- The supported options are:
-->
<!-- - 0 (nothing printed)
-->
<!-- - 1 (print on error)
-->
<verbosity>1</verbosity>

</Logfile>


So there are comments, whitespace etc ... in it.
I would like to be able to put everything into some sort of structure
such that i can access it as:
structure['Algorithm']['Type'] == Alg1
I was wondering if there is something out there that does this.
I found and tried a few things:
1) http://code.activestate.com/recipes/534109-xml-to-python-data-structure/
It simply doesn't work. I get the following error:
raise exception
xml.sax._exceptions.SAXParseException:<unknown>:1:2: not well-formed
(invalid token)
But i removed everything from the file except:<?xml version="1.0"
encoding="UTF-8"?>
and i still got the error.

Anyway, i looked at ElementTree, but that error out with:
xml.parsers.expat.ExpatError: junk after document element: line 19, column 0


Anyway, if anyone can give me advice of point me somewhere i'd greatly
appreciate it.

thanks
matt

How about skipping the whole xml thing? You can dynamically import any
python module, even if it does not have a python filename. I show an
example from my own code, slightly modified. I just hand the function a
filename, and it tries to import the file. If the input file now
contains variables like
algorithm = 'fast'
I can access the variables with
input = getinput('f.txt')
print input.algorithm.

Good luck,
Paul

+++++++++++++

import imp
def getinput(inputfilename):
"""Parse inputs to program from the given file."""
try:
# http://docs.python.org/library/imp.html#imp.load_source
parameters = imp.load_source("parameters", inputfilename)
except IOError as err:
print >>sys.stderr, '%s: %s' % (str(err), inputfilename)
print >>sys.stderr, 'The specified input file was not found -
exiting'
sys.exit(IO_ERROR)
# Verify presence of all required input parameters, see input.py
example file.
required_parameter_names = ['setting1', 'setting2', 'setting3']
msg = 'Required parameter name not found in input file: {0}'
for parameter_name in required_parameter_names:
assert hasattr(parameters, parameter_name),
msg.format(parameter_name)
return parameters
 
P

python

Paul,
How about skipping the whole xml thing? You can dynamically import any python module, even if it does not have a python filename.

Great example!

Can you do the same with a cStringIO based file that exists in memory
vs. on disk? Your example requires a physical file on disk. Is it
possible to accomplish the same using a string vs. a file or temporary
file?

Thank you,
Malcolm (not the OP)
 
P

Paul Anton Letnes

Den 22.02.11 13.29, skrev (e-mail address removed):
Paul,


Great example!

Can you do the same with a cStringIO based file that exists in memory
vs. on disk? Your example requires a physical file on disk. Is it
possible to accomplish the same using a string vs. a file or temporary
file?

Thank you,
Malcolm (not the OP)

Malcolm,

I never had the interest, so I did not try. Why don't you just try it?
Just insert your StringIO file into the function and see if it works, or
if it crashes and burns. Let me know what you find out!

Anyway, you could always just write a temporary file. It won't consume a
lot of resources, I imagine.

Cheers,
Paul
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,981
Messages
2,570,188
Members
46,731
Latest member
MarcyGipso

Latest Threads

Top