BZip2 decompression and parsing XML

phasma

Hi.

I'm trying to decompress a bzipped file and parse the XML inside it.
If I use minidom.parseString, I get this error:

Traceback (most recent call last):
  File "./replications.py", line 342, in ?
  File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1925, in parseString
    return expatbuilder.parseString(string)
  File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 538676, column 17

If I use minidom.parse, I get this error:

Traceback (most recent call last):
  File "./replications.py", line 341, in ?
    files.xml = minidom.parse(bz2.decompress(dump))
  File "/usr/lib64/python2.4/xml/dom/minidom.py", line 1915, in parse
    return expatbuilder.parse(file)
  File "/usr/lib64/python2.4/xml/dom/expatbuilder.py", line 922, in parse
    fp = open(file, 'rb')
IOError

But the XML itself parses normally.

Code:

try:
    handler = open(args[0], "r")
    dump = handler.read()
    handler.close()
except IOError, error:
    print("Can't open dump: %s" % error)
    sys.exit(1)

files.xml = minidom.parse(bz2.decompress(dump))
 
Stefan Behnel

phasma said:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 538676, column 17

Looks like your XML file is broken at line 538676.

try:
    handler = open(args[0], "r")

This should read

    handler = open(args[0], "rb")

Maybe that's your problem.
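
Here is a minimal sketch of the binary-mode read followed by decompression and parsing. It is written for Python 3 rather than the Python 2.4 in the tracebacks, and the function name parse_bz2_xml is made up for illustration. Note that minidom.parse expects a filename or a file object, which would explain the IOError above: it tried to open() the decompressed text as a path. minidom.parseString is the call that takes the document itself.

```python
import bz2
from xml.dom import minidom


def parse_bz2_xml(path):
    """Read a .bz2 file in binary mode, decompress it, and parse the XML.

    Hypothetical helper, not taken from the original script.
    """
    # "rb" avoids newline translation (Python 2 on some platforms) and
    # text decoding (Python 3) being applied to the compressed bytes.
    with open(path, "rb") as handle:
        compressed = handle.read()

    xml_bytes = bz2.decompress(compressed)

    # parseString takes the document content; minidom.parse would treat
    # its argument as a filename or file object instead.
    return minidom.parseString(xml_bytes)
```

If parseString still reports "not well-formed" after this change, the decompressed data really is invalid at the reported position.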

BTW, since you seem to be parsing a pretty big chunk of XML there, you should
consider using lxml. It's faster, more memory-friendly, more feature-rich and
easier to use than minidom. It can also parse directly from a gzipped file or
from a file-like object as provided by the bz2 module.

http://codespeak.net/lxml/
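
A rough sketch of parsing straight from a bz2 file-like object, so the whole decompressed document never has to be held as one string first. It assumes Python 3 (bz2.open is not available in Python 2.4), and root_tag_of_bz2_xml is a made-up name. If lxml is not installed, the sketch falls back to the standard library's ElementTree, which accepts the same parse() call on a file object.

```python
import bz2

try:
    from lxml import etree  # assumption: lxml is installed
except ImportError:
    # Standard-library fallback; parse(file_object) works the same way here.
    import xml.etree.ElementTree as etree


def root_tag_of_bz2_xml(path):
    """Parse a bzip2-compressed XML file and return the root element's tag.

    Hypothetical helper for illustration only.
    """
    # bz2.open returns a file-like object that decompresses as it is read.
    with bz2.open(path, "rb") as stream:
        tree = etree.parse(stream)
    return tree.getroot().tag
```

For example, root_tag_of_bz2_xml(args[0]) would take the place of the separate read, decompress and parse steps in the original script.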

Stefan
 
