expat parser

Sebastian Bassi

I have this code:

import xml.parsers.expat

def start_element(name, attrs):
    print 'Start element:', name, attrs

def end_element(name):
    print 'End element:', name

def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()
p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data
fh = open("/home/sbassi/bioinfo/smallUniprot.xml", "r")
p.ParseFile(fh)

And I get this on the output:

....
Start element: sequence {u'checksum': u'E0C0CC2E1F189B8A', u'length': u'393'}
Character data: u'\n'
Character data: u'MPKKKPTPIQLNPAPDGSAVNGTSSAETNLEALQKKLEELELDEQQRKRL'
Character data: u'\n'
Character data: u'EAFLTQKQKVGELKDDDFEKISELGAGNGGVVFKVSHKPSGLVMARKLIH'
....
End element: sequence
....

Is there a way to get the character data together in one string? I
guess it should not be difficult, but I can't work it out. Each time the
parser reads a line, it calls the handler with just that line, and I want
to have the whole sequence in one variable.

(the file is here: http://sbassi.googlepages.com/smallUniprot.xml)
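
For readers following along, here is a minimal sketch of one way to do this while staying with expat: collect the pieces in a list between the start and end of the element, then join them. It is written for Python 3 (the code above is Python 2), and the names SequenceCollector and parse_sequences are made up for this illustration. It also assumes the element is reported as plain 'sequence', as in the output above.

```python
import xml.parsers.expat


class SequenceCollector(object):
    """Gathers the character data of each <sequence> element into one string."""

    def __init__(self):
        self.sequences = []   # one finished string per <sequence> element
        self._chunks = None   # None while we are outside a <sequence> element

    def start_element(self, name, attrs):
        if name == 'sequence':
            self._chunks = []  # start collecting

    def char_data(self, data):
        # expat may call this several times per element, so just store the piece
        if self._chunks is not None:
            self._chunks.append(data)

    def end_element(self, name):
        if name == 'sequence':
            text = ''.join(self._chunks)
            # drop the newlines and any other whitespace between the lines
            self.sequences.append(''.join(text.split()))
            self._chunks = None


def parse_sequences(xml_bytes):
    """Hypothetical helper: parse a bytes document and return the sequences."""
    collector = SequenceCollector()
    p = xml.parsers.expat.ParserCreate()
    p.StartElementHandler = collector.start_element
    p.EndElementHandler = collector.end_element
    p.CharacterDataHandler = collector.char_data
    p.Parse(xml_bytes, True)  # True marks this as the final chunk of input
    return collector.sequences
```

The parser also has a buffer_text attribute that, when set to true, makes expat merge adjacent character data before calling the handler. The newlines would still be part of the merged text, so some clean-up is needed either way.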
 
Stefan Behnel

Any reason you are using expat and not cElementTree's iterparse?

Stefan
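
As a rough sketch of what the iterparse suggestion could look like: the code below uses xml.etree.ElementTree.iterparse, which is the standard-library module that took over from cElementTree in current Python 3, so treat the module choice as an assumption. The function name iter_sequences is invented for the example, and the namespace stripping is there because the real UniProt file may declare a default namespace.

```python
import io
import xml.etree.ElementTree as ET


def iter_sequences(source):
    """Yield the text of each <sequence> element as a single string.

    `source` can be a filename or a file object opened in binary mode.
    """
    for event, elem in ET.iterparse(source, events=('end',)):
        # Tags look like '{namespace}sequence' when a namespace is declared
        tag = elem.tag.rsplit('}', 1)[-1]
        if tag == 'sequence':
            # elem.text already holds all the character data in one string;
            # split/join removes the line breaks inside it
            yield ''.join((elem.text or '').split())
            elem.clear()  # free the element's contents once we are done with it


if __name__ == '__main__':
    # Tiny made-up document, just to show the call
    demo = io.BytesIO(b"<entry><sequence>\nMPKKK\nPTPIQ\n</sequence></entry>")
    for seq in iter_sequences(demo):
        print(seq)
```

With iterparse the parser hands over the complete element at its end event, so there is no need to track state across callbacks by hand.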
 
