byron
I am using the lxml.etree library to validate an XML instance file
against a specified schema that contains the data type of each element.
This is part of the internals of a function that extracts the
elements:
    from lxml import etree

    schema_doc = etree.parse(schema_fn)
    schema = etree.XMLSchema(schema_doc)
    context = etree.iterparse(xml_fn, events=('start', 'end'),
                              schema=schema)
    # get root: the first event is the 'start' of the root element
    event, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == self.tag:
            yield elem
            # drop already-processed children of root to keep memory flat
            root.clear()
I retrieve a list of elements from this and do further processing
to represent them in different ways. I need to be able to capture the
data type from the schema definition for each field in the element,
e.g.:
    <xsd:element name="concept">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="foo"/>
          <xsd:element name="concept_id" type="xsd:string"/>
          <xsd:element name="line" type="xsd:integer"/>
          <xsd:element name="concept_value" type="xsd:string"/>
          <xsd:element ref="some_date"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
My thought is to recursively traverse the schema definition,
match the `name` attribute (since each name maps to a unique `type`), and
return that element. But I can't quite make it work. All the
XML is valid, validation works, etc. This is what I have:
    def find_node(tree, name):
        for c in tree:
            if c.attrib.get('name') == name:
                return c
            if len(c) > 0:
                return find_node(c, name)
        return 0
I may have been staring at this too long, but when something is
returned, it should be returned all the way up, no? What happens
with `return find_node(c, name)` is that it returns 0. `return c` works
(I used pdb to verify that), but the recursion continues and ends up
returning 0.
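A likely explanation, offered as a sketch rather than a definitive answer: `return find_node(c, name)` leaves the loop as soon as it reaches the first child that has children of its own, whether or not the name was found inside that subtree. If it wasn't, the 0 from the inner call propagates all the way up and the remaining siblings are never visited. Returning only when the recursive call actually found something avoids that. The version below also returns None instead of 0 and tests `is not None`, because lxml (and ElementTree) elements without children are falsy, so a plain `if found:` would throw away a matching leaf `xsd:element`. The demo uses the standard library's xml.etree.ElementTree so it runs standalone (lxml elements support the same iteration, `len()` and `.attrib`), and the `foo` declaration with an inline simpleType is a hypothetical addition that reproduces the failure.

```python
import xml.etree.ElementTree as ET


def find_node(tree, name):
    """Depth-first search for the first descendant whose name attribute is `name`.

    Returns None when nothing matches.
    """
    for c in tree:
        if c.attrib.get('name') == name:
            return c
        if len(c) > 0:
            found = find_node(c, name)
            # Stop only if the subtree search succeeded; otherwise keep
            # looping over the remaining siblings.
            if found is not None:
                return found
    return None


# Hypothetical schema: the global "foo" declaration has children but does not
# contain "line", which is exactly the case where the original version gives up.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:simpleType>
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>
  </xsd:element>
  <xsd:element name="concept">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="foo"/>
        <xsd:element name="concept_id" type="xsd:string"/>
        <xsd:element name="line" type="xsd:integer"/>
        <xsd:element name="concept_value" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)
print(find_node(root, 'line').get('type'))  # xsd:integer
```

With lxml the call would be the same, passing `schema_doc.getroot()` as `tree`.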
Thoughts and/or a different approach are welcome. Thanks
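As a different approach, again only a sketch: since every named `xsd:element` carries its type in the `type` attribute, the schema can be scanned once into a name-to-type dict, which removes the recursion entirely. A `ref="foo"` entry has no type of its own; it points at a global declaration named `foo`, so its type comes from the same dict. This assumes, as stated above, that element names are unique in the schema (XSD in general allows local elements with the same name but different types), and it does not follow separately declared named types. The `foo` and `some_date` declarations and their types are hypothetical, since the snippet above doesn't show them, and `element_types` / `child_types` are illustrative names. The standard library is used so it runs standalone; with lxml, pass `schema_doc.getroot()` instead.

```python
import xml.etree.ElementTree as ET

XSD_NS = '{http://www.w3.org/2001/XMLSchema}'


def element_types(schema_root):
    """Map each named xsd:element to its 'type' attribute (None if absent).

    Assumes names are unique; if a name repeats, the first declaration wins.
    """
    types = {}
    for el in schema_root.iter(XSD_NS + 'element'):
        name = el.get('name')
        if name is not None:
            types.setdefault(name, el.get('type'))
    return types


def child_types(schema_root, types, parent_name):
    """List (child_name, type) for elements declared under a global element.

    <xsd:element ref="x"/> entries are resolved through `types`; a prefixed
    ref such as "tns:x" is reduced to its local part "x".
    """
    parent = next((el for el in schema_root.findall(XSD_NS + 'element')
                   if el.get('name') == parent_name), None)
    if parent is None:
        return []
    result = []
    # './/' searches descendants only, so the parent itself is skipped
    for el in parent.iterfind('.//' + XSD_NS + 'element'):
        name = el.get('name') or el.get('ref', '').split(':')[-1]
        result.append((name, types.get(name)))
    return result


# Hypothetical global declarations for the two referenced elements.
SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo" type="xsd:string"/>
  <xsd:element name="some_date" type="xsd:date"/>
  <xsd:element name="concept">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element ref="foo"/>
        <xsd:element name="concept_id" type="xsd:string"/>
        <xsd:element name="line" type="xsd:integer"/>
        <xsd:element name="concept_value" type="xsd:string"/>
        <xsd:element ref="some_date"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

root = ET.fromstring(SCHEMA)
types = element_types(root)
for name, xsd_type in child_types(root, types, 'concept'):
    print(name, xsd_type)
```

In the iterparse loop, `types.get(child.tag)` would then give the declared type of each child of a yielded element, assuming the instance document has no target namespace; otherwise the `{...}` prefix has to be stripped first, e.g. with `etree.QName(child).localname`.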