paul.sherwood
Hi
I'm trying to parse a large (150 MB) XML file in order to extract specific
required records.
import sys
from elementtree.ElementTree import ElementTree
root = ElementTree(file='big.xml')
This works fine for smaller versions of the same XML file, but when I
attempted the above my PC went to la-la land: there was much HDD grinding,
followed by a "Windows running low on virtual memory" popup after
10-15 minutes, then just more grinding for an hour before I gave up.
XML file format:
<root>
  <rubbish1>
  .
  .
  </rubbish1>
  .
  .
  <rubbishX>
  .
  .
  </rubbishX>
  <Products>
    <Product ID="QU17861" UserTypeID="PH_QUOTE" QualifierID="Qualifier root" ParentID="LIVE_AREA">
      <Name QualifierID="Qualifier root">23172</Name>
      <Description QualifierID="Qualifier root">Three Spot Rail Light Brushed Chrome</Description>
      <ClassificationReference ClassificationID="W2@Kitchen Lighting" QualifierID="Qualifier root" Type="" />
      <ProductReference ProductID="QU17749" QualifierID="Qualifier root" Type="Accessory / Linked Product">
        <Name QualifierID="Qualifier root">73520</Name>
      .
      .etc
    </Product>
  </Products>
</root>
OK, I thought, surely there's a way to parse this thing in chunks until
I get to the element I require, and then I can reuse the ElementTree
goodness.
I found iterparse:
def parse_for_products(filename):
    for event, elem in iterparse(filename):
        if elem.tag == "Products":
            root = ElementTree(elem)
            print_all(root)
        else:
            elem.clear()
My problem is that if I pass the 'elem' found by iterparse and then try to
print all attributes, children and tail text, I only get
elem.tag; elem.keys() returns nothing, as do all of the other previously
useful ElementTree methods.
Am I right in thinking that you can pass an element into ElementTree?
How might I manually iterate through <Product>...</Product>, grabbing
everything?
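A likely explanation for the empty results: by default iterparse reports only "end" events, and every child's end event arrives before its parent's. The else branch above therefore clears each Product (and everything under it) before the end of Products is ever seen, so only empty shells are left by then. Below is a minimal sketch of one way around this, not a definitive fix. It assumes the standard-library xml.etree.ElementTree module (the successor of the standalone elementtree package); the helper name iter_products and the cut-down SAMPLE document are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET  # assumption: stdlib version of elementtree


def iter_products(source):
    """Yield one dict per top-level <Product>, clearing elements as we go."""
    inside = 0  # how many <Product> elements are currently open
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "Product":
                inside += 1
            continue
        # From here on this is an "end" event: elem is fully built.
        if elem.tag == "Product":
            inside -= 1
            if inside == 0:
                # Copy what we need before throwing the subtree away.
                record = {
                    "attrib": dict(elem.attrib),
                    "children": [
                        (child.tag, dict(child.attrib), (child.text or "").strip())
                        for child in elem
                    ],
                }
                elem.clear()
                yield record
        elif inside == 0:
            # Anything outside a <Product> (the "rubbish") can go immediately.
            elem.clear()


# Hypothetical, cut-down stand-in for the real 150 MB file.
SAMPLE = """<root>
  <rubbish1><junk a="1">x</junk></rubbish1>
  <Products>
    <Product ID="QU17861" UserTypeID="PH_QUOTE">
      <Name QualifierID="Qualifier root">23172</Name>
      <Description QualifierID="Qualifier root">Three Spot Rail Light Brushed Chrome</Description>
    </Product>
  </Products>
</root>"""

products = list(iter_products(io.BytesIO(SAMPLE.encode("utf-8"))))
for product in products:
    print(product["attrib"], product["children"])
```

For the real file, a filename can be passed instead of the BytesIO object. The sketch only collects direct children of each Product; walking elem.iter() instead would pick up nested elements as well. Cleared elements still leave empty stubs attached to their parents, so memory use is reduced rather than constant.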