dirkheld
Hi,
I have written a piece of code that reads all XML files in a directory
in order to retrieve one element from each of these files. All files
have the same XML structure. After file 123 I receive the following
error:
xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20
I guess that either the element I am trying to read can't be
retrieved, or the XML itself is invalid (which would be strange, since
all files were created with the same code).
Is there a way to:
1. fix this problem so that I can retrieve the element?
2. skip the invalid file after such an error, so that the program
continues reading the subsequent files? Some sort of error handling?
Here is the code I use:
from xml.dom import minidom
import os

path = "/Documents/programming/data/xml/"
dirList = os.listdir(path)
url_file = open('/Documents/programming/data/xml/test.txt', 'w')
for file in dirList:
    xmldoc = minidom.parse('/Documents/programming/data/xml/' + file)
    xml_elem = xmldoc.getElementsByTagName('webpage')
    web_elem = xml_elem[0]
    url = web_elem.attributes['uri']
    url_file.write(url.value + '\n')
url_file.close()
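
To make point 2 concrete, the kind of error handling I have in mind might look roughly like the sketch below. It is only an illustration: the function name collect_uris and its tag/attr parameters are made up for this example, and it assumes the XML files end in ".xml" (which also keeps the output file test.txt, which lives in the same directory, from being parsed). It catches ExpatError for each file, records the file name with the line and column reported by the parser, and moves on to the next file.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def collect_uris(directory, tag="webpage", attr="uri"):
    """Return (uris, skipped) for the .xml files found in directory.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    uris, skipped = [], []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # assumption: only *.xml files should be parsed
        full_path = os.path.join(directory, name)
        try:
            xmldoc = minidom.parse(full_path)
        except ExpatError as err:
            # lineno/offset say where the parser gave up in this file
            skipped.append((name, err.lineno, err.offset))
            continue
        elems = xmldoc.getElementsByTagName(tag)
        # Only read the attribute if the element and attribute exist
        if elems and elems[0].hasAttribute(attr):
            uris.append(elems[0].getAttribute(attr))
    return uris, skipped
```

Calling collect_uris(path) would then give the list of URIs to write to test.txt, plus a list of skipped files with the position of the offending token, which is where I would look to address point 1.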