lxml and the & character


Toff

Hello,

I can't parse this file with lxml:


================================================================
<?xml version="1.0" encoding="iso-8859-1" ?>
<packages>
<package id="firefox" name="FireFox 3.5" >
<download url="http://download.mozilla.org/?product=firefox-3.5&os=win&lang=fr" />
</package>
</packages>
================================================================

I know that & is a special character in XML,

but how can I parse the file?

Here is what I do:

import lxml.etree as ET

filename = "file.xml"
xml = ET.parse(filename)
# Print the id of every <package> element
for group in xml.iter('package'):
    print(group.get('id'))
 

Stefan Behnel

Toff, 27.10.2009 09:15:
but does my download url still allways works with & replace by $amp;

Sorry, my English fails to parse that. If that was a question, could you
rephrase it, please?

Stefan
 

Stefan Behnel

Toff, 27.10.2009 09:15:
but does my download url still allways works with & replace by $amp;

In case you want to know if the URL keeps working if you replace '&' by
'&amp;' in your XML file, then the answer is that it will actually /start/
working that way, because it will not work the way you show above. An XML
parser will return the URL as expected.
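
A minimal sketch of that point, assuming the standard library's xml.etree.ElementTree as a stand-in (lxml.etree exposes the same fromstring/find/get calls). The inline document is the file from the first post with each '&' in the URL written as '&amp;'; the variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

# The file from the question, with every '&' in the URL escaped as '&amp;'.
escaped = b"""<?xml version="1.0" encoding="iso-8859-1" ?>
<packages>
<package id="firefox" name="FireFox 3.5" >
<download url="http://download.mozilla.org/?product=firefox-3.5&amp;os=win&amp;lang=fr" />
</package>
</packages>
"""

root = ET.fromstring(escaped)
# The parser undoes the escaping, so the attribute holds plain '&' again.
url = root.find("package/download").get("url")
print(url)
```

The printed URL contains plain '&' separators, so it can be handed to a download routine unchanged.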

Note that it's also best to use an XML toolkit to actually generate XML, in
case you wrote the above file programmatically.
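
As a rough illustration of generating the file with a toolkit, here is a sketch that builds the same structure with the standard library's xml.etree.ElementTree (an assumption; lxml.etree offers equivalent Element, SubElement and tostring calls). The serializer takes care of escaping the '&' in the attribute value.

```python
import xml.etree.ElementTree as ET

url = "http://download.mozilla.org/?product=firefox-3.5&os=win&lang=fr"

# Build the tree from plain Python strings; no manual escaping is needed.
packages = ET.Element("packages")
package = ET.SubElement(packages, "package", id="firefox", name="FireFox 3.5")
ET.SubElement(package, "download", url=url)

# Serialize to bytes; '&' in the attribute is written out as '&amp;'.
xml_bytes = ET.tostring(packages, encoding="iso-8859-1")
print(xml_bytes.decode("iso-8859-1"))
```

Parsing xml_bytes again gives back the original URL, so the escaping only exists in the serialized file.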

Stefan
 

Toff

Sorry for my English,
and thanks again for your answer ;)


I did this:

import re
import shutil
import lxml.etree as ET

# Escape every '&' that does not already start one of the listed entities.
pattern = r'&(?!amp;|quot;|nbsp;|gt;|lt;|laquo;|raquo;|copy;|reg;|bul;|rsquo;)'
with open(filename) as src, open("output.txt", "w") as out:
    out.write(re.sub(pattern, '&amp;', src.read()))
shutil.move("output.txt", filename)
xml = ET.parse(filename)

It's dirty, but it works.
 

Stefan Behnel

Toff, 27.10.2009 10:14:
o = open("output.txt","w")
data = open(filename).read()
o.write( re.sub('&(?!amp;|quot;|nbsp;|gt;|lt;|laquo;|raquo;|copy;|reg;|bul;|rsquo;)', '&amp;', data) )
o.close()
shutil.move("output.txt",filename)
xml = ET.parse(filename)

Absolutely no reason to write a file here. Use ET.fromstring().
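
A sketch of what that could look like, under a few assumptions: the standard library's xml.etree.ElementTree stands in for lxml here (both provide fromstring), the document is held as bytes so the iso-8859-1 declaration is honoured by the parser, and the pattern is a simplified variant of the one quoted above that only spares the five predefined XML entities and numeric character references. BARE_AMP and the other names are illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Broken input as in the first post: the '&' in the URL is not escaped.
broken = b"""<?xml version="1.0" encoding="iso-8859-1" ?>
<packages>
<package id="firefox" name="FireFox 3.5" >
<download url="http://download.mozilla.org/?product=firefox-3.5&os=win&lang=fr" />
</package>
</packages>
"""

# Match '&' unless it already starts a predefined entity or a numeric reference.
BARE_AMP = re.compile(rb"&(?!(?:amp|lt|gt|quot|apos|#\d+|#x[0-9a-fA-F]+);)")

# Repair the data in memory and parse it directly, without a temporary file.
fixed = BARE_AMP.sub(b"&amp;", broken)
root = ET.fromstring(fixed)

for package in root.iter("package"):
    print(package.get("id"), package.find("download").get("url"))
```

For a file on disk, the bytes could come from open(filename, "rb").read() instead of the inline literal. This remains a workaround; fixing the file at its source is the cleaner option.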

And, just to repeat my question in other words: can't you just fix the
'XML' file beforehand? I.e., is it a broken XML file coming from an outside
source that you cannot control, or did you create it yourself?

Stefan
 
