Using ElementTree to tidy up an XML string to my liking

peterbe

I have an XML string coming in from one system that I'd like to tidy up
and return in a very particular format. I'm picky!

If the input is
<SOMETHING attr1="foo1"
attr2='foo2' >

Then the output must be
<something
attr1="foo1"
attr2="foo2">

The file might have comments and namespaces and all of this should be
preserved as it was when it came in.

I was hoping to use ElementTree which can parse my input so that I rest
on solid foundations instead of mad regular expressions and string
manipulation.
But if mad regular expressions and string manipulation are what it
takes, I'll have to settle for that.

What are my options?
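
One option, as a minimal sketch rather than a definitive answer: parse with ElementTree and write a small serializer by hand, since the built-in one gives no control over tag case or attribute layout. The helper names `parse_keeping_comments` and `tidy` and the `rename` parameter are made up for illustration. The sketch assumes Python 3.8 or later (for `insert_comments`), keeps only comments inside the root element, and deliberately refuses namespaced names, because ElementTree does not keep the original prefixes and preserving them would need extra work.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parse_keeping_comments(xml_text):
    """Parse a string into an Element, keeping comments inside the root."""
    # insert_comments is available from Python 3.8 onwards.
    builder = ET.TreeBuilder(insert_comments=True)
    return ET.fromstring(xml_text, parser=ET.XMLParser(target=builder))


def tidy(elem, rename=str.lower):
    """Serialize elem with renamed tags and one double-quoted attribute per line."""
    if elem.tag is ET.Comment:
        return "<!--%s-->" % (elem.text or "") + escape(elem.tail or "")
    # ElementTree stores namespaced names as "{uri}local"; not handled here.
    if any(name.startswith("{") for name in [elem.tag, *elem.attrib]):
        raise NotImplementedError("namespaces are not handled in this sketch")
    tag = rename(elem.tag)  # hypothetical hook for controlling case
    parts = ["<" + tag]
    for name, value in elem.attrib.items():
        # Always use double quotes, escaping any embedded ones.
        parts.append('\n%s="%s"' % (name, escape(value, {'"': "&quot;"})))
    if elem.text or len(elem):
        parts.append(">")
        parts.append(escape(elem.text or ""))
        parts.extend(tidy(child, rename) for child in elem)
        parts.append("</%s>" % tag)
    else:
        parts.append(" />")
    parts.append(escape(elem.tail or ""))
    return "".join(parts)


example = '<SOMETHING attr1="foo1"\nattr2=\'foo2\' ><!-- note --><CHILD/></SOMETHING>'
print(tidy(parse_keeping_comments(example)))
```

Passing a different `rename` callable (for example a dictionary lookup that maps upper-cased names back to their intended spelling) would give control over the case of each element name.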
 
peterbe

Yes I am, but I'm using a DOM Serializer in Firefox which for some
reason turns myCamelNames into MYCAMELNAMES for the node names. I'll
therefore need to control the case of these names as I'm
formatting the XML string.
 
Magnus Lycka

peterbe said:
Yes I am, but I'm using a DOM Serializer in Firefox which for some
reason turns myCamelNames into MYCAMELNAMES for the node names. I'll
therefore need to control the case of these names as I'm
formatting the XML string.

I realize that it's difficult to make sure that every user of your
system has a corrected Firefox, but it seems to me that there is
a bug in that DOM Serializer, and that it would be good to mend
that.

Concerning element names, it's your choice of course, but I agree
more and more with Guido and PEP008 that camelCase is ugly. (Not
that ALLCAPS is better...)
 
Heikki Toivonen

peterbe said:
Yes I am, but I'm using a DOM Serializer in Firefox which for some
reason turns myCamelNames into MYCAMELNAMES for the node names. I'll
therefore need to control the case of these names as I'm
formatting the XML string.

I am almost certain you are doing something wrong in Mozilla if
that is happening. My first guess is that you are really doing HTML even
though you think you are doing XML, and therefore Mozilla converts your
element names to uppercase. Without seeing the source it is hard to say. I would
advise you to ask on the Mozilla forums about that.
 
Richard Townsend

Magnus said:
Concerning element names, it's your choice of course, but I agree
more and more with Guido and PEP008 that camelCase is ugly. (Not
that ALLCAPS is better...)

I can see in PEP008 where it says Capitalized_Words_With_Underscores is
ugly, but I can't see where it says pure camelCase is ugly?
 
Magnus Lycka

Richard said:
I can see in PEP008 where it says Capitalized_Words_With_Underscores is
ugly, but I can't see where it says pure camelCase is ugly?

Sorry, I meant mixedCase (as it was in the XML example). Peter confused
me by writing "myCamelNames". This is what PEP008 says:

"""
Function Names

Function names should be lowercase, possibly with words separated by
underscores to improve readability. mixedCase is allowed only in
contexts where that's already the prevailing style (e.g. threading.py),
to retain backwards compatibility.

Method Names and Instance Variables

The story is largely the same as with functions: in general, use
lowercase with words separated by underscores as necessary to improve
readability.
"""
 
