getting text out of an xml string

JT

Yo,

So I have almost convinced a small program to do what I want it to
do. One thing remains (at least, one thing I know of at the moment):
I am converting xml to some other format, and there are strings in the
xml like this.

The python:

    elif v == "content":
        print "content", a.childNodes[0].nodeValue

what gets printed:

content \u3c00note xml:space="preserve"\u3e00see forms in red inbox
\u3c00/note\u3e00

What this should say is "see forms in red inbox", because that is what
the program whose xml file I am trying to convert properly displays,
and that is what I typed in oh so long ago. So my question to you is,
how can I convert this "enhanced" version to a normal string? Esp.
since there is this "xml:space="preserve"" thing in there ... I suspect
the rest is just some unicode issue. Thanks for any help.

J "long time no post" T
 
John Machin

Your data has been FUABARred (the first A being for Almost) -- the
"\u3c00" and "\u3e00" were once "<" and ">" respectively. You will
need to show (a) a snippet of the xml file including the data that has
the problem (b) the code that you have written, cut down to a small
script that is runnable and displays the problem. Tell us what version
of Python you are running, on what OS.
 
JT

Hi John,

I realized that a few minutes after posting. I then realized that
I could just extract the text between the stuff with \u3c00 xml
preserve etc, which I did; it was good enough since it was a one-off
affair: I had to convert a to-do list from one program to another.
Thanks for replying and sorry for the noise :)

JT
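
For later readers, here is a rough sketch of the kind of one-off repair described above. It is not JT's actual code, which was never posted. It assumes Python 3, and it assumes the node value holds either the real characters U+3C00 / U+3E00 or the literal text "\u3c00" / "\u3e00"; the thread does not show which. The names REPAIRS, repair_markup and note_text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Mis-encoded markers mapped back to the characters they presumably were.
# Both the real code points and the literal backslash-escape text are
# covered, because the thread does not show which form the data contains.
REPAIRS = {
    "\u3c00": "<",
    "\u3e00": ">",
    "\\u3c00": "<",
    "\\u3e00": ">",
}


def repair_markup(value):
    """Put back the angle brackets that were turned into U+3C00 / U+3E00."""
    for bad, good in REPAIRS.items():
        value = value.replace(bad, good)
    return value


def note_text(value):
    """Repair the fragment, parse it as XML, and return only its text."""
    root = ET.fromstring(repair_markup(value))
    return "".join(root.itertext()).strip()


# The string from the first post, written here with literal backslashes.
raw = ('\\u3c00note xml:space="preserve"\\u3e00see forms in red inbox\n'
       '\\u3c00/note\\u3e00')
print(note_text(raw))
```

Parsing the repaired fragment, rather than slicing between the markers, also takes care of the xml:space="preserve" attribute without special handling. This only papers over the damage; it does not explain where the damage came from.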
 
John Machin

Next time you need to extract some data from an xml file, please (for
your own good) don't do whatever you did in that code -- note that the
unicode equivalent of "<" is u"\u003c", NOT u"\u3c00"; I wasn't joking
when I said it had been FU.
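
As a point of comparison, here is a minimal sketch of pulling text out of an element with xml.dom.minidom without relying on childNodes[0] being the only text node. It assumes Python 3. The SAMPLE document and the element_text helper are hypothetical, since the real file and code were never shown in the thread.

```python
from xml.dom import minidom


def element_text(node):
    """Concatenate every text node below `node`, in document order."""
    parts = []
    for child in node.childNodes:
        if child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            parts.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            parts.append(element_text(child))
    return "".join(parts)


# Hypothetical input: the structure of the real file is unknown.
SAMPLE = ('<todo><content><note xml:space="preserve">'
          'see forms in red inbox</note></content></todo>')

doc = minidom.parseString(SAMPLE)
content = doc.getElementsByTagName("content")[0]
print(element_text(content))
```

If the note markup is stored as escaped text inside the content element instead of as real child elements, this returns the markup as a string, and that string would need a second parse.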
 
björn lundin

Could that perhaps come from moving the file between little-endian and
big-endian machines (or vice versa), in some kind of strange binary
mode?
/björn
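
The byte-order idea can be checked with a few lines of Python 3 (an assumption; the thread itself uses Python 2). The helper name misread_utf16 is made up for this sketch. It encodes text as little-endian UTF-16 and decodes the same bytes as big-endian, which swaps the two bytes of every code unit.

```python
def misread_utf16(text):
    """Encode as little-endian UTF-16, then decode the bytes as big-endian."""
    return text.encode("utf-16-le").decode("utf-16-be")


# The two brackets come out as exactly the code points seen in the thread.
for ch in "<>":
    swapped = misread_utf16(ch)
    print("U+%04X -> U+%04X" % (ord(ch), ord(swapped)))

# A whole string read this way has every character swapped, not only the
# brackets, so ordinary letters would be damaged as well.
print([hex(ord(c)) for c in misread_utf16("note")])
```

So U+3C00 and U+3E00 are what "<" and ">" look like with their UTF-16 bytes swapped. In the output from the first post, though, the word "note" is intact, so a byte-order mix-up over the whole file would not produce that output by itself; the swap seems to have affected only the brackets.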
 
