ProvoWallis
Hi,
I'm hoping someone can help me. I'm hopelessly lost.
I'm trying to make a change in some XML files using a regular
expression (re.sub). I can capture the text I want to replace OK but
when I replace it, I end up with nothing: i.e., just a "" character in my
file.
data = re.sub(
    r'(?i)(?u)<title><emph typestyle=\"bf\">Sample Title</emph></title>'
    r'<para indent=\"none\" runin=\"1\"><emph typestyle=\"bf\">\—(.*?):</emph>',
    '<title><icon name="graphic"/> <emph typestyle="bf">Sample Title—\1:</emph>'
    '</title><para indent="none" runin="1">',
    data)
I think my problem is that I don't understand unicode or even know how
my XML is encoded b/c there is nothing in the XML declaration at the
top of the file.
I'd be grateful if someone could give a little advice or point me in the
right direction. I've read a bunch of stuff on the board but nothing
seems to click. I'm guessing I have to decode my file when I read it,
something like this:
raw = inputFile.read()
fileencoding = "utf-8"
data = raw.decode(fileencoding)
and then write it out similarly but this doesn't seem to work.
Any help appreciated,
Greg
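
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the replacement string in the re.sub call above is not a raw string, so Python reads `\1` as the octal escape for the control character U+0001 instead of a reference to group 1. That would leave a single unprintable character where the captured text should be. The example below is Python 3, uses made-up sample input (the real file is not shown), and assumes the file is UTF-8; the names `pattern`, `replacement`, `result`, and `out_bytes` are only illustrative.

```python
import re

# In a normal (non-raw) string, '\1' is an octal escape, not a group reference.
assert '\1' == '\x01'

# Hypothetical sample input; the real file contents are not shown in the post.
raw = (
    '<title><emph typestyle="bf">Sample Title</emph></title>'
    '<para indent="none" runin="1"><emph typestyle="bf">'
    '\u2014Some heading:</emph>'
).encode('utf-8')

# Decode bytes to text before matching. UTF-8 is an assumption here.
data = raw.decode('utf-8')

# Raw strings keep regex syntax intact; the dash is spelled out as the
# actual character U+2014 through a normal string literal.
pattern = (
    r'(?i)<title><emph typestyle="bf">Sample Title</emph></title>'
    r'<para indent="none" runin="1"><emph typestyle="bf">'
    '\u2014'
    r'(.*?):</emph>'
)

# The part containing the group reference is a raw string, so the
# backslash reaches the re module and \1 means "group 1".
replacement = (
    r'<title><icon name="graphic"/> <emph typestyle="bf">Sample Title'
    '\u2014'
    r'\1:</emph></title><para indent="none" runin="1">'
)

result = re.sub(pattern, replacement, data)

# Encode back to bytes with the same encoding before writing the file.
out_bytes = result.encode('utf-8')
print(result)
```

If the file turns out not to be UTF-8, the `decode` call would typically raise a UnicodeDecodeError, which is itself a useful hint about the real encoding.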