P
Pinke Panke
Dear people,
I wrote a python script to create html files. The structure was stored in
a nested array. For easier maintaining at any time I desribe the
structure in XML, using the minidom parser and a small function to
convert the XML structure into the array structure. So far so good.
Then the mess started. The XML document is described as utf-8, stored as
utf-8. iso-8859-1 makes no difference in this case.
When an character > 128, e.g. an umlaut, occurs my string raises errors.
An example:
headline = structure[0]
pagetext = structure[1]
foo = headline + "bar" + pagetext
In my script there are many of such operations. The simple example is
solved easily with appending .encode('iso-8859-1') at the structure
statements. So far not so nice but ok. I hope there would be a simpler
solution.
But there are also string replacements via regexes. An example to make a
picture of it:
pat = re.compile('<putithere>')
foo = 'def'
bar = 'abc<putithere>ghi'
htmlcode = pat.sub(foo,bar)
Appending .encode(...) to foo and bar does not fix the UnicodeError.
Is there any solution, something I forgot or I could make better? Is
there any logic behind it? ;-)
TIA.
Martin
I wrote a python script to create html files. The structure was stored in
a nested array. For easier maintaining at any time I desribe the
structure in XML, using the minidom parser and a small function to
convert the XML structure into the array structure. So far so good.
Then the mess started. The XML document is described as utf-8, stored as
utf-8. iso-8859-1 makes no difference in this case.
When an character > 128, e.g. an umlaut, occurs my string raises errors.
An example:
headline = structure[0]
pagetext = structure[1]
foo = headline + "bar" + pagetext
In my script there are many of such operations. The simple example is
solved easily with appending .encode('iso-8859-1') at the structure
statements. So far not so nice but ok. I hope there would be a simpler
solution.
But there are also string replacements via regexes. An example to make a
picture of it:
pat = re.compile('<putithere>')
foo = 'def'
bar = 'abc<putithere>ghi'
htmlcode = pat.sub(foo,bar)
Appending .encode(...) to foo and bar does not fix the UnicodeError.
Is there any solution, something I forgot or I could make better? Is
there any logic behind it? ;-)
TIA.
Martin