Thomas Heller
I want to use ConfigParser with both NT4-style .reg files, which are
ASCII (or ANSI?) files, and XP-style .reg files, which seem to be UTF-16
encoded Unicode files (hope that's the correct terminology). [And yes, I
have read the warning in the manual that ConfigParser doesn't interpret
the value-type prefixes in the reg files]
Here's the start of the method I wrote to detect the encoding and read
the file:
    def _parse_regfile(self, filename):
        ifi = open(filename, "r")
        import codecs, StringIO
        if ifi.read(2) in (codecs.BOM_LE, codecs.BOM_BE):
            ifi.close()
            ifi = codecs.open(filename, "r", "utf-16")
I wonder: do I really have to check for the BOM manually, or is there a
Python function which does that?
Continuing the code:
        # ConfigParser calls .readline(), but:
        # NotImplementedError: '.readline() is not implemented for UTF-16'
        # so we need to put the data into a StringIO instance.
        # Um, cStringIO doesn't handle unicode correctly, so we'll have
        # to use the slower StringIO
        ifi = StringIO.StringIO(ifi.read())
        ifi.readline() # skip the first two lines
        ifi.readline()
        c = ConfigParser()
        c.readfp(ifi)
        return c
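For comparison, the BOM check and the decoding step can be pulled out into a standalone helper. This is only a rough sketch under a few assumptions: the names detect_utf16_bom and read_reg_text are made up for illustration, latin-1 is a guess at the encoding of the BOM-less NT4-style files, and io.open needs Python 2.6 or later. It reads the raw bytes once and decodes them in memory, so nothing ever calls .readline() on a UTF-16 stream.

```python
import codecs
import io


def detect_utf16_bom(raw):
    """Return True if the byte string starts with a UTF-16 byte order mark."""
    return raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def read_reg_text(filename, fallback_encoding="latin-1"):
    """Return the whole file decoded to unicode text (hypothetical helper).

    fallback_encoding is an assumption for files without a BOM; the
    NT4-style files may really use a different ANSI code page.
    """
    with io.open(filename, "rb") as f:
        raw = f.read()
    if detect_utf16_bom(raw):
        # The generic "utf-16" codec uses the BOM to pick the byte order
        # and drops it from the decoded text.
        return raw.decode("utf-16")
    return raw.decode(fallback_encoding)
```

The returned text can then be wrapped in io.StringIO (or StringIO.StringIO, as above) to get a file-like object with a working .readline().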
Is there a better way to do this? Why doesn't the UTF-16 codec
implement readline()?
Thomas