I assume there's no standard library function that wraps
codecs.open() to sniff a file's BOM header and open the file with
the appropriate encoding?
My reading of the docs leads me to believe that there are 5
distinct BOM headers, several of which have multiple names
(synonyms?) for the same byte sequence.
BOM = '\xff\xfe'
BOM_LE = '\xff\xfe'
BOM_UTF16 = '\xff\xfe'
BOM_UTF16_LE = '\xff\xfe'
BOM_BE = '\xfe\xff'
BOM32_BE = '\xfe\xff'
BOM_UTF16_BE = '\xfe\xff'
BOM64_BE = '\x00\x00\xfe\xff'
BOM_UTF32_BE = '\x00\x00\xfe\xff'
BOM64_LE = '\xff\xfe\x00\x00'
BOM_UTF32 = '\xff\xfe\x00\x00'
BOM_UTF32_LE = '\xff\xfe\x00\x00'
BOM_UTF8 = '\xef\xbb\xbf'
Is the process of writing a BOM sniffer really as simple
as detecting one of these 5 header types and then calling
codecs.open() with the appropriate encoding= parameter?
Note: I'm only interested in Unicode encodings. I am not
interested in any of the non-Unicode encodings supported
by the codecs module.
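
To make the question concrete, here is a rough sketch of the kind of
sniffer I have in mind. It is written in Python 3 syntax with bytes
constants. The names sniff_bom, open_with_bom and _BOM_ENCODINGS are
made up for illustration, and falling back to "utf-8" when no BOM is
found is purely my assumption. The one subtlety I can see is ordering:
BOM_UTF32_LE begins with the same two bytes as BOM_UTF16_LE, so the
4-byte BOMs have to be tested first.

```python
import codecs

# Order matters: the UTF-32 BOMs are checked before the UTF-16 ones,
# because BOM_UTF32_LE starts with the same two bytes as BOM_UTF16_LE.
# The generic "utf-16", "utf-32" and "utf-8-sig" codecs read the BOM
# themselves and strip it from the decoded text.
_BOM_ENCODINGS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(path, default="utf-8"):
    """Return a codec name based on the file's BOM (hypothetical helper).

    `default` is returned when no BOM is present; "utf-8" is an assumption.
    """
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, encoding in _BOM_ENCODINGS:
        if head.startswith(bom):
            return encoding
    return default


def open_with_bom(path, default="utf-8"):
    """Open `path` for reading with the encoding suggested by its BOM."""
    return codecs.open(path, "r", encoding=sniff_bom(path, default))
```

One ambiguity remains in this sketch: a UTF-16 LE file whose first
character after the BOM is U+0000 has the same leading four bytes as
a UTF-32 LE BOM, so it would be reported as UTF-32.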
Thank you,
Malcolm