Autodetect UTF-8 vs ISO-8859-1

Lars

As part of an error correction mechanism, I'd like to
autodetect ISO-8859-1 vs UTF-8 usage. Where is this
described concisely?

-Lars
 
Dean Tiegs

Lars said:
As part of an error correction mechanism, I'd like to autodetect
ISO-8859-1 vs UTF-8 usage. Where is this described concisely?

For an arbitrary text file, it is impossible to distinguish
automatically between the two with 100 percent certainty. However,
if the file contains bytes outside the ASCII range and no invalid
UTF-8 sequences, it is almost certainly UTF-8. It would be a very
unusual ISO-8859-1 file whose non-ASCII bytes all happened to form
valid UTF-8 sequences. (A pure ASCII file is byte-for-byte identical
in both encodings, so the choice makes no difference there.)
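
The heuristic above might be sketched roughly like this in Python. The function name guess_encoding and its return labels are made up for illustration, and it assumes the only candidates are ASCII, UTF-8, and ISO-8859-1:

```python
def guess_encoding(data: bytes) -> str:
    """Guess 'ascii', 'utf-8', or 'iso-8859-1' for a byte string.

    Hypothetical helper: assumes no other encodings are in play.
    """
    # Pure ASCII decodes identically under both encodings.
    if all(b < 0x80 for b in data):
        return "ascii"
    try:
        # Strict decoding raises on any invalid UTF-8 sequence.
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8; every byte string is decodable as ISO-8859-1.
        return "iso-8859-1"
    return "utf-8"
```

The "very unusual" case still slips through: the ISO-8859-1 bytes for "Ã©" are 0xC3 0xA9, which is also the valid UTF-8 encoding of "é", so this sketch would report UTF-8 for such a file.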

For XML files, it's much simpler: if it is ISO-8859-1, it has to be
declared in the XML declaration.
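
As a rough sketch of checking the declaration before parsing, something like the following could work. The name declared_xml_encoding is hypothetical, and it assumes the document is in an ASCII-compatible encoding (so not UTF-16) and that the declaration fits within the first 200 bytes:

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_XML_DECL = re.compile(
    rb'^<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)

def declared_xml_encoding(data: bytes):
    """Return the declared encoding name, or None if none is declared."""
    # Skip a UTF-8 byte order mark if one is present.
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]
    match = _XML_DECL.match(data[:200])
    return match.group(1).decode("ascii") if match else None
```

If this returns None and nothing outside the document specifies the encoding, an XML file without a byte order mark is expected to be UTF-8.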
 
Johannes Koch

Dean said:
For XML files, it's much simpler: if it is ISO-8859-1, it has to be
declared in the XML declaration.

Or by some lower-level protocol, such as the charset parameter of
the HTTP Content-Type header.
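
For illustration, the charset parameter can be pulled out of a Content-Type value with Python's standard email package. The wrapper name charset_from_content_type is an assumption, not part of any library:

```python
from email.message import Message

def charset_from_content_type(value: str):
    """Return the lowercased charset parameter, or None if absent."""
    msg = Message()
    msg["Content-Type"] = value
    # get_content_charset() returns the charset coerced to lower case.
    return msg.get_content_charset()
```

For example, passing "text/xml; charset=ISO-8859-1" should yield "iso-8859-1", while a bare "text/xml" yields None.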
 
