Lars said:
As part of an error correction mechanism, I'd like to autodetect
ISO-8859-1 vs UTF-8 usage. Where is this described concisely?
For an arbitrary text file, it is impossible to distinguish
automatically between the two and be 100 percent sure of choosing
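correctly. However, if the file contains no invalid UTF-8 sequences,
it is almost certainly UTF-8. (A pure-ASCII file is valid in both, so
the choice doesn't matter there.) It would be a very unusual
ISO-8859-1 file with non-ASCII characters that did not contain invalid
UTF-8 sequences: an accented letter such as é (0xE9) would have to be
followed by exactly the right bytes from the range 0x80 to 0xBF, which
in ISO-8859-1 are control characters or symbols like « and °.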
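A minimal sketch of that heuristic, assuming the whole file has been read into memory as bytes. The function name guess_encoding is hypothetical. It relies on Python's strict UTF-8 decoder, which raises UnicodeDecodeError on any invalid sequence. It falls back to ISO-8859-1 because every byte sequence is valid in that encoding.

```python
def guess_encoding(data: bytes) -> str:
    """Hypothetical helper: 'utf-8' if data is valid UTF-8, else 'iso-8859-1'.

    Pure ASCII is valid in both encodings, so 'utf-8' is a harmless answer there.
    """
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid UTF-8 sequence
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as ISO-8859-1, so this fallback always works.
        return "iso-8859-1"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))       # utf-8
    print(guess_encoding("café".encode("iso-8859-1")))  # iso-8859-1
```

The rare false positive is the one described above. The two bytes 0xC3 0xA9 are "é" in UTF-8 but "Ã©" in ISO-8859-1, and this check will report them as UTF-8.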
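For XML files, it's much simpler: unless the encoding is supplied by
external information (such as an HTTP Content-Type header), an
ISO-8859-1 document has to declare it in the XML declaration, e.g.
<?xml version="1.0" encoding="ISO-8859-1"?>. A document with no
encoding declaration and no byte-order mark is treated as UTF-8.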
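A rough sketch of reading that declaration, assuming an ASCII-compatible encoding (UTF-8 or ISO-8859-1) and no external encoding information. The function name xml_encoding is hypothetical. UTF-16 documents are deliberately not handled, and a real XML parser does this detection more thoroughly.

```python
import re

# Matches an XML declaration at the very start of the data (optionally after a
# UTF-8 byte-order mark) and captures the value of its encoding pseudo-attribute.
_XML_DECL = re.compile(
    rb'(?:\xef\xbb\xbf)?<\?xml[^>]*?\sencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def xml_encoding(data: bytes) -> str:
    """Hypothetical helper: return the declared encoding, lower-cased,
    or 'utf-8' when the XML declaration is missing or has no encoding."""
    m = _XML_DECL.match(data)  # match() anchors at the start of the data
    if m:
        return m.group(1).decode("ascii").lower()
    return "utf-8"  # no declaration: the document must be UTF-8


if __name__ == "__main__":
    print(xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
    print(xml_encoding(b'<?xml version="1.0"?><a/>'))
```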