The last may be a clue. Are you sure that it isn't a Unicode
UTF-8 byte order mark (BOM)? It would appear as "?".
That's the first thing that occurred to me as well---that's why I
asked about his toolset. If he's outputting wchar_t Unicode,
the library could very well insert a BOM at the start of the
file; who knows how another tool, which thought it was reading
char data, might interpret this.
Your suggestion of a BOM somehow converted to UTF-8 is a good
one, however, since the BOM then becomes the three bytes 0xEF
0xBB 0xBF, and 0xBF is the inverted question mark in ISO 8859-1.
In UTF-8, a BOM is permitted but not recommended (there is no
byte order to mark), so it normally shouldn't be output, but if
he's outputting UTF-16, which is being naïvely converted to
UTF-8, I wouldn't be surprised to see it.
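To make the byte values concrete, here is a minimal sketch of a code point to UTF-8 encoder. The function name encodeUtf8 is my own invention for illustration, not something from any library, and I'm assuming the input is a single valid code point. Feeding it U+FEFF (the BOM) gives the three bytes 0xEF 0xBB 0xBF, which is what a naïve UTF-16 to UTF-8 conversion would leave at the start of the file.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a library function): encode one Unicode
// code point as a UTF-8 byte sequence.
std::string encodeUtf8(unsigned long cp)
{
    // Assumption: cp is a valid scalar value (in range, not a surrogate).
    assert(cp <= 0x10FFFFUL && !(cp >= 0xD800UL && cp <= 0xDFFFUL));

    std::string out;
    if (cp < 0x80UL) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800UL) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000UL) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx (U+FEFF lands here)
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Calling encodeUtf8(0xFEFF) and dumping the result in hex is a quick way to compare against the first three bytes of the suspect file.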
Check out <http://en.wikipedia.org/wiki/Byte_order_mark>. As
to how it gets there, and whether you should leave it there or
remove it, well, anyone's guess without knowing much about
your program, data, processing etc., but it's most likely not
a C++ issue.
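If it does turn out to be a UTF-8 BOM and you decide to remove it, something along these lines might do. This is only a sketch under assumptions: the data has already been read into a std::string as raw bytes, and the names hasUtf8Bom and stripUtf8Bom are made up for the example.

```cpp
#include <string>

// Hypothetical helper: true if the buffer starts with the UTF-8
// encoding of U+FEFF, i.e. the bytes EF BB BF.
bool hasUtf8Bom(const std::string& s)
{
    // Compare as unsigned char, since plain char may be signed.
    return s.size() >= 3
        && static_cast<unsigned char>(s[0]) == 0xEF
        && static_cast<unsigned char>(s[1]) == 0xBB
        && static_cast<unsigned char>(s[2]) == 0xBF;
}

// Hypothetical helper: return a copy of the buffer without a leading
// UTF-8 BOM; the input is returned unchanged if there is none.
std::string stripUtf8Bom(const std::string& s)
{
    return hasUtf8Bom(s) ? s.substr(3) : s;
}
```

Only the very start of the buffer is checked. The same three bytes later in the data are an ordinary U+FEFF character and are left alone.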
Well, it's sort of a C++ issue, in that the committee is
addressing it: the next version of C++ will (conditionally?)
have two types, char16_t and char32_t, which are guaranteed to
be UTF-16 and UTF-32 (if they are present). But anything
involving character encodings will of necessity go beyond
language issues---there's no way C++ can have anything to do
with e.g. the encodings of the fonts in your printer.