Ted Byers

Again, I am trying to automatically process data I receive by email,
so I have no control over the data that is coming in.

The data is supposed to be plain text/HTML, but in quite a number of
records the contraction "rec'd" is misrepresented when written to
standard output as "Rec\342\200\231d".

When the data is written to a file, these characters appear as the
character ' when the file is opened in Notepad, but as the string
'â€™' when it is opened in OpenOffice.
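
The octal escapes \342\200\231 are the bytes 0xE2 0x80 0x99, which is
the UTF-8 encoding of U+2019 RIGHT SINGLE QUOTATION MARK (a "curly"
apostrophe). A minimal Python sketch, assuming the mail body really is
UTF-8, decodes those same bytes two ways: as UTF-8 they form one
character, which would be consistent with what Notepad shows; as
Windows-1252 (an assumed guess on the viewer's part) each byte becomes
its own character, which gives the three-character string. The helper
name `describe` is hypothetical.

```python
import unicodedata

# The bytes that were printed to standard output as octal escapes.
raw = b"Rec\342\200\231d"

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for every non-ASCII character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
        for ch in text
        if ord(ch) > 127
    ]

# Decoded as UTF-8, the three bytes form a single character.
as_utf8 = raw.decode("utf-8")
# Decoded as Windows-1252 (assumed viewer guess), each byte is a separate character.
as_cp1252 = raw.decode("cp1252")

print(as_utf8, describe(as_utf8))
print(as_cp1252, describe(as_cp1252))
```

Asking for the Unicode name of each character, rather than trusting
how a particular application renders it, is one way to tell what the
data actually contains.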

So how do I tell which character it is, when it is displayed in three
different ways in three different contexts? How can I make certain
that when I either print it or store it in my DB, I get the correct
"rec'd" (or, better, "received")?
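
One possible cleanup, sketched in Python under the assumption that the
incoming bytes are UTF-8: decode once, map typographic quotes to their
ASCII look-alikes, and optionally expand the whole word "rec'd" to
"received". The function `clean_record` and the quote table are
hypothetical and illustrative, not an exhaustive normalization.

```python
import re

# Typographic quotes mapped to ASCII look-alikes (illustrative, not exhaustive).
ASCII_QUOTES = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u201c": '"',  # LEFT DOUBLE QUOTATION MARK
    "\u201d": '"',  # RIGHT DOUBLE QUOTATION MARK
})

# Matches the whole word "rec'd" in any letter case.
RECD = re.compile(r"\brec'd\b", re.IGNORECASE)

def _expand(match: re.Match) -> str:
    """Replace rec'd with received, keeping a leading capital."""
    return "Received" if match.group(0)[0].isupper() else "received"

def clean_record(raw: bytes, expand: bool = True) -> str:
    """Hypothetical cleanup: decode as UTF-8, straighten quotes, expand rec'd."""
    text = raw.decode("utf-8", errors="replace")  # assumes UTF-8 input
    text = text.translate(ASCII_QUOTES)
    if expand:
        text = RECD.sub(_expand, text)
    return text

print(clean_record(b"Rec\342\200\231d 3 items", expand=False))  # Rec'd 3 items
print(clean_record(b"Rec\342\200\231d 3 items"))                # Received 3 items
```

Normalizing before the DB insert means every later viewer sees plain
ASCII, whatever encoding Notepad, OpenOffice, or Emacs guesses. The
alternative is to keep U+2019 and make sure the DB column and every
reader are consistently UTF-8.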

I suspect a minor glitch in the software that creates and sends the
email, as this is the ONLY string where what ought to be an ASCII '
character is identified as a wide character. Regardless of how that
happens (I don't control it), I need to clean this up. And it gets
confusing when different applications handle i18n differently
(Notepad is undoubtedly using the OS i18n support, OpenOffice is
handling it differently, and Emacs differs from both).

A little enlightenment would be appreciated.
Thanks
Ted