mrdecav
Hey all,
I have a bizarre problem with a piece of mail (most likely sent by
Outlook) that is in UTF-8 format.
There is a character, coming after spaces, which a hexdump of the
file shows to be the byte CA (decimal: 202). From most UTF-8
documentation I can find, this is an E with a circumflex accent.
In browsers (IE, FF, Safari), this character shows up as an unknown
character, or as the E with a circumflex. In a mail client, however
(Outlook, Apple Mail), the character appears as a "NO-BREAK SPACE"
(just a space visually), the equivalent of an "&nbsp;".
Some documentation I have found shows this byte as a NO-BREAK SPACE,
and that is clearly the intent. The HTML header and the MIME type
of the body part both claim UTF-8 encoding.
Is there something I am missing here? Why does this show up
incorrectly in browsers, or why do mail clients feel compelled to
replace this character, but browsers don't? Is there an easy fix to
this? I am concerned that if I actually strip the CA, I'll break
emails that actually are supposed to have the accent.
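On the worry about stripping the byte: in well-formed UTF-8 an E with a circumflex is encoded as the two bytes C3 8A and a no-break space as C2 A0, so a lone CA followed by an ASCII byte is not a valid sequence at all. A minimal sketch in Python 3, assuming the stray byte really was meant as a no-break space (the handler name `ca_to_nbsp` and the function `repair` are made up for illustration), replaces only the invalid occurrences and leaves valid text alone:

```python
import codecs


def _ca_to_nbsp(err):
    """Error handler: map a stray 0xCA byte to U+00A0, anything else to U+FFFD."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    if bad == b"\xca":
        # Assumption: a lone 0xCA was intended as a no-break space.
        return ("\u00a0", err.end)
    # Any other invalid byte run becomes the standard replacement character.
    return ("\ufffd", err.end)


# "ca_to_nbsp" is a hypothetical handler name chosen for this sketch.
codecs.register_error("ca_to_nbsp", _ca_to_nbsp)


def repair(raw: bytes) -> str:
    """Decode as UTF-8, touching only bytes that are not valid UTF-8."""
    return raw.decode("utf-8", errors="ca_to_nbsp")


if __name__ == "__main__":
    # " <CA>I" taken from the hexdump, plus a correctly encoded accented E.
    print(repr(repair(b"design. \xcaI have")))
    print(repr(repair("\u00ca".encode("utf-8"))))
```

Because the handler only runs where strict UTF-8 decoding fails, a correctly encoded accented E (C3 8A) is never altered, and a CA that legitimately starts a two-byte sequence (CA 80 through CA BF) is kept as well.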
The following hex is an example of the issue:
00000250 20 64 65 73 69 67 6e 2e 20 ca 49 0d 0a 68 61 76 | design. ?I..hav|
00000260 65 20 61 20 66 65 77 20 6d 69 6e 6f 72 20 64 65 |e a few minor de|
design. <offending character>I have
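One way to see how these bytes read under different encodings is a quick Python 3 sketch. The three encodings tried here are a guess at the likely candidates, not something the mail itself states, and `decode_attempts` is a name made up for illustration. In Latin-1 the byte CA is an E with a circumflex, while in Mac OS Roman it is a no-break space, which would be consistent with the two behaviours described above:

```python
# First row of the hexdump above, copied byte for byte.
SAMPLE = bytes.fromhex("20 64 65 73 69 67 6e 2e 20 ca 49 0d 0a 68 61 76")


def decode_attempts(raw):
    """Try a few candidate encodings; None means the bytes are invalid in it."""
    results = {}
    for name in ("utf-8", "latin-1", "mac_roman"):
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # 0xCA starts a two-byte UTF-8 sequence, but 0x49 ("I") is not
            # a continuation byte, so strict UTF-8 decoding fails here.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, text in decode_attempts(SAMPLE).items():
        print(f"{name:10} {text!r}")
```

If the strict UTF-8 attempt fails while the Mac OS Roman one yields a no-break space, the body is probably not UTF-8 at that point despite what the headers claim.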
Thanks in advance,
Andre de Cavaignac