utf-8

  • Thread starter charles cashion

charles cashion

I notice that some messages have the following two lines in the header

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

I note that UTF-8 does something that allows other characters
to be displayed.

Q1: How does one turn on UTF-8 if you use Thunderbird?
Q2: How do you include special characters after you turn on
UTF-8?
Thank you,
Charles
 

Jukka K. Korpela

charles said:
I notice that some messages have the following two lines in the header

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

Which messages? E-mail, Usenet, something else? Sounds like you should ask
in a group devoted to the specific program you are using.

At the general level, as Internet message headers (which might appear even
in HTTP server response headers), they're very clueless. UTF-8 cannot, in
general, be transmitted in a 7-bit encoding. If it can, then all the
characters are in the ASCII range, and it would be much better to declare
US-ASCII.
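To make the point about 7-bit transmission concrete, here is a minimal Python sketch that lists the UTF-8 octets of a few characters. The sample characters and the variable names are chosen purely for illustration.

```python
# UTF-8 octets for an ASCII letter, an accented letter, and an en dash.
samples = ["c", "\u00e9", "\u2013"]

# Map each character to its UTF-8 octets, written as hex.
utf8_octets = {ch: ch.encode("utf-8").hex(" ") for ch in samples}

# Characters whose encoding contains an octet above 0x7F cannot be sent
# unchanged in a body declared as 7bit.
needs_more_than_7bit = [ch for ch in samples if max(ch.encode("utf-8")) > 0x7F]

print(utf8_octets)           # {'c': '63', 'é': 'c3 a9', '–': 'e2 80 93'}
print(needs_more_than_7bit)  # ['é', '–']
```

Only the plain "c" stays within the 7-bit range; anything outside ASCII produces octets with the high bit set.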
I note that UTF-8 does something that allows other characters
to be displayed.

Well, roughly so... one could actually write a book (and I did) in order to
answer such questions properly. :)
Q1: How does one turn on UTF-8 if you use Thunderbird?

It was long ago that I last used Thunderbird, but it's something in the
settings called "Encoding" or "Character encoding".
Q2: How do you include special characters after you turn on
UTF-8?

It depends on how special they are. Normally, characters outside a commonly
supported range (like Latin 1 in the Western world) should be used in email
or Usenet between consenting adults only. On the web, to get closer to
alt.html topics, it's much different, but it still depends on what you
want - like using letter "c" with acute accent as compared to using ancient
Phoenician letters.

Thunderbird isn't good at supporting the entry of special characters, so
your best shot would be to use some Unicode-capable editor and copy & paste.
 

Harlan Messinger

charles said:
I notice that some messages have the following two lines in the header

Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: 7bit

I note that UTF-8 does something that allows other characters
to be displayed.

Q1: How does one turn on UTF-8 if you use Thunderbird?
Q2: How do you include special characters after you turn on
UTF-8?
It depends on your operating system or on tools for that purpose that
you have available. In Windows, you can go to Control Panel | Keyboards
and install a keyboard appropriate for whatever language you are trying
to write. For a Western-style keyboard, in Vista and I think in XP, you
can use the on-screen keyboard found under "Ease of Access" under
"Accessories" in your Programs on the Start menu to see what the
keyboard's layout is. For East Asian languages, you might need to use
one of the available IMEs (input method editors), each of which requires
a little instruction to be able to use fully.

Alternatively, you can use the Character Map app under Accessories to
hunt for the characters you want and copy them using the clipboard.

In any event, *how* to enter "special" characters isn't a function of
Thunderbird itself.
 

Jukka K. Korpela

Harlan said:
charles cashion wrote: [...]
Q1: How does one turn on UTF-8 if you use Thunderbird?
Q2: How do you include special characters after you turn on
UTF-8?
It depends on your operating system or on tools for that purpose that
you have available.

"It"? You quoted two questions; are you answering both of them, or what?

Q1, though wildly off-topic here, is fairly simple if you look at the
program (and maybe even RTFM, though the Thunderbird manual is... er... freely
written).

When composing a message, select
Options > Character Encoding
and pick "Unicode (UTF-8)".
There's another way, too. Just compose the message and click on "Send". If
there are characters not representable in the current encoding, Thunderbird
will ask whether to switch to UTF-8 or munge the characters.
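The kind of check behind that prompt can be sketched in Python. This is not Thunderbird's code; `choose_charset` and its default of ISO-8859-1 are hypothetical, meant only to show what "not representable in the current encoding" means.

```python
def choose_charset(body: str, preferred: str = "iso-8859-1") -> str:
    """Return `preferred` if it can represent `body`, otherwise 'utf-8'.

    Hypothetical helper: a mail client would ask the user instead of
    silently switching, but the underlying test is the same.
    """
    try:
        body.encode(preferred)
    except UnicodeEncodeError:
        # At least one character has no code in the preferred charset.
        return "utf-8"
    return preferred


print(choose_charset("caf\u00e9"))   # iso-8859-1 (é exists in Latin 1)
print(choose_charset("1\u20132"))    # utf-8 (the en dash does not)
```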

When reading a message, use View > Character Encoding.
In Windows, you can go to Control Panel |
Keyboards and install a keyboard appropriate for whatever language
you are trying to write.

You are probably addressing Q2. The approach you propose is feasible - to
the small fraction of people who understand keyboard settings and can change
them as needed (dealing with the issue that keyboard drivers have been
designed for specific physical keyboards, causing various problems when
using them on keyboards of other types). And, of course, millions of people
just cannot install a keyboard driver (which is what you mean by "keyboard"
here), or any other program, since the enforced Company Polic[ey] prevents
that.
Alternatively, you can use the Character Map app under Accessories to
hunt for the characters you want and copy them using the clipboard.

That's about the clumsiest method, but it's universal (for any character
that exists in some font on your computer - note that you do _not_ need to
have that when using many other methods of character insertion, i.e. it is
quite possible to insert a character without seeing it). It's good to have
the understanding of such a method in your toolbox, and quite fine to use it
to insert a character that you need just once in your life.

Then there's Korpela's Law on Unicode Character Entry: "There's always a
simpler way." You can develop easier methods for characters you need often,
though the development work may take its time, and you need to decide which
characters you really need frequently.
In any event, *how* to enter "special" characters isn't a function of
Thunderbird itself.

Oh but it is. Thunderbird has its own functions for that, and you can use
them along with methods external to it.

You can use, in the message composition window, the command
Insert > Characters and Symbols
and pick characters from dropdown menus. The repertoire is limited and
the method is not very convenient, but you can use the method to insert e.g.
â è µ ® ×

You can also use HTML, and this makes my message somewhat on-topic.

You can select, in the message composition window,
Insert > HTML
and enter any character using a character reference like &#8211; or, when
applicable, an entity reference like &Omega;. This will automagically turn
the message format to "Rich Text" (HTML), and the references will appear as
the characters they denote.

You can even set the message format to plain text. Thunderbird seems to
convert the references to the corresponding characters.
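What such references stand for can be checked outside Thunderbird. The sketch below uses Python's standard `html.unescape`; the three sample references (decimal, hexadecimal, and named) are picked for illustration only.

```python
import html

# A decimal character reference, a hexadecimal one, and an entity reference.
references = "&#8211; &#x2013; &Omega;"

# html.unescape replaces each reference with the character it denotes.
decoded = html.unescape(references)

code_points = [f"U+{ord(ch):04X}" for ch in decoded if ch != " "]
print(code_points)  # ['U+2013', 'U+2013', 'U+03A9']
```

The first two references denote the same character (the en dash), written in decimal and in hexadecimal.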
 

Andy Dingley

At the general level, as Internet message headers, which might appear even
in an HTTP server response headers, they're very clueless. UTF-8 cannot, in
general, be transmitted in a 7-bit encoding.

Of course it can - doesn't this just mean that the UTF-8 octets have
been re-encoded to be 7-bit clean by some additional transport
protocol, not that the Unicode code points have been restricted to some
"lower 7 bits" subset (which would then be little more than ASCII)?
 

Jukka K. Korpela

Andy said:
Of course it can

Formally, you got me here, perhaps due to my inability to use articles
properly (as my native language has no articles): I mean _the_ 7-bit
encoding that is declared in the header above.
- doesn't this just mean that the UTF-8 octets have
been re-encoded to be 7-bit clean by some additional transport
protocol, not that the Unicode code points have been restricted to some
"lower 7 bits" subset (which would then be little more than ASCII)?

No, Content-Transfer-Encoding: 7bit specifically means that data is
transferred as 7-bit units with no transfer encoding. Quoting the relevant
specification, RFC 2045, clause 6.2:

"The Content-Transfer-Encoding values "7bit", "8bit", and "binary" all mean
that the identity (i.e. NO) encoding transformation has been performed. As
such, they serve simply as indicators of the domain of the body data, and
provide useful information about the sort of encoding that might be needed
for transmission in a given transport system."

Thus, Content-Transfer-Encoding: 7bit promises that the octets are in the
range 0 to 127 (the ASCII range).
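For contrast, a real transfer encoding does re-encode the octets, and it is declared under its own name (quoted-printable or base64), not as 7bit. A small Python sketch using the standard `quopri` and `base64` modules, with a one-character sample body chosen for illustration:

```python
import base64
import quopri

# UTF-8 octets of U+00E9 (é): both octets are above 0x7F.
octets = "\u00e9".encode("utf-8")

# Two genuine transfer encodings that yield 7-bit clean output.
qp = quopri.encodestring(octets)
b64 = base64.b64encode(octets)

print(octets.isascii())  # False
print(qp)                # b'=C3=A9'
print(b64)               # b'w6k='

# Both round-trip back to the original octets.
print(quopri.decodestring(qp) == octets)  # True
print(base64.b64decode(b64) == octets)    # True
```

With "7bit" no such step takes place, so the body octets themselves have to be in the range 0 to 127 already.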

In UTF-8, such octets represent the same characters as they do in ASCII
(there's no question of "some" subset or being "little more"), except that
control characters are explicitly defined in ASCII, whereas Unicode only
designates those code positions as meaning control characters. Thus, the
Unicode characters that you can have in such data are exactly
U+0000..U+007F, i.e. the ASCII range.
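The equivalence for octets 0 to 127 can be verified exhaustively. A minimal Python sketch; the helper name `fits_7bit` is made up for this example.

```python
def fits_7bit(data: bytes) -> bool:
    """True if every octet is in the range 0..127."""
    return all(octet < 0x80 for octet in data)


# Each octet 0..127 decodes to the same code point as ASCII and as UTF-8.
mismatches = [
    value
    for value in range(128)
    if not (
        bytes([value]).decode("ascii")
        == bytes([value]).decode("utf-8")
        == chr(value)
    )
]

print(mismatches)                           # []
print(fits_7bit("abc".encode("utf-8")))     # True
print(fits_7bit("\u00e9".encode("utf-8")))  # False
```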
 