prob's w foreign char sets ...

Zigzag

hi,

I'm attempting to translate some web content into other languages
and need some help understanding the proper coding to have these
characters show up consistently, cross-browser & cross platform.

I'm using HTML 4.01 Transitional for these documents, and for the
foreign character sets (e.g., Chinese, Korean, Japanese), I'm using
Unicode numeric character references, for example: ...

I normally state the page language in the HTML tag (ie <html lang="ko">)
also, using the meta tag:
<meta http-equiv="Content-Language" content="ko">

The web pages all seem to show up well on a new Mac computer,
but on an older PC laptop, none of the Korean, Chinese or Japanese
shows up properly. (firefox renders little squares with a stack of
numbers & letters in them; opera renders plain squares).

Are there any consistent 'rules' to follow for displaying
and rendering unicode character sets properly?

thanks for any pointers.

ZZ
 
Zigzag

ps - wondering what the proper character set declaration
should be, ie, UTF-8, iso-8859-1, etc ...

thanks
 
Jukka K. Korpela

[...] for the
foreign character sets (ie, chinese, korean, japanese etc), I'm using
the unicode numeric reference example: ...

That's possible and works independently of the character encoding of the
HTML document, but it really makes HTML source hard to read. It's
comparable to writing "hello" as "&#104;&#101;&#108;&#108;&#111;" (which
is possible).
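
To make the point about numeric references concrete, here is a minimal Python sketch (Python is used only for illustration; the sample string "한국어" and the helper name `to_numeric_refs` are assumptions, not something from the thread). It shows that a numeric character reference is simply the decimal code point of each character, and that such references consist of ASCII characters only, which is why they work whatever the document encoding is.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Return text with every character written as a decimal numeric reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


hello_refs = to_numeric_refs("hello")
korean_refs = to_numeric_refs("한국어")

print(hello_refs)   # &#104;&#101;&#108;&#108;&#111;
print(korean_refs)  # &#54620;&#44397;&#50612;

# The references contain only ASCII characters, so they survive any
# ASCII-compatible document encoding, and they decode back to the original.
assert korean_refs.isascii()
assert html.unescape(korean_refs) == "한국어"
```

The readability cost is visible in the output: three Korean syllables become 24 characters of source.
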
I normally state the page language in the HTML tag (ie <html lang="ko">)
also, using the meta tag:
<meta http-equiv="Content-Language" content="ko">

Neither of these has much effect, but the former can be regarded as good
practice in principle (the latter is then redundant). Beware that it may
change the default font used by the browser, as this can be
language-dependent. Any setting of the document's overall font family,
say body { font-family: Gulim, Malgun Gothic }, will override that,
though, whenever it lists any specific font that is available in the
user's system.

If you open the Settings in Firefox and select Contents, there's a
button for "additional settings" for fonts (I'm using a Finnish version
now, so I don't know what the exact English terms are here). There you
can see (and modify) the settings for various character repertoires,
like Korean: the default serif font, default sans-serif font, and
default monospace font. The factory values for these defaults are
probably reasonable, though perhaps not optimal.

The moral is: If you use lang markup, you should expect variation of
fonts across browsers, and possibly fonts other than those that would be
used without the lang markup. But you can largely remove this variation
by explicitly specifying a list of fonts, in order of preference.
The web pages all seem to show up well on a new Mac computer,
but on an older PC laptop, none of the Korean, Chinese or Japanese
shows up properly. (firefox renders little squares with a stack of
numbers& letters in them; opera renders plain squares).

This is most probably due to lack of suitable fonts on the PC, but it is
also possible that the browser just can't find a relevant font and needs
a little help. Testing on different browsers may reveal this.
You can also try with the following style sheet:

* { font-family: Batang }

This should work on Windows XP and later.
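
The "numbers & letters" that Firefox draws inside each square are the hexadecimal code point of the character it has no glyph for, so they can help identify which script is missing a font. A small Python sketch, assuming the digits have been read off the box by hand (the sample values and the helper name `describe` are illustrative, not taken from the thread):

```python
import unicodedata


def describe(hex_digits: str) -> str:
    """Map the hex digits shown in a missing-glyph box to the character's Unicode name."""
    ch = chr(int(hex_digits, 16))
    return f"U+{hex_digits.upper()} {unicodedata.name(ch, '<unnamed>')}"


for digits in ("D55C", "3042", "4E2D"):
    print(describe(digits))
# U+D55C HANGUL SYLLABLE HAN
# U+3042 HIRAGANA LETTER A
# U+4E2D CJK UNIFIED IDEOGRAPH-4E2D
```

If the names come out as Hangul, Hiragana or CJK ideographs as expected, the markup is delivering the right characters and the problem is on the font side.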

More info: Guide to using special characters in HTML,
http://www.cs.tut.fi/~jkorpela/html/characters.html

There's a specific issue with Korean: you can write a hangul syllable
using one syllabic character, or as decomposed, using several
characters. The choice between these representations isn't supposed to
affect the rendering, but in reality it may, due to font limitations and
program limitations.
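
The two representations can be inspected with Unicode normalization. The sketch below uses Python's standard `unicodedata` module; the sample syllable is an assumption chosen for illustration. NFC gives the single precomposed character and NFD gives the decomposed sequence of conjoining jamo.

```python
import unicodedata

composed = "한"  # one precomposed syllable, U+D55C
decomposed = unicodedata.normalize("NFD", composed)  # three conjoining jamo

print([f"U+{ord(c):04X}" for c in composed])    # ['U+D55C']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+1112', 'U+1161', 'U+11AB']

# The two strings are canonically equivalent but differ code point by code point.
assert composed != decomposed
assert unicodedata.normalize("NFC", decomposed) == composed
```

Normalizing text to NFC before publishing is one possible way to end up with the single-character form; whether that renders better depends on the fonts and programs involved, as noted above.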
 
mayeul.marguet

ps - wondering what the proper character set declaration
should be, ie, UTF-8, iso-8859-1, etc ...

thanks

If you insist on using character references for everything non-US, then
it doesn't matter. Declare whatever you want and use it.

Otherwise, you need to use and declare UTF-8, as iso-8859-1 simply cannot
represent Korean characters.
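
Both halves of this answer can be checked with a short Python sketch (the sample string is an assumption for illustration). Direct Korean text fails to encode as iso-8859-1 and encodes fine as UTF-8, while the same text written as numeric character references is pure ASCII.

```python
text = "한국어"

# UTF-8 can encode every Unicode character.
utf8_bytes = text.encode("utf-8")
print(len(utf8_bytes))  # 9 (three bytes per hangul syllable)

# ISO-8859-1 has no Korean characters, so direct encoding fails.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False

# With numeric character references the bytes are pure ASCII,
# so the declared encoding no longer matters for these characters.
as_refs = text.encode("ascii", errors="xmlcharrefreplace")
print(as_refs)  # b'&#54620;&#44397;&#50612;'
```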
 
Jukka K. Korpela

2012-05-21 13:22 said:
If you insist on using character references for everything non-US, then
it doesn't matter. Declare whatever you want and use it.

In principle, yes. And e.g. Ascii, UTF-8, iso-8859-1, windows-1252 are
just the same when there are no characters outside the Ascii range in
the data.
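
A quick way to see this, sketched in Python with an arbitrary ASCII-only sample string (an assumption for illustration): the four encodings produce identical bytes for ASCII data and diverge as soon as one non-ASCII character appears.

```python
ascii_only = '<html lang="ko">'
encodings = ["ascii", "utf-8", "iso-8859-1", "windows-1252"]

# Encode the same ASCII-only text in each encoding and compare the bytes.
byte_forms = {enc: ascii_only.encode(enc) for enc in encodings}
assert len(set(byte_forms.values())) == 1

# One non-ASCII character is enough to make the encodings diverge.
e_acute = "é"
print(e_acute.encode("utf-8"))       # b'\xc3\xa9'
print(e_acute.encode("iso-8859-1"))  # b'\xe9'
```
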
Otherwise, you need to use and declare UTF-8, as iso-8859-1 just can't
write korean.

UTF-8 is probably the best option, but there _are_ several encodings
specifically designed for Korean. But they are usually not a good choice
for web pages. For example, my IE 9 has only one Korean encoding in its
menu for encoding selection, and it is labelled "korealainen" (=
Korean), leaving it to the user to guess which of the Korean encodings
it is...

Besides, if you later find out that you need characters outside the set
supported by a Korean encoding you've selected, you'll face the problem
again: either use clumsy character references, or switch to UTF-8 (which
may be non-trivial after you've created a large site in another encoding).
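
As a rough illustration of that limitation, the Python sketch below uses EUC-KR as one example of a Korean-specific encoding (the choice of EUC-KR, the sample strings, and the helper name `encodable` are assumptions, not from the thread). Korean text round-trips through it, but a character from another script does not encode at all, whereas UTF-8 handles both.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


korean = "한글"
thai = "ก"  # THAI CHARACTER KO KAI, outside the Korean repertoire

# Common Korean text survives a round trip through EUC-KR.
assert korean.encode("euc_kr").decode("euc_kr") == korean

print(encodable(thai, "euc_kr"))          # False
print(encodable(korean + thai, "utf-8"))  # True
```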
 
Neil Gould

Zigzag said:
I'm attempting to translate some web content into other languages
and need some help understanding the proper coding to have these
characters show up consistently, cross-browser & cross platform.
[...]

The web pages all seem to show up well on a new Mac computer,
but on an older PC laptop, none of the Korean, Chinese or Japanese
shows up properly. (firefox renders little squares with a stack of
numbers & letters in them; opera renders plain squares).
Without the appropriate language fonts installed, you will see those
squares, so make sure that your PC has the correct fontset installed, and it
may render them correctly without changes to the HTML character set.
 
Jukka K. Korpela

Without the appropriate language fonts installed, you will see those
squares, so make sure that your PC has the correct fontset installed, and it
may render them correctly without changes to the HTML character set.

Little does that help to make the page display properly on anyone else’s
computer. And installing a “fontset” (whatever that means) does *not*
ensure that all browsers will use it automatically.
 
Neil Gould

Jukka said:
Little does that help to make the page display properly on anyone
else’s computer. And installing a “fontset” (whatever that
means) does *not* ensure that all browsers will use it automatically.

You snipped the context of my remark:
The web pages all seem to show up well on a new Mac computer,
but on an older PC laptop, none of the Korean, Chinese or Japanese
shows up properly.
I was addressing one reason that Korean, Chinese and Japanese fonts would
not display correctly on *the OP's* older PC. Those fonts are not installed
by default on some older PCs, and will not display regardless of the HTML
character set specified. I was not addressing the issue as it applies to
"anyone else's computer", nor whether all browsers would display installed
fonts correctly.
 
Andreas Prilop

If you insist on using character references for everything non-US,
then it doesn't matter.

It does matter. If you choose ISO-8859-6 or Windows-1256, then
Internet Explorer's default typeface will be Simplified Arabic.
Even in Windows 7, Simplified Arabic does not contain Urdu letters
although Urdu letters were added to code page 1256 with Windows 2000.
Internet Explorer will take the Urdu letters from some other font,
thereby failing to join the letters.
http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/att-0110/join.html
http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/0110.html
 
mayeul.marguet

It does matter. If you choose ISO-8859-6 or Windows-1256, then
Internet Explorer's default typeface will be Simplified Arabic.
Even in Windows 7, Simplified Arabic does not contain Urdu letters
although Urdu letters were added to code page 1256 with Windows 2000.
Internet Explorer will take the Urdu letters from some other font,
thereby failing to join the letters.
http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/att-0110/join.html
http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/0110.html

I was reading the question more as an either-or between utf-8 or
windows-1252, but fine observation, thanks.
 
