Diacritical marks in HTML?

Girish Sharma

Is it possible to somehow encode diacritical marks such as a dot above and
below any letter, a tilde, a bar, or an accent above any letter? I didn't
find them in the ISO-8859-1 HTML entities.

I want to do it in a way that can be viewed on any browser.

Thanks.

Girish Sharma
 
mscir

Girish said:
Is it possible to somehow encode diacritical marks such as a dot above and
below any letter, a tilde, a bar, or an accent above any letter? I didn't
find them in the ISO-8859-1 HTML entities.

I want to do it in a way that can be viewed on any browser.

Thanks.
Girish Sharma

Maybe this approach will work for you: UTF-8

http://www.music.indiana.edu/tfm/diacrits.html
http://www.tony-franks.co.uk/UTF-8.htm
http://www.slovo.info/unifonts.htm
http://www.ioplex.com/~miallen/encdec/dl/tests/utf8.html
 
Philip Ronan

Girish said:
Is it possible to somehow encode diacritical marks such as a dot above and
below any letter, a tilde, a bar, or an accent above any letter? I didn't
find them in the ISO-8859-1 HTML entities.

I want to do it in a way that can be viewed on any browser.

Thanks.

Girish Sharma

ANY browser? I think that's going to be difficult.

If the characters aren't part of the Latin1 character set (iso-8859-1), you
might have better luck with Unicode (UTF-8).

If the characters you want aren't widely available, then you can use
"combining diacritical marks" to assemble them. I'm not sure how many
browsers support this, but here's a link anyway.
http://www.alanwood.net/unicode/combining_diacritical_marks.html

Phil
 
Richard

Philip said:
ANY browser? I think that's going to be difficult.
If the characters aren't part of the Latin1 character set (iso-8859-1),
you might have better luck with Unicode (UTF-8).
If the characters you want aren't widely available, then you can use
"combining diacritical marks" to assemble them. I'm not sure how many
browsers support this, but here's a link anyway.
http://www.alanwood.net/unicode/combining_diacritical_marks.html

But wouldn't the user have to have his browser set to interpret that UTF-8
encoding?
It's like a site using a font that is not in general use: if he doesn't
have it, he won't see it.
 
mscir

Richard said:
But wouldn't the user have to have his browser set to interpret that UTF-8 encoding?
It's like a site using a font that is not in general use: if he doesn't have it, he won't see it.

I thought that any browser that supported utf-8 would show the
characters correctly if the page included:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

Mike
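
A minimal Python sketch of what the charset declaration above is for: it tells the browser how to turn the page's bytes back into characters. The sample string is an assumption chosen for illustration, not something posted in the thread.

```python
# Sample text (illustrative): "s" followed by s with dot below (U+1E63),
# a letter common in Sanskrit transliteration.
text = "s\u1e63"

# UTF-8 can represent any Unicode character; U+1E63 takes three bytes.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)

# ISO-8859-1 has no position for U+1E63, so encoding fails.
try:
    text.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)

# Reading UTF-8 bytes under the wrong charset gives wrong characters, not an
# error. This is roughly what a reader sees when the declared charset does
# not match the bytes actually sent.
misread = utf8_bytes.decode("iso-8859-1")
print(repr(misread))
```

Declaring the charset only settles the decoding step. Whether the decoded character is then displayed still depends on the fonts available to the browser.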
 
Jukka K. Korpela

Girish Sharma said:
Is it possible to somehow encode diacritical marks such as a dot above
and below any letter, a tilde, a bar, or an accent above any letter? I
didn't find them in the ISO-8859-1 HTML entities.

Just a few of them belong to ISO-8859-1 (e.g., "a" with tilde).

Girish Sharma said:
I want to do it in a way that can be viewed on any browser.

Impossible.

See e.g. http://www.cs.tut.fi/~jkorpela/html/chars.var for a general
discussion. The most practical way might be to use a Unicode-capable editor
and author your documents in UTF-8. That way you would see the characters
themselves while working with a document. For casual occurrences of
diacritic marks, it might be simplest to write them using character
references.

To write e.g. letter "a" with dot above, you could use either the
precomposed character LATIN SMALL LETTER A WITH DOT ABOVE as
&#x227;
or normal letter "a" followed by COMBINING DOT ABOVE:
a&#x307;

Generally the former works more often, and qualitatively better, when
available - but only a relatively small number of such precomposed
characters exist in Unicode.

Browser support for _simple_ composition (base letter and one diacritic) is
tolerable in modern browsers (IE 6, Firefox, etc.), but nasty surprises
should be expected for many combinations, especially if you try to put
several diacritics on a character.
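
The two spellings described in this post can be compared with a short Python sketch using the standard unicodedata module. The "q" with dot above case is an illustrative assumption, picked as a combination that has no precomposed character in Unicode.

```python
import unicodedata

precomposed = "\u0227"   # LATIN SMALL LETTER A WITH DOT ABOVE
combining = "a\u0307"    # "a" followed by COMBINING DOT ABOVE

# The two are different code point sequences...
print(precomposed == combining)

# ...but canonically equivalent: NFC composes, NFD decomposes.
nfc = unicodedata.normalize("NFC", combining)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == combining)

# Where no precomposed character exists, NFC leaves the base letter and the
# combining mark as two separate code points.
no_precomposed = unicodedata.normalize("NFC", "q\u0307")
print(len(no_precomposed))
```

Normalizing to NFC before publishing is one way to get the precomposed form wherever Unicode defines one. For everything else, rendering still relies on the browser and font positioning the combining mark.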
 
Liz

Jukka K. Korpela said:
To write e.g. letter "a" with dot above, you could use either the
precomposed character LATIN SMALL LETTER A WITH DOT ABOVE as
&#x227;

Can you explain why this is better than &aring; please?
(This isn't a challenge, it's genuine ignorance, plus I've got loads of
&aacute;s in my Galapagos pages)

Thanks

Slainte

Liz

 
Jukka K. Korpela

Liz said:
Can you explain why this is better than &aring; please?

If the text to be presented contains a with dot above, then a with dot
above is the correct character, and quite distinct from a with ring above.
Of course it was just an example - letter a with dot is pretty rare
(probably used only in Ulithian, which is spoken by 3,000 people, though it
could appear e.g. as a mathematical symbol, too).

Liz said:
(This isn't a challenge, it's genuine ignorance, plus I've got loads of
&aacute;s in my Galapagos pages)

No problem with that; the diacritic used in Spanish is the acute accent,
and &aacute; is one way of presenting letter a with acute.
 
Liz

Jukka K. Korpela said:
If the text to be presented contains a with dot above, then a with dot
above is the correct character, and quite distinct from a with ring above.
Of course it was just an example - letter a with dot is pretty rare
(probably used only in Ulithian, which is spoken by 3,000 people, though it
could appear e.g. as a mathematical symbol, too).

Aaah.
I was even more ignorant than I thought. :-(

Jukka K. Korpela said:
No problem with that; the diacritic used in Spanish is the acute accent,
and &aacute; is one way of presenting letter a with acute.

Thank goodness. :)

Thanks and slainte

Liz
 
Girish Sharma

Thanks to all who replied to my request. I have tried a test using UTF-8 as
suggested, but commonly used Sanskrit transliteration diacritical marks did
not work well in either IE or Mozilla.

Girish Sharma
 
mscir

Girish said:
Thanks to all who replied to my request. I have tried a test using UTF-8 as
suggested, but commonly used Sanskrit transliteration diacritical marks did
not work well in either IE or Mozilla.

Would you post the URL for the site? I want to learn more about this;
apparently using UTF-8 is more complicated than I thought. I'm surprised
it's not more straightforward to include different character sets in
web pages.

Mike
 
Jukka K. Korpela

mscir said:
Would you post the URL for the site? I want to learn more about this;
apparently using UTF-8 is more complicated than I thought. I'm
surprised it's not more straightforward to include different character
sets in web pages.

How could things be more straightforward than including the character
itself in utf-8 encoding?

The encoding isn't really the issue. You can use any encoding, if desired,
and present the characters using character references, say &#x1E63; for
letter s with dot below. There's of course the problem that the user's
browser might not have a suitable font, or might be unable to use it.
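
As a rough illustration of the point that character references work regardless of the page encoding, here is a Python sketch that turns non-ASCII characters into numeric character references. The helper to_hex_references is a hypothetical name introduced for this example. It does not escape markup characters such as "&" or "<".

```python
import html

text = "s\u1e63"  # "s" followed by s with dot below (U+1E63)

# The built-in "xmlcharrefreplace" error handler replaces every non-ASCII
# character with a decimal numeric character reference.
as_references = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(as_references)


def to_hex_references(s: str) -> str:
    """Hypothetical helper: same idea, with hexadecimal references."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in s)


as_hex = to_hex_references(text)
print(as_hex)

# Decimal and hexadecimal references both decode to the same character.
same = html.unescape("&#7779;") == html.unescape("&#x1E63;") == "\u1e63"
print(same)
```

The output is pure ASCII, so it survives any charset declaration. As the post says, the remaining problem is whether the reader's browser has a font containing the character.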
 
