which charset to use?

richard

The w3.org validator is having fits with special characters.
The page is currently using utf-8.
When it comes to an e with the apostrophe over it, or some other character
with a secondary character over it, it fails to validate.
So which charset is proper for an xhtml transitional page for these
characters?

http://www.1littleworld.net/songs/Asongs.html
 
Kim André Akerø

ISO-8859-1 might be a good choice. Alternatively, replace all such special
characters with HTML entities (using &eacute; for é, for instance).

A list of entities can be found here:
http://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
or here:
http://www.w3schools.com/tags/ref_entities.asp
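
To illustrate the entity approach, here is a minimal Python sketch that replaces every non-ASCII character with a named entity where the standard library knows one, and with a numeric reference otherwise. The function name `to_entities` and the sample string are made up for this example; nothing here comes from the page in question.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML character reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Plain ASCII is safe in any of the charsets discussed here.
            out.append(ch)
        elif cp in codepoint2name:
            # Named entity, e.g. U+00E9 becomes &eacute;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # No named entity known: fall back to a decimal reference.
            out.append(f"&#{cp};")
    return "".join(out)


# Hypothetical sample text, not taken from the thread.
print(to_entities("café crème"))  # caf&eacute; cr&egrave;me
```

The result is pure ASCII, so it reads the same whether the page is declared as UTF-8 or ISO-8859-1.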
 
Doug Miller

The w3.org validator is having fits with special characters.
The page is currently using utf-8.
When it comes to an e with the apostrophe over it, or some other character
with a secondary character over it, it fails to validate.

This is probably because you coded the character incorrectly in your HTML.

If you want, for example, an e with an accent grave (not "e with the apostrophe over it"), you
should write &egrave; or &#232; instead of attempting to insert the raw single-byte
representation of the e-grave (0xE8, its ISO-8859-1 code) in your source document.

So which charset is proper for an xhtml transitional page for these
characters?

UTF-8, combined with valid HTML.
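
A short Python sketch of why a lone 0xE8 byte trips up a page declared as UTF-8: the same character is one byte in ISO-8859-1 but two bytes in UTF-8. The helper name `is_valid_utf8` and the sample bytes are assumptions made for illustration only.

```python
E_GRAVE = "\u00e8"  # the character e with grave accent

latin1_bytes = E_GRAVE.encode("iso-8859-1")  # b'\xe8' (one byte)
utf8_bytes = E_GRAVE.encode("utf-8")         # b'\xc3\xa8' (two bytes)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A word saved as ISO-8859-1 but checked as UTF-8 is rejected,
# while the same word saved as UTF-8 passes.
print(is_valid_utf8(b"Lef\xe8vre"))               # False
print(is_valid_utf8("Lef\u00e8vre".encode("utf-8")))  # True
```

So either the entity or the character itself, saved as real UTF-8, should validate; what fails is an ISO-8859-1 byte inside a file labelled UTF-8.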
 
Doug Miller

explain then how come when I replaced the e' with the regular e,
the code now shows a ? in inverted colors?

Maybe *you* should explain why the validator is still complaining about an invalid character
that you *claim* you replaced.

Look at your file. IT'S STILL THERE. Or perhaps it's another one. Regardless, you still have
invalid UTF-8 characters in the document.
"e" is not a standard utf-8 character?

Of course it is. But è is NOT, and it's still in there, along with at least one other character that
*also* is not valid:

title="Raymond Lefèvre - Soul Coaxing (Ame Câline)"
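
One way to track down the remaining offenders is to scan the file's raw bytes and report every offset where UTF-8 decoding breaks. This is only a sketch: `find_invalid_utf8` is a hypothetical helper, and the sample below assumes the line was saved as ISO-8859-1, which is a guess about how the file was produced.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, byte_value) pairs where UTF-8 decoding fails."""
    problems = []
    pos = 0
    while pos < len(data):
        try:
            data[pos:].decode("utf-8")
            break  # the remainder decodes cleanly
        except UnicodeDecodeError as err:
            bad = pos + err.start
            problems.append((bad, data[bad]))
            # Skip the offending byte and keep scanning. Resuming one byte
            # later can also flag stray continuation bytes, which is fine
            # for a diagnostic pass.
            pos = bad + 1
    return problems


# Assumption: the attribute was saved as ISO-8859-1 in a page declared UTF-8.
sample = 'title="Raymond Lef\u00e8vre - Soul Coaxing (Ame C\u00e2line)"'.encode("iso-8859-1")
for offset, value in find_invalid_utf8(sample):
    print(f"offset {offset}: byte 0x{value:02X}")
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to the function.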
 
Doug Miller

ROTFLOL, st00pid strikes again.

Evan, you're becoming a real pain in the ass. You NEVER contribute anything
useful to this newsgroup. The only posts you ever make are derogatory of
Richard. Yes, Richard is an ass -- but we don't need you to keep pointing it
out. We're all aware of it. Now just **** off.
 
Jukka K. Korpela

Table of valid HTML
characters can be found here:

http://www.lookuptables.com/

No, it is just a poorly presented table of entities for some characters,
with some disinformation that is not pragmatically fatal but plain wrong
anyway.

The set of valid characters in HTML consists of all ISO 10646 and
Unicode characters except for a handful of control characters, though
XHTML imposes some more limitations.
 
Doug Miller

Fix your silly dates first:

There's nothing the matter with his dates. This is purely a vanity site, apparently intended to be
viewed only by its author -- who is an American, and quite sensibly uses the date format most
commonly used in the United States.
 
dorayme


It lists some problems with different formats. It says optimistically
"Fortunately, there is one solution in the ISO-developed international
date format" and "The international format defined by ISO (ISO 8601)
tries to address all these problems by defining a numerical date
system as follows: YYYY-MM-DD".

However it does not fix things for people who do not happen to know
this definition. I like how it eggs a big fat pudding arguing:

"In most cases, writing the date in full letters would be better...
.... easy to understand for any English-speaking audience.

"But this system does not cross borders much better than its numerical
counterparts: does the french 12 Août 2042 actually mean something for
a Japanese person? Or when you notice a 昭和44年03月16日 in Japanese
which is 16 March 1969 in English."

Notice the words "cross borders"? If a website is written in English
and the unambiguous long date form is used, order not being so
important, there are many borders it crosses just fine. In fact, the
date crosses the borders in at least as much comfort as the rest of
its fellow travelling text in the website, and it is just as
hospitably treated and understood. If '23 April 2012' is not
understood in Buginese, but the rest of the site is, then maybe a cat
can really smile without having a face.

It would be easier to teach a robot translator how to translate '7
April 2012' or 'April 7 2012' (or any arrangement that was unambiguous
to an English speaker who knew basic things about the meanings of the
words and the dating system of days, months, years) than to teach
billions of people a standard.

It might well be true however, that using the ISO standard would make
the job of robot translators easier. But they need the least help! The
only help they need is unambiguity and the long-form English dates are
that.
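
For comparison, a small Python sketch that renders one date in the ISO 8601 form, in the unambiguous long English form, and in the all-numeric US form. The `MONTHS` list and the `long_form` helper are made up for this example; the month names are spelled out by hand so the output does not depend on the system locale.

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def long_form(d: date) -> str:
    """Format a date as day, English month name, year (e.g. 7 April 2012)."""
    return f"{d.day} {MONTHS[d.month - 1]} {d.year}"


d = date(2012, 4, 7)
print(d.isoformat())           # 2012-04-07
print(long_form(d))            # 7 April 2012
print(d.strftime("%m/%d/%Y"))  # 04/07/2012, which could be read as 4 July
```

Only the last form is ambiguous to a reader who does not know which convention the author follows.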
 
