!doctype & foreign languages

yes=no

Hi,


I'm translating several of my site's pages into French.

I have so far added this line to the meta tags:

<META HTTP-EQUIV="Content-Language" Content="fr">

but I'm not sure if any other additions are necessary. My main aim here is
to make the pages accessible to search engines that index for the French
language.

I notice that the !DOCTYPE declaration generated by Dreamweaver is this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Now, does the "EN" at the end signify the English language, and should I
change this to "FR" for my French pages?

Also (if there are Canucks listening...) does Québécois French demand
any different kind of tagging, or does "fr" indicate a universal French,
irrespective of the different "dialects" of French?

Thanks for any comments.

Y?N
 
Eric Bohlman

I notice that the !DOCTYPE declaration generated by Dreamweaver is
this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Now, does the "EN" at the end signify the English language, and
should I change this to "FR" for my French pages?

No. The "EN" indicates that the human-readable comments in the actual DTD
document are written in English.
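
To make the structure of that identifier concrete, here is a minimal Python sketch that splits the public identifier from the DOCTYPE into its "//"-separated fields. The function name `parse_fpi` and the dictionary keys are illustrative choices, not part of any standard library; the point is only that the final field describes the DTD itself, not the page.

```python
def parse_fpi(fpi: str) -> dict:
    """Split an SGML formal public identifier into its named parts."""
    # The four fields are separated by double slashes.
    registration, owner, text, language = fpi.split("//")
    # The third field is a public text class ("DTD") followed by a description.
    public_class, _, description = text.partition(" ")
    return {
        "registered": registration == "+",  # "-" marks an unregistered owner
        "owner": owner,
        "class": public_class,
        "description": description,
        "language": language,  # language of the DTD text, not of the page
    }


fpi = "-//W3C//DTD HTML 4.01 Transitional//EN"
parts = parse_fpi(fpi)
print(parts["owner"], parts["class"], parts["language"])
```

Changing "EN" to "FR" would therefore name a different (and nonexistent) public identifier rather than describe a French page.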
 
Marc Nadeau

yes=no wrote:
Hi,


I'm translating several of my site's pages into French.

I have so far added this line to the meta tags:

<META HTTP-EQUIV="Content-Language" Content="fr">

but I'm not sure if any other additions are necessary. My main aim here
is to make the pages accessible to search engines that index for the French
language.

I notice that the !DOCTYPE declaration generated by Dreamweaver is this:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Now, does the "EN" at the end signify the English language, and should I
change this to "FR" for my French pages?

Also (if there are Canucks listening...) does Québécois French demand
any different kind of tagging, or does "fr" indicate a universal French,
irrespective of the different "dialects" of French?

Thanks for any comments.

Y?N

I am a 'québécois' and use fr.

We do not use a dialect; the rest of the world does. ;-)
 
yes=no

I am a 'québécois' and use fr.

We do not use a dialect; the rest of the world does. ;-)

Damn! It's that "attitude" thing again! What happens when the Chinese take
over and put up signs in Chinese bigger than Francoise? Whose "distinct
society" will you belong to in prison reciting "ho chi minh" mantras? :)
 
Toby A Inkster

yes=no said:
I have so far added this line to the meta tags:
<META HTTP-EQUIV="Content-Language" Content="fr">
but I'm not sure if any other additions are necessary.

Also:

<html lang="fr">

And if you have any control over your HTTP Headers:

Content-Language: fr

As Eric indicated, this should remain unchanged:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
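
For anyone who wants to check what a page actually declares, the following is a rough Python sketch using the standard library's `html.parser`. The class name `LanguageDeclarations`, the helper `declared_languages`, and the sample page are all made up for illustration; the sketch only reads the `lang` attribute on `<html>` and the Content-Language meta element, and does not look at real HTTP headers.

```python
from html.parser import HTMLParser


class LanguageDeclarations(HTMLParser):
    """Collect the language declared on <html> and in a Content-Language meta."""

    def __init__(self):
        super().__init__()
        self.html_lang = None
        self.meta_lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "html":
            self.html_lang = attrs.get("lang")
        elif tag == "meta":
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.meta_lang = attrs.get("content")


def declared_languages(page: str) -> dict:
    parser = LanguageDeclarations()
    parser.feed(page)
    return {"html_lang": parser.html_lang, "meta": parser.meta_lang}


# Hypothetical French page combining the three lines discussed above.
SAMPLE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="fr">
<head>
<META HTTP-EQUIV="Content-Language" Content="fr">
<title>Bienvenue</title>
</head>
<body><p>Bonjour.</p></body>
</html>"""

print(declared_languages(SAMPLE))
```

Note that the DOCTYPE in the sample keeps its "EN" while both language declarations say "fr".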
 
Felix Atagong

Toby A Inkster said:
<html lang="fr">
Content-Language: fr
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

Google seems to recognise what language a page is in, even without language
(meta) tags, and regardless of the URL's top-level domain (the letters after
the last dot). I'm aware of a .org organisation with pages in Dutch and yep,
Google puts them under the Dutch search (as language) and under a Belgian
search (as country).

How do they do that? Do they check where the server is located? Just
wondering... (I should also say that Google makes a lot of mistakes by
putting 'English' pages on a Dutch search).
 
Safalra

yes=no said:
Also (if there are Canucks listening...) does the Quebecois french demand
any different kind of tagging or does "fr" indicate a universal french,
irrespective of different "dialects" of french?

You can use either 'fr' or 'fr-ca'. Also, change your <html> tag to
<html lang="fr">. And technically, if you link to pages in a different
language from the document's, you should indicate this with the
hreflang attribute, for example:

<a href="http://www.safalra.com/" hreflang="en-gb">Safalra's
Website</a>

--- Stephen Morley ---
http://www.safalra.com
 
Owen Jacobson

Felix said:
Google seems to recognise what language a page is in, even without
language (meta)tags...

How do they do that?

A well-configured web server sends a Content-Language: header with each
page to indicate what human-readable language the page is in. This is
the appropriate place for this information.
 
Safalra

Felix Atagong said:
Google seems to recognise what language a page is in, even without language
(meta) tags, and regardless of the URL's top-level domain (the letters after
the last dot). I'm aware of a .org organisation with pages in Dutch and yep,
Google puts them under the Dutch search (as language) and under a Belgian
search (as country).

How do they do that? Do they check where the server is located? Just
wondering... (I should also say that Google makes a lot of mistakes by
putting 'English' pages on a Dutch search).

It just looks the words on the pages up in dictionaries for various
languages, and judges the page to be in the language where the most
words existed. (Sorry about the awful grammar in that last
sentence...)

--- Stephen Morley ---
http://www.safalra.com
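
The dictionary approach described in the post above can be illustrated with a toy Python sketch. This is not how any search engine is known to work; the word lists in `WORD_LISTS` are tiny hand-picked samples and `guess_language` is a hypothetical helper. It simply counts how many words of the text appear in each list and picks the language with the most hits.

```python
import re

# Tiny illustrative word lists; a real system would need far larger ones.
WORD_LISTS = {
    "en": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "fr": {"le", "la", "les", "et", "est", "de", "un", "une", "que", "dans"},
    "nl": {"de", "het", "een", "en", "is", "van", "dat", "niet", "op"},
}


def guess_language(text):
    """Return the language whose word list matches the most words, or None."""
    # Runs of letters only (digits and underscores excluded), lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in WORD_LISTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


print(guess_language("Le chat est dans la maison et il dort"))
print(guess_language("Het huis is niet van een man"))
```

Because short words such as "is", "de" and "en" occur in more than one list, a scheme like this can easily misclassify short or mixed-language pages, which fits the mistakes mentioned earlier in the thread.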
 
Jukka K. Korpela

Owen Jacobson said:
A well-configured web server sends a Content-Language: header with
each page to indicate what human-readable language the page is in.
This is the appropriate place for this information.

What makes you think so? I don't know any recommendation about using
Content-Language. And how would a server know the language? What the
HTTP protocol says about the use of this header is this:
"The primary purpose of Content-Language is to allow a user to identify
and differentiate entities according to the user's own preferred
language."
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.12

It's basically useful for negotiated content, and in fact negotiation
usually takes place on the server, so that the Content-Language header, if
present, has no impact on the client.

The HTML specifications define the lang attribute (and xml:lang in
XHTML) for indicating natural language of content. A browser might use
a Content-Language header as the default, when the root element
(<html>) lacks such an attribute. But it is _not_ common for browsers
to do so (I don't think any browser does such things), it is _not_
common to configure servers to send such headers, and there is hardly any
reason to do so. And I would be very surprised if Google did
anything with them.
 
Marc Nadeau

yes=no wrote:
Damn! It's that "attitude" thing again! What happens when the Chinese
take over and put up signs in Chinese bigger than Francoise? Whose "distinct
society" will you belong to in prison reciting "ho chi minh" mantras? :)

This Chinese 24-bit glyph: ;-)
is pronounced 'wink' in many dialects and means that the author is making a
joke.
 
Owen Jacobson

Jukka said:
The HTML specifications define the lang attribute (and xml:lang in
XHTML) for indicating natural language of content. A browser might
use a Content-Language header as the default, when the root element
(<html>) lacks such an attribute. But it is not common for browsers
to do so (I don't think any browser does such things), it is not
common to configure servers to send such headers, and there is hardly
any reason to do so. And I would be very surprised if Google did
anything with them.

Interesting point, I hadn't considered that. Assuming you're right and
I'm wrong (a fairly safe assumption here) then I ask, why *not* use the
Content-Language header? Most useful metadata about the document is
sent with the headers, not the response body; further, the
Content-Language header may allow caches to do language negotiation
locally rather than passing the request on to the actual server.
 
Jukka K. Korpela

Owen Jacobson said:
- - why *not* use the Content-Language header?

If the information in the header is correct and no client gets the
meaning of the header wrong, then there is no harm in including it. But
I'm not so sure about those ifs, especially the former, and even the
latter is uncertain.

How would you make a server send those headers? Suppose all your
HTML documents are in English and you make the server send
Content-Language: en
for them. Fine. But will you remember to do something if you add a
document in another language? And how would you do that? You would
probably need some special mechanism, perhaps changing the filename
extension, potentially causing problems. (I still remember the .htm8
incident: Google didn't index URLs ending with .htm8 at all, and
although this was fixed, I'm a bit suspicious about the effect of
creative suffixes.)

Besides, if your documents are in British English, you could specify
Content-Language: en-UK
which is more informative. But I'm afraid that _if_ some software uses
Content-Language headers for something, it could play with simplistic
and incorrect rules and accept a primary language code only, perhaps
treating en-UK as unrecognized language. Besides, the semantics of the
header is not clear. In fact I think you should not use a subcode in
Content-Language unless you really think that the document is
unintelligible to people who do not understand that particular form of
the language. As you see, it's easy to get confused with the language
codes.

Most useful metadata about the document is
sent with the headers, not the response body;

It is true that headers could be useful e.g. in avoiding useless
fetching of data. In principle, a browser could inform the user that
the user is about to follow a link to resource that is 42 megabytes of
text in a dialect of Finnish - after the browser has sent a HEAD
request and analyzed the response, and before sending a GET.
But I don't think any browser even tries to do such things.

further, the
Content-Language header may allow caches to do language negotiation
locally rather than passing the request on to the actual server.

No, I don't think so. I'm having a hard time trying to imagine how
that could work even in principle.
 
Toby A Inkster

Jukka said:
How would you make a server send those headers? Suppose all your
HTML documents are in English and you make the server send
Content-Language: en
for them. Fine. But will you remember to do something if you add a
document in another language? And how would you do that?

The conventional way would be to use:

about_us.html.en (in English)
about_us.html.de (auf Deutsch)

-or-

about_us.en.html (in English)
about_us.de.html (auf Deutsch)

Apache is configured to recognise these forms out of the box.

You would probably need some special mechanism, perhaps changing the filename
extension, potentially causing problems. (I still remember the .htm8
incident: Google didn't index URLs ending with .htm8 at all, and
although this was fixed, I'm a bit suspicious about the effect of
creative suffixes.)

With Apache MultiViews, suffixes aren't needed at all.

Besides, if your documents are in British English, you could specify
Content-Language: en-UK
which is more informative.

Which would be wrong. Try en-GB.

But I'm afraid that _if_ some software uses
Content-Language headers for something, it could play with simplistic
and incorrect rules and accept a primary language code only, perhaps
treating en-UK as unrecognized language.

Well, it would be an unrecognised language. The variety of English common
in the Ukraine?

Besides which, Apache's content negotiation is smart enough to deal with
this. It will happily serve en-GB documents to clients that request
documents with an HTTP Accept-Language of just "en".
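
The matching behaviour described here follows the prefix rule for language ranges in the HTTP specification: a range such as "en" matches a tag such as "en-GB" when the tag equals the range or continues with a hyphen. The Python sketch below is a simplified illustration of that rule only, not Apache's actual negotiation algorithm; the function names `parse_accept_language`, `matches` and `choose_variant` are invented for the example.

```python
def parse_accept_language(header):
    """Return (language-range, q) pairs, most preferred first."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # default quality when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(ranges, key=lambda pair: pair[1], reverse=True)


def matches(language_range, tag):
    """Prefix rule: 'en' matches 'en' and 'en-GB', but 'en-US' does not match 'en-GB'."""
    tag = tag.lower()
    return (
        language_range == "*"
        or tag == language_range
        or tag.startswith(language_range + "-")
    )


def choose_variant(header, available):
    """Pick the first available language tag acceptable to the client, or None."""
    for language_range, q in parse_accept_language(header):
        if q <= 0:
            continue
        for tag in available:
            if matches(language_range, tag):
                return tag
    return None


print(choose_variant("en", ["de", "en-GB"]))
print(choose_variant("en-US", ["de", "en-GB"]))
```

Under this rule a client asking for plain "en" gets the en-GB document, while a client asking only for a more specific tag that the server does not have gets no match at all.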

Other servers may be less smart.

No, I don't think so. I'm having a hard time trying to imagine how
that could work even in principle.

I think language-based content negotiation is a disaster right now, until
more people learn to set their Accept-Language header.

But there is no harm in specifying the Content-Language of a non-negotiated
document.
 
Jukka K. Korpela

Toby A Inkster said:
The conventional way would be to use:

about_us.html.en (in English)
about_us.html.de (auf Deutsch)

-or-

about_us.en.html (in English)
about_us.de.html (auf Deutsch)

That's a conventional way, and it involves a complication of URLs (and
filenames). Mostly this won't cause harm, but it might make people
wonder, especially since such URLs aren't _that_ common.

Which would be wrong. Try en-GB.

Indeed. As you see, it's very easy to get confused with language codes.
The code en-UK appears in quite a few documents, including Dublin Core
specifications (where it appears as an example). I knew it's wrong, but
to be honest, this time I forgot (so I won't pretend this was an
intentional demonstration of the confusion).

Well, it would be an unrecognised language. The variety of English
common in the Ukraine?

No, Ukraine is UA.

Besides which, Apache's content negotiation is smart enough to deal
with this. It will happily serve en-GB documents to clients that
request documents with an HTTP Accept-Language of just "en".

That's good behaviour, but problems arise in the common situation where
the client specifies en-US only (and, as you know, this might be caused
just by the browser's factory defaults, which do _not_ reflect the
user's language abilities).

I think language-based content negotiation is a disaster right
now, until more people learn to set their Accept-Language header.

Not a disaster, just a bit frustrating. You need to make sure that
whatever the Accept-Language says, the user gets a page where he can
find a version he prefers.

But there is no harm in specifying the Content-Language of a
non-negotiated document.

That is correct, if the information there is correct and if no software
makes a mistake. :) But what are the possible _benefits_? Rather
theoretical, I would say.
 
Toby A Inkster

Jukka said:
That's a conventional way, and it involves a complication of URLs (and
filenames). Mostly this won't cause harm, but it might make people
wonder, especially since such URLs aren't _that_ common.

I didn't say that you should necessarily use those as URLs. Just file
names.

A practical solution might be to have the files:

/var/www/html/today.html.en
/var/www/html/heute.html.de

and then link to them via:

<a href="/today" lang="en">
<a href="/heute" lang="de">

So that the ".html" and ".en"/".de" suffixes aren't used as part of the
URLs, and aren't used in content negotiation -- just to tell the server
which HTTP headers to send.
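
As a rough illustration of the idea that the suffixes only tell the server which headers to send, here is a small Python sketch. It is not Apache's implementation; the mapping tables `LANGUAGE_SUFFIXES` and `TYPE_SUFFIXES` and the function `headers_for` are hypothetical, and a real server would take these mappings from its configuration.

```python
# Hypothetical suffix tables; a real server reads these from its configuration.
LANGUAGE_SUFFIXES = {"en": "en", "de": "de", "fr": "fr"}
TYPE_SUFFIXES = {"html": "text/html", "txt": "text/plain"}


def headers_for(filename):
    """Derive response headers from the dot-separated suffixes of a file name."""
    headers = {}
    # Drop the directory part, then split "today.html.en" into name and suffixes.
    _, *suffixes = filename.split("/")[-1].split(".")
    for suffix in suffixes:
        suffix = suffix.lower()
        if suffix in LANGUAGE_SUFFIXES:
            headers["Content-Language"] = LANGUAGE_SUFFIXES[suffix]
        elif suffix in TYPE_SUFFIXES:
            headers["Content-Type"] = TYPE_SUFFIXES[suffix]
    return headers


print(headers_for("/var/www/html/today.html.en"))
print(headers_for("/var/www/html/heute.html.de"))
```

In this sketch the order of the suffixes does not matter, so "about_us.en.html" and "about_us.html.en" would yield the same headers.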
No, Ukraine is UA.

".ua" is the Ukrainian top-level domain name, but "uk" is the ISO639-2
code.
 
Jukka K. Korpela

Toby A Inkster said:
".ua" is the Ukrainian top-level domain name, but "uk" is the ISO639-2
code.

This might get too off-topic even for alt.html, but:

ISO 639 defines language codes only. In it, the Ukrainian _language_ has
the two-letter code "uk" (ISO 639-1) and the three-letter code "ukr"
(ISO 639-2). But you cannot use a language code as a subcode; "en-UK" is
undefined (and therefore incorrect).

The authority on _country codes_ says that the country code for Ukraine
is "UA":
<http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html>
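
The distinction between language codes and country codes can be shown with a short Python sketch. The sets `LANGUAGE_CODES` and `COUNTRY_CODES` are small hand-picked samples rather than the full ISO lists, and `check_tag` is a hypothetical helper that only handles simple "language" or "language-country" tags.

```python
# Small samples only; the real ISO 639 and ISO 3166 lists are much longer.
LANGUAGE_CODES = {"en", "fr", "de", "nl", "fi", "uk"}
COUNTRY_CODES = {"GB", "US", "CA", "FR", "DE", "BE", "NL", "FI", "UA"}


def check_tag(tag):
    """Check a 'language' or 'language-country' tag against the sample lists."""
    primary, _, region = tag.partition("-")
    if primary.lower() not in LANGUAGE_CODES:
        return f"unknown language code {primary!r}"
    # The subcode must be a country code; a language code is not valid here.
    if region and region.upper() not in COUNTRY_CODES:
        return f"unknown country code {region!r}"
    return "ok"


for tag in ("fr", "fr-CA", "en-GB", "en-UK", "uk-UA"):
    print(tag, "->", check_tag(tag))
```

Here "uk" passes as a primary language code (Ukrainian) but fails as a subcode, which is why "en-UK" is rejected while "en-GB" and "uk-UA" are accepted.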
 
