about charset again

Ricardo Garcia

Hi, I put encoding="UTF-8" as you said (<?xml version="1.0" encoding="UTF-8"
standalone="no"?> as the first line), but when I tried to validate at
http://validator.w3.org, I got the following response:


The HTTP Content-Type header sent by your web browser (unknown) did not
contain a "charset" parameter, but the Content-Type was one of the XML
text/* sub-types (text/xml). The relevant specification (RFC 3023) specifies
a strong default of "us-ascii" for such documents so we will use this value
regardless of any encoding you may have indicated elsewhere. If you would
like to use a different encoding, you should arrange to have your browser
send this new encoding information.

Sorry, I am unable to validate this document because on lines 47, 51, 54,
102, 112, 148, 157, 231 it contained one or more bytes that I cannot
interpret as us-ascii (in other words, the bytes found are not valid values
in the specified Character Encoding). Please check both the content of the
file and the character encoding indication.


Those lines contain the characters ñ (&ntilde;), é (&eacute;) and many others.

Do you know how to solve this problem?


Thanks
 
Alan J. Flavell

> hi, i put as you said encoding="UTF-8" (<?xml version="1.0" encoding="UTF-8"
> standalone="no"?> as first line), but when i tried to validate in
> http://validator.w3.org, i get the following response:

You don't say which mode of submission you are using, but
from the wording of the response I presume you are validating by
file upload?

> The HTTP Content-Type header sent by your web browser (unknown) did not
> contain a "charset" parameter, but the Content-Type was one of the XML
> text/* sub-types (text/xml). The relevant specification (RFC 3023) specifies
> a strong default of "us-ascii" for such documents so we will use this value
> regardless of any encoding you may have indicated elsewhere. If you would
> like to use a different encoding, you should arrange to have your browser
> send this new encoding information.

That sounds pretty self-explanatory to me, but you need to understand
a little more about how your browser works to get this sorted out.

> in that lines were the caracters ñ (&ntilde;) é (&ecute;) and many
> others

If your document is in utf-8 as your <?xml thingy says it is, then
those characters will be represented in utf-8 encoding and thus
will consist of (in this case) two bytes each, with their top bits
set. These cannot be us-ascii characters, therefore, and the
validator is rejecting them with the above explanation.
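
To make the byte-level point concrete, here is a minimal Python sketch. The helper names `describe_utf8` and `is_us_ascii` are made up for this illustration and are not part of the validator; the sketch only shows that ñ and é each become two bytes in utf-8, both with the top bit set, so they cannot pass as us-ascii.

```python
def describe_utf8(ch):
    """Return the UTF-8 bytes of ch as (hex, top_bit_set) pairs."""
    return [(f"{b:02X}", b >= 0x80) for b in ch.encode("utf-8")]


def is_us_ascii(data):
    """True if every byte in data is a valid us-ascii value (0x00-0x7F)."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


for ch in "ñé":
    # Each character yields two bytes, and both are >= 0x80.
    print(ch, describe_utf8(ch))

# The raw utf-8 bytes are rejected when read as us-ascii.
print(is_us_ascii("ñ".encode("utf-8")))
```

This is roughly the check the validator appears to be applying after it has decided to treat the upload as us-ascii.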

Please understand that the encoded characters (ñ and é) represent a
problem in the above terms; whereas their representation in
&-notation would consist entirely of us-ascii characters and thus
would not be a problem from this point of view (your &ntilde; and
&eacute; would be a problem in XML for a different reason, namely that
you would need to define them). So it's important to be precise in
describing what you are doing.
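
The distinction can be tried out with Python's standard `xml.etree.ElementTree` parser. This is only a sketch: the sample documents and the helper name `parses` are invented for the illustration. A numeric character reference is pure us-ascii and needs no declaration, whereas `&ntilde;` is rejected unless the document defines it.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """True if xml_text is accepted by the parser, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Numeric character reference: us-ascii only, no declaration needed.
numeric = "<p>a&#241;o</p>"

# Named entity with no definition: not well-formed XML.
undefined = "<p>a&ntilde;o</p>"

# Same entity, defined in an internal DTD subset (hypothetical sample).
defined = '<!DOCTYPE p [<!ENTITY ntilde "&#241;">]><p>a&ntilde;o</p>'

for doc in (numeric, undefined, defined):
    print(parses(doc), doc)
```

All three documents consist solely of us-ascii bytes, so none of them would trigger the validator's encoding complaint; only the second fails, and for the unrelated reason that the entity is undefined.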

> do you know how to solve this problem???

I would guess one or other of:

1. find out how to have your browser send a charset= attribute on file
upload

2. find out how to have your browser upload with an application/...
content type, where utf-8 is assumed default (if I'm not mistaken)

3. Use the "extended file upload" interface, and specify the encoding
in the submission dialog

4. put your content on a web server and validate it by URL; adjust
the web server until it sends the right Content-type header.
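
The rule behind the validator's message, and behind options 1 and 2, can be sketched as follows. This is my reading of RFC 3023, simplified, and not the validator's actual code; the function name `effective_encoding` and its parameters are hypothetical. A charset parameter wins when present; without one, text/xml falls back to us-ascii, while application/xml defers to the encoding declared in the document, with utf-8 as the fallback when nothing is declared (byte order mark detection is left out here).

```python
from email.message import Message


def effective_encoding(content_type_header, declared_encoding=None):
    """Pick the encoding for an XML body from its Content-Type header.

    declared_encoding is the value from the <?xml ... encoding="..."?>
    declaration, or None if the document has no such declaration.
    """
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()   # lower-cased, parameters removed
    charset = msg.get_param("charset")    # None when the parameter is absent

    if charset is not None:
        # An explicit charset parameter is authoritative.
        return str(charset).lower()
    if media_type == "text/xml" or (
        media_type.startswith("text/") and media_type.endswith("+xml")
    ):
        # Strong default; the XML declaration is ignored.
        return "us-ascii"
    # application/xml and friends: fall back to the document itself.
    return (declared_encoding or "utf-8").lower()


print(effective_encoding("text/xml", "UTF-8"))
print(effective_encoding("text/xml; charset=utf-8", "UTF-8"))
print(effective_encoding("application/xml", "UTF-8"))
```

The first call reproduces the situation in the question: the document declares UTF-8, but the upload arrives as text/xml with no charset parameter, so us-ascii is used.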

Details depend on what specific software you are using. I would guess
that option 3 above is the easiest to use.

good luck
 
Henry S. Thompson

Looks to me like you have iso-8859-1 characters in your document, if
you just cut-and-paste to get e.g. ñ in your message. Set your
encoding to iso-8859-1 and see if that helps.
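
One rough way to check which case applies is to try decoding the file's bytes. The sketch below is only a heuristic under stated assumptions: the function name `guess_encoding` is made up, and any byte sequence at all decodes as iso-8859-1, so that answer means "not valid utf-8" rather than "definitely iso-8859-1".

```python
def guess_encoding(data):
    """Guess among us-ascii, utf-8 and iso-8859-1 for a byte string."""
    if data.isascii():
        return "us-ascii"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not valid utf-8; iso-8859-1 accepts every byte value.
        return "iso-8859-1"


# In iso-8859-1 the letter ñ is the single byte 0xF1, which is not a
# complete utf-8 sequence on its own.
print(guess_encoding("año".encode("iso-8859-1")))
print(guess_encoding("año".encode("utf-8")))
```

To test a real document, read it in binary mode (for example `open(path, "rb").read()`, with `path` standing in for your own file) and pass the bytes to the function.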

ht
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
Half-time member of W3C Team
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: (e-mail address removed)
URL: http://www.ltg.ed.ac.uk/~ht/
[mail really from me _always_ has this .sig -- mail without it is forged spam]
 
Andreas Prilop

> X-Newsreader: Microsoft Outlook Express 6.00.2800.1437

> in that lines were the caracters ? (&ntilde;) ? (&ecute;) and many others
>
> do you know how to solve this problem???

You can't even transmit special, non-ASCII characters in your
would-be newsreader. So, what do you expect?

Hint:

Tools > Options > Send
Mail Sending Format > Plain Text Settings > Message format MIME
News Sending Format > Plain Text Settings > Message format MIME
Encode text using: None
 
