Trouble with document.write and UTF-8


stevelooking41

Can someone explain why I seem unable to use document.write to
produce a valid UTF-8 non-breaking space sequence (hex: C2A0)?

I've tried every way I've been able to find to tell the browser I'm
trying to print UTF-8 and still no luck. I'd like the first 2 tries to
match the second two tries as far as output.

<HTML>
<meta http-equiv="Content-Type" content="application/x-script; charset=UTF-8">
<SCRIPT language="javascript" charset="UTF-8">
var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML nbsp:&nbsp;"
document.open("text/html; charset=UTF-8");
document.write(out);
var i =0;
while (i <out.length){
document.write("<br>"+i+" "+out.charAt(i)+" "+out.charCodeAt(i));
i++;
}
document.close();document.charset="UTF-8";
</SCRIPT>
</HTML>

The output looks like this:
UTF-8 nbsp:Â :Unicode:슠:Unicode: :HTML nbsp:
0 U 85
1 T 84
2 F 70
3 - 45
4 8 56
5 32
6 n 110
7 b 98
8 s 115
9 p 112
10 : 58
11 Â 194
12 160
13 : 58
14 U 85
15 n 110
16 i 105
17 c 99
18 o 111
19 d 100
20 e 101
21 : 58
22 슠 49824
23 : 58
24 U 85
25 n 110
26 i 105
27 c 99
28 o 111
29 d 100
30 e 101
31 : 58
32 160
33 : 58
34 H 72
35 T 84
36 M 77
37 L 76
38 32
39 n 110
40 b 98
41 s 115
42 p 112
43 : 58
44 & 38
45 n 110
46 b 98
47 s 115
48 p 112
49 ; 59

Thanks!
 

Thomas 'PointedEars' Lahn

> Can someone explain why I seem unable to use document.write to
> produce a valid UTF-8 non-breaking space sequence (hex: C2A0)?

The Unicode 4.1 character at code point 0xC2A0 is an (unnamed) Hangul
syllable.
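
The difference between the three escapes in the posted string can be checked directly. This is only a minimal sketch, assuming a modern JavaScript engine (for example Node.js); `codeUnits` and the three constant names are hypothetical helpers made up for illustration. The numbers it prints are the same ones that appear in the posted output at indices 11, 12, 22 and 32.

```javascript
// JavaScript string escapes name UTF-16 code units, not UTF-8 bytes.
const asBytes = "\xC2\xA0"; // two characters: U+00C2 (Â) and U+00A0
const hangul = "\uC2A0";    // one character: U+C2A0, a Hangul syllable
const nbsp = "\u00A0";      // one character: U+00A0, the no-break space

// Hypothetical helper: list the code unit value of every character in s.
function codeUnits(s) {
  const units = [];
  for (let i = 0; i < s.length; i++) {
    units.push(s.charCodeAt(i));
  }
  return units;
}

console.log(codeUnits(asBytes).join(" ")); // 194 160
console.log(codeUnits(hangul).join(" "));  // 49824
console.log(codeUnits(nbsp).join(" "));    // 160
```

So `\xC2\xA0` never becomes one non-breaking space inside the script: the engine sees two separate characters, and only `\u00A0` names the character itself.
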
> I've tried every way I've been able to find to tell the browser I'm
> trying to print UTF-8 and still no luck. I'd like the first 2 tries to
> match the second two tries as far as output.
>
> <HTML>
> <meta http-equiv="Content-Type" content="application/x-script;
> charset=UTF-8">

Pardon? This is supposed to be an HTML document, is it not? So the basic
Content-Type should be text/html. And if that HTML document were UTF-8
encoded, you would not have to escape Unicode anyway. So you want to
change the `charset' parameter to ISO-8859-1 or the like, definitely
not a UTF encoding.

And there is no known MIME media type 'application/x-script'.
I wonder how you got the idea.

You probably meant

<meta http-equiv="Content-Script-Type"
content="application/javascript; charset=UTF-8">

as described in the Informational RFC "Scripting Media Types", which is,
however, not yet used by user agents.

> <SCRIPT language="javascript" charset="UTF-8">

The `language' attribute is deprecated in HTML4, the `type' attribute
is #REQUIRED. The `charset' attribute is for linked resources, i.e.
useful only in combination with the `src' attribute.

<script type="application/javascript">

See <http://www.w3.org/TR/html4/interact/scripts.html#edef-SCRIPT>.

> var out = "UTF-8 nbsp:\xC2\xA0:Unicode:\uC2A0:Unicode:\u00A0:HTML nbsp:&nbsp;"

You need to understand what UTF and Unicode are and how UTF works.
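
For reference, the mapping from a Unicode code point to its UTF-8 byte sequence can be sketched in a few lines. This is an illustrative sketch under assumptions, not anything taken from the posts: `utf8Bytes` and `hex` are hypothetical helper names, and the function does not reject surrogate code points (0xD800 to 0xDFFF) or values above 0x10FFFF.

```javascript
// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Assumption: the caller passes a valid, non-surrogate code point.
function utf8Bytes(codePoint) {
  if (codePoint < 0x80) {
    return [codePoint]; // 1 byte: 0xxxxxxx
  }
  if (codePoint < 0x800) {
    // 2 bytes: 110xxxxx 10xxxxxx
    return [0xC0 | (codePoint >> 6), 0x80 | (codePoint & 0x3F)];
  }
  if (codePoint < 0x10000) {
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return [
      0xE0 | (codePoint >> 12),
      0x80 | ((codePoint >> 6) & 0x3F),
      0x80 | (codePoint & 0x3F),
    ];
  }
  // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
  return [
    0xF0 | (codePoint >> 18),
    0x80 | ((codePoint >> 12) & 0x3F),
    0x80 | ((codePoint >> 6) & 0x3F),
    0x80 | (codePoint & 0x3F),
  ];
}

// Hypothetical helper: format bytes as upper-case hex pairs.
function hex(bytes) {
  return bytes
    .map((b) => b.toString(16).toUpperCase().padStart(2, "0"))
    .join(" ");
}

console.log(hex(utf8Bytes(0x00A0))); // C2 A0
console.log(hex(utf8Bytes(0xC2A0))); // EC 8A A0
```

The bytes C2 A0 are how U+00A0 looks once the text has been encoded as UTF-8. They are not a code point, which is why writing them as `\uC2A0` names a different character altogether.
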
> document.open("text/html; charset=UTF-8");

There is no specified argument for the HTMLDocument::open() method.
Therefore, Mozilla/5.0 based user agents will ignore it if you provide
one.

> document.charset="UTF-8";

There is no document.charset property, hence you are creating one here.

> The output looks like this:
> [...]

Works as designed.

Summary: You should definitely drink more tea[tm] when coding.


PointedEars
 
