Encodings of javascript

Taras_96

Hi everyone,

I have a couple of questions regarding encodings of javascript.

Say I have a function getCodePoints in foo.js:
-----------------------------------------------------------
function getCodePoints(str)
{
    for (var i = 0; i < str.length; i++)
    {
        // charCodeAt returns the UTF-16 code unit at index i
        alert(str.charCodeAt(i));
    }
}
-----------------------------------------------------------

In my html, which is saved in *UCS-4 (4 bytes per character)*, I
write:

<body onload="getCodePoints('hello');">

How does javascript know that the string it is given is encoded using
UCS-4? Is it transcoded by the browser into javascript's native UTF-16
before giving it to the javascript function?

Equivalently, say we have the following function in a javascript file
which itself is encoded in UCS-4:

function insertString()
{
    window.document.getElementById('bar').innerHTML = '¼Ø';
}

And the HTML page which is the target for insertion itself is encoded
in UTF-8:

<script charset="UCS-4" src="theSource.js"></script> <!-- telling the
browser that the JS is encoded in UCS-4 -->
<body>
<div id='bar'>
</div>
</body>

How does the string get inserted into the HTML with UTF-8 encoding? Is
it again something the browser does (ie, it does something along the
lines of convert the UCS-4 string into javascript's native UTF-16, and
then when writing it into the DOM converts it into UTF-8)?

Thanks

Taras
 
Thomas 'PointedEars' Lahn

Taras_96 said:
> I have a couple of questions regarding encodings of javascript.
>
> Say I have a function getCodePoints in foo.js:
> -----------------------------------------------------------
> function getCodePoints(str)
> {
>     for (var i = 0; i < str.length; i++)
>     {
>         alert(str.charCodeAt(i));
>     }
> }
> -----------------------------------------------------------
>
> In my html, which is saved in *UCS-4 (4 bytes per character)*, I write:
>
> <body onload="getCodePoints('hello');">
>
> How does javascript know that the string it is given is encoded using
> UCS-4?

It does not. It would appear that the user agent's parser passes the content
of the `onload' attribute, after recoding, to the script engine as the body of
a function. When the event occurs, that function is called as a method of the
object that caused the creation of the event (with some exceptions).

> Is it transcoded by the browser into javascript's native UTF-16 before
> giving it to the javascript function?

In a sense.
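
To illustrate the point, here is a minimal sketch. It assumes only that the script engine hands the function an ordinary string value; `getCodeUnits` is a hypothetical variant of `getCodePoints` that returns the values instead of alerting them, so it can also be run outside a browser. Whatever encoding the HTML or script file was saved in, `charCodeAt` reports UTF-16 code units, and a character outside the Basic Multilingual Plane shows up as two of them.

```javascript
// Hypothetical variant of getCodePoints: collects the values instead of
// calling alert(), so it also runs outside a browser.
function getCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    // charCodeAt returns the UTF-16 code unit at index i,
    // independent of how the source file was encoded on disk.
    units.push(str.charCodeAt(i));
  }
  return units;
}

// ASCII characters: one code unit each.
var helloUnits = getCodeUnits('hello');

// U+1F600 written as its surrogate pair: two code units, one code point.
var astral = '\uD83D\uDE00';
var astralUnits = getCodeUnits(astral);
var astralCodePoint = astral.codePointAt(0);

console.log(helloUnits, astralUnits, astralCodePoint);
```

The file encoding only matters while the browser decodes the bytes; by the time the function runs, the string is already a sequence of 16-bit units.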

> Equivalently, say we have the following function in a javascript file
> which itself is encoded in UCS-4:
>
> function insertString()
> {
>     window.document.getElementById('bar').innerHTML = '¼Ø';
                                                         ^^
> }

What character should we observe there?

> And the HTML page

There are no "HTML pages" other than in a page-wise output medium (such as a
presentation or a printout). You are referring to HTML *documents*.

> which is the target for insertion itself is encoded in UTF-8:
>
> <script charset="UCS-4" src="theSource.js"></script> <!-- telling the
> browser that the JS is encoded in UCS-4 -->

Iff supported. You should provide a proper Content-Type header for the
external resource to make sure it is parsed correctly.
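
A rough sketch of why the declared encoding matters: the same bytes decode to different strings depending on the charset label the decoder is given. This is only an illustration, not how any particular browser is implemented. It uses the standard `TextDecoder` API, and because `TextDecoder` does not offer a UCS-4/UTF-32 decoder, UTF-16LE stands in for the "wide" encoding here. The helper name `decodeAs` is hypothetical.

```javascript
// Bytes of the two-character string "\u00BC\u00D8" as they would be
// stored in a UTF-16LE encoded file (no byte order mark).
var fileBytes = new Uint8Array([0xBC, 0x00, 0xD8, 0x00]);

// Hypothetical helper: decode raw bytes using a declared charset label,
// roughly what a user agent has to do before the script engine sees text.
function decodeAs(label, bytes) {
  return new TextDecoder(label).decode(bytes);
}

// The declared charset matches the actual encoding of the bytes.
var decodedCorrectly = decodeAs('utf-16le', fileBytes);

// The declared charset is wrong: the bytes are misread as UTF-8.
var decodedWrongly = decodeAs('utf-8', fileBytes);

console.log(decodedCorrectly === '\u00BC\u00D8'); // true
console.log(decodedWrongly === '\u00BC\u00D8');   // false
```

A correct Content-Type header (or, where supported, the `charset` attribute) plays the role of the `label` argument in this sketch.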

> <body> <div id='bar'> </div> </body>
>
> How does the string get inserted into the HTML with UTF-8 encoding?

It does not. The DOM API uses the DOMString type, which must be UTF-16
encoded. Although `innerHTML' is a proprietary property, I would presume
that it has a setter that accepts a DOMString value and performs recoding
according to the detected character encoding of the document if required.

<http://www.w3.org/TR/DOM-Level-2-Core/core.html#ID-C74D1578>
<http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-C74D1578>
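
As a small sketch of the distinction, the string a script assigns is a sequence of UTF-16 code units, and UTF-8 bytes only come into existence when text is serialized. The names `sample` and `toCodeUnits` are hypothetical, and the standard `TextEncoder` is used here merely to show what a UTF-8 serialization of the same string would look like; it is not a claim about what `innerHTML` does internally.

```javascript
// Escapes are used so the example does not depend on this file's own encoding.
var sample = '\u00BC\u00D8'; // the two non-ASCII characters from the question

// Hypothetical helper: list the UTF-16 code units of a string.
function toCodeUnits(str) {
  var units = [];
  for (var i = 0; i < str.length; i++) {
    units.push(str.charCodeAt(i));
  }
  return units;
}

// What script code (and a DOMString) sees: two 16-bit units.
var asCodeUnits = toCodeUnits(sample);

// What a UTF-8 serialization of the same text looks like: four bytes.
var asUtf8Bytes = Array.from(new TextEncoder().encode(sample));

console.log(asCodeUnits); // [ 188, 216 ]
console.log(asUtf8Bytes); // [ 194, 188, 195, 152 ]
```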

> Is it again something the browser does

The DOM implementation is a part of the layout engine which is a part of the
browser.

> (ie, it does something along the lines of convert the UCS-4 string into
> javascript's native UTF-16, and then when writing it into the DOM
> converts it into UTF-8)?

Probably, yes. If the browser is Open Source, you can UTSL to confirm that.


HTH

PointedEars
 
Taras_96

Hi Thomas,

Sounds like I was on the right track.

> What character should we observe there?

It doesn't really matter. The characters were supposed to be a 1/4
character followed by a phi; I just wanted some non-ASCII characters.

> There are no "HTML pages" other than in a page-wise output medium (such as a
> presentation or a printout).  You are referring to HTML *documents*.

Fair enough :)

> Iff supported.  You should provide a proper Content-Type header for the
> external resource to make sure it is parsed correctly.

True

> Probably, yes.  If the browser is Open Source, you can UTSL to confirm that.
>
> HTH

Helped a lot... thanks!

Taras
 
