eval and non-English characters conflict?

Logos

Yes, eval is a tool of the devil and I'll burn for using it. However,
in this instance it's quite handy and I'm quite lazy.

So, here's a weird one, and I'm wondering if anyone has a workaround.
I am pulling data off a server via AJAX, and some of that data has
non-English characters in it. The data is in the form of a json:

{"exitCode":1,"className":"clientRecords"
,"strAccountId":"100"
,"strName":"Dr. Gary A. Martin"
,"intReportsPurchased":100
,"intReportsUsed":13
,"intBalance":87
,"clientData":{
"r19":{
"className":"client"
,"intId":353
,"strFname":"NameOf"
,"strLname":"Someone"
,"strMname":""
}
,"r7":{
"className":"client"
,"intId":251
,"strFname":"BananaHead"
,"strLname":"Myrbø"
,"strMname":""
}
}

For testing, I was using eval("json =" +response.text). This works
fine for any records that DON'T have a non-English character in them.
However, when it has something like 'Myrbø' with the funny o on the
end, it dies and throws an error dialog saying "Expected '}'". Why
would it choke on the funny character? If I copy and paste the json
directly into the code, it works fine; it's just when it's embedded in
the eval that it goes tits up.

For extra strangeness, this does NOT happen on IE5.5, IE7, FF1.5.x or
FF1.0.x - just on IE 6 SP2!

Has anyone ever seen suchlike before???

Tyler
 
Martin Honnen

Logos said:
So, here's a weird one, and I'm wondering if anyone has a workaround.
I am pulling data off a server via AJAX, and some of that data has
non-English characters in it.
For extra strangeness, this does NOT happen on IE5.5, IE7, FF1.5.x or
FF1.0.x - just on IE 6 SP2!

How is the data encoded when you send it from the server? Try sending
UTF-8 encoded data. It is somewhat strange, though, that it works with IE
5.5 and IE 7 but not with IE 6.
 
Logos

How is the data encoded when you send it from the server? Try sending
UTF-8 encoded data. It is somewhat strange, though, that it works with IE
5.5 and IE 7 but not with IE 6.


It's straight ASCII from a file on the server. Unfortunately our
choice of back ends doesn't allow any other encoding. The truly weird
thing is that I copy and paste it directly into my code, and it
works; copy and paste the exact same snippet into my code but inside an
eval, and it bombs. The odd 'o' character trips it up. Go ahead and
try it in IE, it's neat to watch.
 
Douglas Crockford

Logos said:
Yes, eval is a tool of the devil and I'll burn for using it. However,
in this instance it's quite handy and I'm quite lazy.

So, here's a weird one, and I'm wondering if anyone has a workaround.
I am pulling data off a server via AJAX, and some of that data has
non-English characters in it. The data is in the form of a json:

{"exitCode":1,"className":"clientRecords"
,"strAccountId":"100"
,"strName":"Dr. Gary A. Martin"
,"intReportsPurchased":100
,"intReportsUsed":13
,"intBalance":87
,"clientData":{
"r19":{
"className":"client"
,"intId":353
,"strFname":"NameOf"
,"strLname":"Someone"
,"strMname":""
}
,"r7":{
"className":"client"
,"intId":251
,"strFname":"BananaHead"
,"strLname":"Myrbø"
,"strMname":""
}
}
For testing, I was using eval("json =" +response.text). This works
fine for any records that DON'T have a non-English character in them.

This example shouldn't work fine. The braces are not balanced. It seems that you
are not using a proper JSON encoder.

I recommend that you use eval as minimally as possible.

var json = eval('(' + response.text + ')');
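
A note on the parentheses, as a minimal sketch rather than a definitive recipe: without them, a leading "{" is parsed as the start of a block statement, not an object literal, so eval throws a SyntaxError on JSON object text. The variable responseText below is a hypothetical stand-in for the server reply. In engines that provide the built-in JSON.parse (which postdates this thread), that is the safer choice because it never executes code.

```javascript
// Hypothetical stand-in for response.text; \u00f8 is the escaped form of the "o with stroke".
var responseText = '{"strLname":"Myrb\\u00f8","intId":251}';

// Without parentheses the leading "{" starts a block statement, so this throws.
var bareEvalFailed = false;
try {
  eval(responseText);
} catch (e) {
  bareEvalFailed = e instanceof SyntaxError;
}

// Wrapped in parentheses, the text is parsed as an expression (an object literal).
var viaEval = eval('(' + responseText + ')');

// JSON.parse accepts only JSON and never runs code.
var viaParse = JSON.parse(responseText);
```
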
However, when it has something like 'Myrbø' with the funny o on the
end, it dies and throws an error dialog saying "Expected '}'". Why
would it choke on the funny character? If I copy and paste the json
directly into the code, it works fine; it's just when it's embedded in
the eval that it goes tits up.

That should happen if you are transmitting invalid crap.
It's straight ASCII from a file on the server. Unfortunately our
choice of back ends doesn't allow any other encoding. The truly weird
thing is that I copy and paste it directly into my code, and it
works; copy and paste the exact same snippet into my code but inside an
eval, and it bombs. The odd 'o' character trips it up.

Straight ASCII has no funny characters. Who knows what you are transmitting.

JSON requires Unicode, ideally UTF-8 (see RFC 4627,
http://www.ietf.org/rfc/rfc4627.txt?number=4627). You didn't specify why your
back end doesn't allow correct character encoding. I suspect it is due to either
laziness, as you explained in your introduction, or incompetence.
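
One possible workaround when a back end can only emit single-byte text, sketched under assumptions: JSON allows any character to be written as a \uXXXX escape, so a payload can be kept pure ASCII and the declared charset stops mattering. The function name toAsciiJson is hypothetical, and this sketch runs in JavaScript only to show the idea; the real escaping would have to happen wherever the JSON is generated.

```javascript
// Hypothetical helper: serialize a value as JSON text containing only ASCII characters.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uffff]/g, function (ch) {
    // Replace each non-ASCII UTF-16 code unit with its \uXXXX escape.
    return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

// '\u00f8' is written as an escape here so this source file is itself pure ASCII.
var wire = toAsciiJson({ strLname: 'Myrb\u00f8' });

// wire now holds the name with a literal backslash escape; parsing restores the character.
var roundTrip = JSON.parse(wire).strLname;
```

The escaped text is longer than the raw character, but it survives any ASCII-compatible transport unchanged.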
 
VK

Logos said:
The odd 'o' character trips it up. Go ahead and
try it in IE, it's neat to watch.

The "odd 'o' character" is "latin small letter o with stroke"
(ISO 8859-1 code 248, Unicode 00F8, HTML entity oslash)

Now I'll play a clairvoyant (which is rather easy though in your case):

Your page is in ISO 8859-1 encoding but (to make it "cooler looking")
you have placed charset=UTF-8 in the head section or your server
wrongly reports it as UTF-8.
Alas many authors and admins do it because they've been taught that
"Unicode is cool, anything else is lame" - and who wants to look lame?
:)

It is a pity, because UTF-8 does not describe the characters on the
page; those are Unicode. UTF-8 is a *transport* encoding: it tells the
recipient which algorithm was used to encode multibyte Unicode
sequences for delivery over a single-byte medium (HTTP packets). Given
this information, the parser can properly decode the single-byte stream
back into Unicode characters.

Unicode characters 0-127 have the same code values as the corresponding
ASCII characters, and UTF-8 transmits them unchanged as single bytes.
This way the parser (even with the wrong encoding indicated) has no
problem parsing them. But suddenly it meets your byte with value 248
(your oslash). As it was instructed that the transport encoding is
UTF-8, it presumes that this byte is the beginning of a UTF-8 multibyte
sequence denoting some Unicode character. In an attempt to read this
sequence in full it "swallows" the one or two bytes immediately
following the byte with value 248. Whatever follows it will collapse.
It can be a curly bracket: then you'll get "} is missing". It can be a
closing quote: then you'll get "Unterminated string constant".
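
The mismatch described above can be reproduced outside a browser. A minimal sketch, assuming Node.js and its Buffer API (not something this thread uses): the same bytes are decoded once as ISO-8859-1 and once as UTF-8. Note that a current decoder substitutes U+FFFD for the stray byte 248 instead of swallowing the bytes after it, so this illustrates the mismatch itself, not IE6's exact failure.

```javascript
// Bytes of the quoted name as ISO-8859-1; the "o with stroke" is the single byte 0xF8 (248).
var bytes = Buffer.from([0x22, 0x4d, 0x79, 0x72, 0x62, 0xf8, 0x22]);

// Decoded with the encoding that was actually used: the name survives.
var asLatin1 = bytes.toString('latin1');

// Decoded as UTF-8: 0xF8 is not valid there, so the character is lost.
var asUtf8 = bytes.toString('utf8');
```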

IE7 (and other modern UAs) doesn't choke on it because "bogus UTF-8"
became so common that UA producers had to add an extra heuristic to the
parser to tell real UTF-8 from "beautifying UTF-8". That was much
easier than educating the web community about what UTF-8 is, what
Unicode is, and how they relate to each other.

Here is your case:

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8">
<script>
var str = "{'foo':'barø'}";

eval('var foo=' + str);

alert(foo.foo);
</script>
</head>

<body>

</body>
</html>

simply change UTF-8 to ISO-8859-1 and it will come back to life.
 
VK

VK said:
Here is your case:

<html>
<head>
<title>Untitled Document</title>
<meta http-equiv="Content-Type"
content="text/html; charset=UTF-8">
<script>
var str = "{'foo':'barø'}";

eval('var foo=' + str);

alert(foo.foo);
</script>
</head>

<body>

</body>
</html>

simply change UTF-8 to ISO-8859-1 and it will come back to life.

You can make it work as it is on IE6 if you have the additional
Auto-Select module installed and activated (View > Encoding >
Auto-Select). It implements the required heuristic, which is built into
IE7 by default. It has other drawbacks though (search this group for
"Korean issue in IE" for a sample), so I wouldn't suggest keeping it
activated permanently: at least until all Web developers learn to always
indicate the encoding of their pages and to match the declared encoding
to the actual one. The achievement of this "hallelujah" should be placed
in some very far future, I'm afraid :-( :)

P.S. To make what I said more "visual":
1) Imagine you see a string "wninfpevcg" and I tell you that this is
ROT13 encoding. You go to any ROT13 decoder and you get the string
"javascript". No problem.
2) Now imagine I misinform you by telling you that this is ROT47.
You go to a ROT47 decoder and you get an obscure string,
"H?:?7A6G48".

Case 2) is your case: the encoded "javascript" string is your web page,
the bad guy stating the wrong encoding is yourself (or your server
admin), and "H?:?7A6G48" instead of "javascript" is the result of not
telling the truth.
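
For anyone who wants to check the analogy, here is a minimal sketch of both decoders. The function names rot13 and rot47 are just illustrative; ROT47 is assumed to be the usual rotation over printable ASCII codes 33 to 126.

```javascript
// Rotate letters by 13 places within A-Z and a-z.
function rot13(s) {
  return s.replace(/[a-zA-Z]/g, function (ch) {
    var base = ch <= 'Z' ? 65 : 97;
    return String.fromCharCode((ch.charCodeAt(0) - base + 13) % 26 + base);
  });
}

// Rotate printable ASCII (codes 33 to 126) by 47 places.
function rot47(s) {
  return s.replace(/[!-~]/g, function (ch) {
    return String.fromCharCode((ch.charCodeAt(0) - 33 + 47) % 94 + 33);
  });
}

var right = rot13('wninfpevcg'); // decoded with the scheme that was actually used
var wrong = rot47('wninfpevcg'); // decoded with the wrong scheme
```

Same input, two decoders: right is "javascript" and wrong is "H?:?7A6G48".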
 
Logos

Douglas said:
This example shouldn't work fine. The braces are not balanced. It seems that you
are not using a proper JSON encoder.

Twas a copy and paste error. The ending brace is there in the
transmission.
I recommend that you use eval as minimally as possible.

Truly, your wisdom is mighty. However, see disclaimer in original
post.
That should happen if you are transmitting invalid crap.

Really? Why only in IE6 and only when using an eval statement then, O
all knowing one? Though I know it's beneath you, let's address the
entire issue just for kicks, hmmm?
Straight ASCII has no funny characters. Who knows what you are transmitting.

Haven't looked at the hex, but it's stored as ASCII; it certainly
doesn't display correctly when examined on the back end, it shows as an
interesting star like character. As for what I am transmitting, I
posted it, tho copy & paste may garble the actual character codes.
JSON requires Unicode, ideally UTF-8 (see RFC 4627,
http://www.ietf.org/rfc/rfc4627.txt?number=4627). You didn't specify why your
back end doesn't allow correct character encoding. I suspect it is due to either
laziness, as you explained in your introduction, or incompetence.

No, it's because you're an asshole. It's filePro on SCO UNIX. As far
as I know filePro doesn't allow anything but ASCII, but since you are
obviously enormously talented in every direction you must also be a guru
in legacy 80s databases hosted on poorly documented crappy OSs and will
be sure to enlighten me further.

Or you could just be a total asshole, that's a really strong
possibility too.
 
Logos

Hm...I didn't specify any encoding actually. Me being lazy again! I
will try inserting your suggested code and seeing if it makes magic :)

Thanks for the suggestion!

Tyler
 
Logos

Jim said:
Try eval('"json =" +response.text') and see if that makes a difference.

I don't think that will work - won't that put a string on the left hand
side of an assignment statement, rather than a variable name?
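
A quick check of what that suggestion would actually do, as a minimal sketch with a hypothetical response object: the code handed to eval is a plain string concatenation, so the result is just a string and nothing is parsed or assigned.

```javascript
// Hypothetical stand-in for the AJAX response object.
var response = { text: '{"intId":251}' };

// The code handed to eval is the expression  "json =" +response.text
var result = eval('"json =" +response.text');

// result is the concatenated string; no variable named json is created.
var kind = typeof result;
```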
 
Logos

Got it in one, Mister! You are my hero, and are awarded the official
title of Javascript GOD.

May I worship at your toesie-wosies? Please? <grin>

Tyler
 
VK

Got it in one, Mister! You are my hero, and are awarded the official
title of Javascript GOD.
May I worship at your toesie-wosies? Please? <grin>

Wow! :) Thank you for your thank you, as they say in England.

</OT>
 
