How do I evaluate a JSON response?


Thomas 'PointedEars' Lahn

Lasse said:
Silly me, answering before checking.
It actually does allow "\"" as valid JSON.

But it does so in a bogus way, by removing all escape sequences
before testing the string literal, without considering the context.


PointedEars
 

Lasse Reichstein Nielsen

Thomas 'PointedEars' Lahn said:
But it does so in a bogus way, by removing all escape sequences
before testing the string literal, without considering the context.

It doesn't remove them. It replaces them with a "@", which isn't part
of any valid token outside of a string literal. So, after replacing
all valid escape sequences by "@", if what remains is valid JSON, so
was the original - and there are no longer escapes in the string
literals, so you don't have to worry about matching the wrong
double quote.
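
A minimal sketch of that replace-then-test idea, for illustration (this is
not the actual json2.js source, and the patterns are abbreviated):

// The JSON text under test: "a\"b" (six characters, including the escape).
var text = '"a\\"b"';

// Stage 1: neutralise escape sequences first, so the simpler string
// pattern in stage 2 never has to reason about an escaped quote.
var stage1 = text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@');
// stage1 === '"a@b"'

// Stage 2: collapse string and value tokens; only structural characters
// should remain, which /^[\],:{}\s]*$/ then accepts.
var stage2 = stage1.replace(/"[^"\\\n\r]*"|true|false|null/g, ']');
// stage2 === ']'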

/L
 

Asen Bozhilov

Garrett said:
A parser would be too much for the FAQ.

A full parser for JSON is definitely not necessary for the FAQ. You can
use something like this:

if (!this.JSON) {
  this.JSON = {
    parse : (function () {
      var JSON_GRAMMAR = {
        STRING : '"([^"\\\\\\x00-\\x1F]|\\\\["\\\\\/bfnrt]|' +
          '\\\\u[0-9A-Fa-f]{4})*"',
        NUMBER : '-?\\d+(?:\\.\\d+)?(?:[Ee][+-]?\\d+)?',
        BOOLEAN : 'true|false',
        NULL : 'null'
      };
      var REG_EXP = {
        property : new RegExp(JSON_GRAMMAR.STRING + '\\s*:', "g"),
        value : new RegExp(JSON_GRAMMAR.STRING + '|' +
          JSON_GRAMMAR.NUMBER + '|' + JSON_GRAMMAR.BOOLEAN + '|' +
          JSON_GRAMMAR.NULL, "g"),
        invalidTokens : /[^{}\[\],\s]/
      };
      return function (jsonStr) {
        var output;
        if (REG_EXP.invalidTokens.test(
              jsonStr.replace(REG_EXP.property, '')
                     .replace(REG_EXP.value, ''))) {
          throw new SyntaxError('JSON.parse');
        }
        output = new Function('return Array(' + jsonStr + ');')();
        /**
         * Here you can check:
         * if the length property of output is greater than 1
         * OR
         * output[0] instanceof Array
         * throw a `JSON.parse` error
         */
        return output[0];
      };
    })()
  };
}

For the string regular expression I am using Thomas Lahn's approach.
 

Asen Bozhilov

Asen said:
        output = new Function('return Array(' + jsonStr + ');')();

Correction, should be:

output = new Function('return Array(null, ' + jsonStr + ');')();
/**
 * Here you can check:
 * if the length property of output is greater than 2
 * OR
 * output[1] instanceof Array
 * throw a `JSON.parse` error
 */
return output[1];
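
Example use of the corrected fallback (an illustrative sketch; it assumes
the regular expressions above behave as intended):

JSON.parse('{"a": [1, 2.5, true], "b": "x"}'); // {a: [1, 2.5, true], b: "x"}
JSON.parse('10');                              // 10, not an array of length 10

try {
  // Letters and parentheses survive the replaces, so `invalidTokens'
  // matches and a SyntaxError is thrown.
  JSON.parse('alert(1)');
} catch (e) {
  // e instanceof SyntaxError === true
}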
 

Garrett Smith

Asen said:
Correction, should be:

output = new Function('return Array(null, ' + jsonStr + ');')();

Why `null` as first element?
/**
 * Here you can check:
 * if the length property of output is greater than 2
 * OR
 * output[1] instanceof Array
 * throw a `JSON.parse` error
 */
return output[1];

That strategy you've employed is one I considered, though there are a
couple of issues with some of the patterns.

I've not finished my reply to Thomas; it's been in draft for about three
days.

Garrett
 

Garrett Smith

Asen said:
Garrett said:
A parser would be too much for the FAQ.

A full parser for JSON is definitely not necessary for the FAQ. You can
use something like this:

if (!this.JSON) {
  this.JSON = {
    parse : (function () {
      var JSON_GRAMMAR = {
        STRING : '"([^"\\\\\\x00-\\x1F]|\\\\["\\\\\/bfnrt]|' +
          '\\\\u[0-9A-Fa-f]{4})*"',

If you use a regexp literal, you don't have to worry about escaping
backslashes in the pattern. As a strategy (not real code):

var STRING_BACKSLASH = /[\\]/,
    NUMBER_INT = /\d+/;

var fooIntExp = new RegExp(
  STRING_BACKSLASH.source
  + "|" + NUMBER_INT.source
);

The downside is that it requires creating extra regexp objects. It seems a
little easier to read without the extra backslash escapes.
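
For instance (an illustrative rewrite of the idea, not code from the
earlier posts), the JSON string production could be kept as a regexp
literal and composed via .source, avoiding the doubled backslashes of the
string-based version:

var JSON_STRING = /"(?:[^"\\\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"/,
    JSON_NUMBER = /-?\d+(?:\.\d+)?(?:[Ee][+-]?\d+)?/;

var property = new RegExp(JSON_STRING.source + '\\s*:', "g"),
    value = new RegExp(JSON_STRING.source + '|' + JSON_NUMBER.source +
      '|true|false|null', "g");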

[...]

The reply to Thomas' msg is not done.

Garrett
 

Asen Bozhilov

Garrett said:
Why `null` as first element?

Because the `Array' constructor is overloaded, I would have problems
if I used the first parameter. For example, with:

JSON.parse('10');

I would get an array whose `length' property is equal to 10. I use the
`Array' constructor instead of an array literal because, with an array
literal, I would have problems with the following code:

JSON.parse('][');

The calling expression uses parentheses, which are allowed in a JSON text
only inside strings. So my code cannot be exploited here, because if there
are any parentheses, `invalidTokens' will catch them.
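
To make both points concrete (an added illustration, not from the original
post):

Array(10).length;        // 10 -- a single numeric argument sets the length
Array(null, 10).length;  // 2  -- the parsed value is always at index 1

// Breaking out of Array( ... ) would require a ')' in the input, and any
// parenthesis left after the string/value replacements is caught by
// `invalidTokens'. Square brackets, by contrast, are legal JSON punctuation
// and pass that check, so the array-literal form is easier to subvert:
// with jsonStr === '0][0', 'return [null, 0][0];' quietly evaluates to
// null instead of throwing.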
 

Garrett Smith

Asen said:
Because the `Array' constructor is overloaded, I would have problems
if I used the first parameter. For example, with:

JSON.parse('10');

Right - I didn't think about it when I posted.
I would get an array whose `length' property is equal to 10. I use the
`Array' constructor instead of an array literal because, with an array
literal, I would have problems with the following code:

JSON.parse('][');

The calling expression uses parentheses, which are allowed in a JSON text
only inside strings. So my code cannot be exploited here, because if there
are any parentheses, `invalidTokens' will catch them.

I'm going to take a look at that and write a test for both yours and a
modified version of Doug's.


Garrett
 

Garrett Smith

Garrett said:
That would require more code to be downloaded and more processing to run
it. What about mobile devices?

You are confused. What good is shorter code that is not a solution?
var isValidJSON = /^[\],:{}\s]*$/.
  test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
    replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
      ']').
    replace(/(?:^|:|,)(?:\s*\[)+/g, ''))

Is this from json2.js? If yes, then it is not acceptable. To begin
with, it does not regard "\"" as valid JSON even though it is.

The code is from json2.js:
http://www.json.org/json2.js

Then it must be either summarily dismissed, or updated at least as follows:

/"([^"\\]|\\.)*"|.../

because *that* is the proper way to match a double-quoted string with
optional escape sequences. Refined for JSON, it must be at least

/"([^"\\^\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"|.../

There is a problem; TAB character code is 9 and all implementations
allow it.

I see you went from unicode escape sequences to hex escapes, but why
\x00 and not \x0? Or why not just use a decimal escape \0?

The character range could be written more compactly; instead of
[0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not considered too
unreadable. Though similar in appearance, it could not be `[o-f]`, because
that would result in a SyntaxError, thrown by CharacterRange, step 6:
"If i > j then throw a SyntaxError exception."
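
For example (results assume a conforming implementation):

/[0-f]/.test(':');    // true; the range U+0030-U+0066 also covers ':', '@', etc.
new RegExp('[o-f]');  // throws SyntaxError, since 'o' (U+006F) > 'f' (U+0066)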

I haven't tested it, yet. More on testing below...


// Untested.
var jsonStringExp =
      /"(?:\t|[^"\\\0-\x1F]|\\["\\\/bfnrt]|\\u[0-f]{4})*"/,

    // DecimalIntegerLiteral ::
    //   0
    //   NonZeroDigit DecimalDigits_opt
    // JSONNumber ::
    //   -_opt DecimalIntegerLiteral JSONFraction_opt ExponentPart_opt
    jsonNumberExp = /-?(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+-]?\d+)?/,

    jsonPrimitiveExp = new RegExp(
      jsonStringExp.source
      + "|" + jsonNumberExp.source
      + "|true|false|null", "g"
    );

var passExp = /^[\],:{}\s]*$/;

function isValidJson(text) {
  var filtered = text.replace(jsonPrimitiveExp, ']')
    .replace(/(?:^|:|,)(?:\s*\[)+/g, '')
    .replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@');

  return passExp.test(filtered);
}
That is gibberish. Either it is JSON, or it is ECMAScript.

Not gibberish.

ECMAScript defines JSON for ECMAScript. When I refer to JSON in
ECMAScript, I am referring to the JSON Grammar defined in ECMAScript 5.

JSON Grammar in ECMAScript 5 differs from the grammar specified in RFC
4627 in that it allows primitive values at top-level.
You are confused.

I don't think so.

What you've allowed in your RegExp doesn't match JSON Grammar defined in
ECMA 5.
"\"" is both an ES string literal and JSON for the string containing"

Yes.

"\\\"" is both an ES string literal and JSON for the string containing \"

Yes.

"\\"" is neither an ES string literal nor JSON.

Correct.


But that was not the purpose of the JSON string.

What do you mean by the purpose of the JSON String? The purpose is
inside the programmer.
Yes, but that is not how JSON is usually being put in. That is, the
escaping backslash is _not_ escaped then, and the characters that
quoteMarkInJSONString contains are

\"

Right; a JSONValue is usually going to be supplied as a value of an
identifier, e.g. xhr.responseText. A string value having the character
sequence - "\"" - is valid JSON.
and not

"

That would be invalid.
whereas only the latter was intended.


A JSON string *literal* may very well contain a literal backslash character,
and it may also contain a literal double quote. The expression fails to
recognize that.

No, a JSONString may not contain a literal backslash character.

| JSONString ::
|     " JSONStringCharacters_opt "
|
| JSONStringCharacters ::
|     JSONStringCharacter JSONStringCharacters_opt
|
| JSONStringCharacter ::
|     SourceCharacter but not double-quote " or backslash \ or U+0000 thru U+001F
|     JSONEscapeSequence

JSONStringCharacter may not contain backslash unless that is part of a
JSONEscapeSequence.

| JSONEscapeSequence ::
|     JSONEscapeCharacter
|     UnicodeEscapeSequence

[...]
You are very confused.

I don't think so. My understanding of the intent of json2.js comes from
code comments in it.

| // We split the second stage into 4 regexp operations in order to
| // work around crippling inefficiencies in IE's and Safari's regexp
| // engines. First we replace the JSON backslash pairs with '@' (a
| // non-JSON character).

It replaces the JSON backslash pairs with "@", a character excluded by
/^[\],:{}\s]*$/.
[...] The suggestion to use an object literal as the string argument to
JSON.parse is not any better than using "true".

But it is. It places further requirements on the capabilities of the
parser. An even better test would be a combination of all results of
all productions of the JSON grammar.

Cases that are known to be problematic can be filtered.

Your point being?

My point is that instead of trying every possible valid grammar check,
known bugs -- such as allowing 1. and +1 and 01, as seen in Spidermonkey
-- could be checked.
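
A sketch of that idea (the input list is illustrative, not exhaustive):
feature-test the native JSON.parse against inputs that some engines are
known to accept incorrectly, and use the fallback only if one slips through.

var nativeJSONIsStrict = (function () {
  if (typeof JSON == "undefined" || typeof JSON.parse != "function") {
    return false;
  }
  var badInputs = ['1.', '+1', '01'];
  for (var i = 0; i < badInputs.length; i++) {
    try {
      JSON.parse(badInputs[i]);
      return false;             // accepted invalid JSON
    } catch (e) {
      // expected: SyntaxError
    }
  }
  return true;
})();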

The purpose of this was to provide a viable fallback for JSON.parse().
Both your suggestion and the one in json2.js fail to do that.

No, I don't think it will be that difficult, but I want to get a test
suite for it first. Either with what I have now or with JsUnit.

The type of setup I am working on uses object literal notation for the
tests. I use Java style annotations, but in the test name.

For example:

APE.test.testSimple({
  'test JSON.parse("1.") @throws SyntaxError' : function() {
    JSON.parse("1.");
  }
});

That test function would be expected to throw, and in this case the
thrown object would have a `name` property of exactly "SyntaxError".

Assertions use NUnit-style constraints. It's a bigger project than just
this JSON test needs, and the TestReporter is not done. It renders the
tree as a UL but doesn't show any results (pass, fail, ignore, etc).

JsUnit would work fine for a JSON test and I may just use that instead.
After some sleep.

Garrett
 

Thomas 'PointedEars' Lahn

I take it that, since you have not answered this, you have come to see the
flaw in your logic here.
var isValidJSON = /^[\],:{}\s]*$/.
  test(text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@').
    replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g,
      ']').
    replace(/(?:^|:|,)(?:\s*\[)+/g, ''))

Is this from json2.js? If yes, then it is not acceptable. To begin
with, it does not regard "\"" as valid JSON even though it is.

The code is from json2.js:
http://www.json.org/json2.js

Then it must be either summarily dismissed, or updated at least as
follows:

/"([^"\\]|\\.)*"|.../

because *that* is the proper way to match a double-quoted string with
optional escape sequences. Refined for JSON, it must be at least

/"([^"\\^\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"|.../

There is a problem; TAB character code is 9 and all implementations
allow it.

Then all implementations are wrong, or, put more politely, they implement
only something similar to JSON. The specification at json.org clearly says
that no control character is allowed in there. Control characters in the
Basic Latin Unicode block range from U+0000 to U+001F inclusive. If
anything, the character class is not exclusive enough, since there are
control characters beyond that block, from U+007F to U+009F inclusive (which
is easily fixed, though).
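
One way to also exclude those C1 control characters (a sketch only; the
|... stands for the remaining alternatives as before):

/"([^"\\\x00-\x1F\x7F-\x9F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"|.../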
I see you went from unicode escape sequences to hex escapes, but why
\x00 and not \x0?

Because \x0 would be a syntax error, see ES3/5 7.8.4.
Or why not just use a decimal escape \0?

That is a possibility I was not aware of [15.10.2.11], indeed, but then I
had two different kinds of character escape sequences in one character
class, with one not being recognizable as easily. No advantage, only
disadvantages there; so no, thanks.
The character range could be written more compactly;

Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there would be
no other letters in the remaining expression.
instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
considered too unreadable.

Neither expression is equivalent to begin with.
Not gibberish.

ECMAScript defines JSON for ECMAScript. When I refer to JSON in
ECMAScript, I am referring to the JSON Grammar defined in ECMAScript 5.

JSON Grammar in ECMAScript 5 differs from the grammar specified in RFC
4627 in that it allows primitive values at top-level.

Fair enough.
I don't think so.

What you've allowed in your RegExp doesn't match JSON Grammar defined in
ECMA 5.

Yes, it does.
What do you mean by the purpose of the JSON String? The purpose is
inside the programmer.

That programmer was me. I did not want to have a string containing `\"',
I wanted a string containing `"'. Whatever you tried to prove, it did not
relate to what I wanted to have, and so not what I wanted to have matched,
too, by the regular expression.
Right; a JSONValue is usually going to be supplied as a value of an
identifier, e.g. xhr.responseText. A string value having the character
sequence - "\"" - is valid JSON.


That would be invalid.

Most certainly not, since "\"" is valid. I am talking about the literal
value here, after the expansion of escape sequences.
No, a JSONString may not contain a literal backslash character.

Yes, it may, as the "string" may contain escape sequences. So it is *wrong*
to disallow the literal backslash (`\\' in a RegExp literal) from the
content of the string, as that valid JSON escape sequence could then not be
matched, and the JSON text would be considered invalid when it is not.
| JSONString ::
|     " JSONStringCharacters_opt "
|
| JSONStringCharacters ::
|     JSONStringCharacter JSONStringCharacters_opt
|
| JSONStringCharacter ::
|     SourceCharacter but not double-quote " or backslash \ or U+0000 thru U+001F
|     JSONEscapeSequence

JSONStringCharacter may not contain backslash unless that is part of a
JSONEscapeSequence.

I rest my case.


PointedEars
 

Garrett Smith

I take it that, since you have not answered this, you have come to see the
flaw in your logic here.

If "not valid json" can be determined without writing a parser, and if
it that is more efficient, then the overhead of using a parser is
avoided. That's good.

In order to prove that, I'll need some tests.

[...]
Then all implementations are wrong, or, put more politely, they implement
only something similar to JSON. The specification at json.org clearly says
that no control character is allowed in there. Control characters in the
Basic Latin Unicode block range from U+0000 to U+001F inclusive. If
anything, the character class is not exclusive enough, since there are
control characters beyond that block, from U+007F to U+009F inclusive (which
is easily fixed, though).

Sorry, I should have thought more clearly before saying "all
implementations."

BESEN does not extend JSONString to allow a literal TAB.

JSON.parse(' "\t" ');

BESEN IDE: SyntaxError

Other major implementations extend the JSONString grammar to allow a
literal TAB. They are wrong, as you claim, but not for the reason you've
supplied. That is, they are not wrong because they don't match what is
stated on json.org. The reason they are wrong is that they do not follow
the JSON grammar as defined in ECMAScript.

No other control characters (you mention U+007F) are explicitly
excluded by the production for JSONString in ECMAScript.

A fallback implementation must not be more strict than the
specification. Filtering out other control characters (U+007F, etc.) would
be a violation of the ECMAScript specification and would not match what
implementations do.

JSON.parse(' "\u007f" ');

Parses a string containing the delete character.

JSON.parse(' "\u007f" ') === "\u007f";

true.

In BESEN, Opera, Firefox 3.6.3, and probably others. It matches the spec.

It would make sense for the fallback to allow \t, as all
implementations allow that today.
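
If one chose to tolerate the literal TAB that current implementations
accept (a judgment call, not what the specification says), the character
class could be widened like this (illustrative):

var jsonStringLenient =
  /"(?:[^"\\\x00-\x08\x0A-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"/;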
Because \x0 would be a syntax error, see ES3/5 7.8.4.

"7.8.4 String Literals".
That is the wrong section. That section is describes character escape
sequences in Strings; look down to the section on Regular Expressions.

"7.8.5 Regular Expression Literals"
The specification there states:

| HexEscapeSequence ::
| x HexDigit HexDigit

So you still manage to be correct, even though you've cited the wrong
section.
Or why not just use a decimal escape \0?

That is a possibility I was not aware of [15.10.2.11], indeed, but then I
had two different kinds of character escape sequences in one character
class, with one not being recognizable as easily. No advantage, only
disadvantages there; so no, thanks.

What are the disadvantages? I see one: "not recognizable easily". Is
that so? I could see how it could be mistaken for an octal escape
sequence, but octal 0 and decimal 0 are the same 0.

What are the other disadvantages?
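
For what it is worth, both spellings denote U+0000 in a character class:

/[\0-\x1F]/.test("\t");    // true
/[\x00-\x1F]/.test("\t");  // true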
The character range could be written more compactly;

Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there would be
no other letters in the remaining expression.
instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
considered too unreadable.

Neither expression is equivalent to begin with.

Right. [0-f] is not because it includes a bunch of other characters and
[0-9A-f] is not because it includes "G-Z".

I'm concerned with using /[\dA-F]/i

So the following should be good:
/[\dA-Fa-f]/
[\dA-F]/i
/[0-9A-Fa-f]

[...]
That programmer was me. I did not want to have a string containing `\"',
I wanted a string containing `"'. Whatever you tried to prove, it did not
relate to what I wanted to have, and so not what I wanted to have matched,
too, by the regular expression.

I think we're both know what a JSONString can and cannot contain. A
JSONString can't contain the unescaped character ". This is shown in the
ES5 spec under the grammar for JSONStringCharacter and JSONEscapeSequence,
quoted in my earlier message.
Most certainly not, since "\"" is valid. I am talking about the literal
value here, after the expansion of escape sequences.

What?! I see we're back to this. Again, an unescaped " is invalid in a
JSONString.

JSON.parse(' "\"" ');

Must result in a SyntaxError.

[...]

Correction: "must not".
I rest my case.

OK, so we agree on that. However JSONStringCharacter also must not
contain a double quote mark unless that is part of a JSONEscapeSequence.

Garrett
 

Thomas 'PointedEars' Lahn

Garrett said:
If "not valid json" can be determined without writing a parser, and if
it that is more efficient, then the overhead of using a parser is
avoided. That's good.

In order to prove that, I'll need some tests.

You need a dosage of common sense instead. Or a course in theoretical
computer science.
It would make sense to allow the fallback to allow \t, as all
implementations are allowing that today.

No, it would make sense to implement what was specified, and work around the
borken implementations that way until they no longer needed that workaround,
and report the bug so that it is fixed.
"7.8.4 String Literals".
That is the wrong section. That section is describes character escape
sequences in Strings; look down to the section on Regular Expressions.

Look what the section on Regular Expressions refers to.
"7.8.5 Regular Expression Literals"
The specification there states:

| HexEscapeSequence ::
| x HexDigit HexDigit

And now go read in which terms that production is defined.
So you still manage to be correct, even though you've cited the wrong
section.

No, I did not. I thought I would make it easier for you by referring you
to the referenced definition, but I had not considered your reading
problem. Sorry.
Or why not just use a decimal escape \0?

That is a possibility I was not aware of [15.10.2.11], indeed, but then I
had two different kinds of character escape sequences in one character
class, with one not being recognizable as easily. No advantage, only
disadvantages there; so no, thanks.

What are the disadvantages? I see one: "not recognizable easily".

I rest my case. You really can't read.
Is that so?

Yes, it is.
I could see how it could be mistaken for an octal escape
sequence, but octal 0 and decimal 0 are the same 0.
Rubbish.

What are the other disadvantages?

Learn to read.
The character range could be written more compactly;

Yes, one could write /[\dA-Fa-f]/. Or even /…[\dA-F]…/i if there would
be no other letters in the remaining expression.
instead of [0-9A-Fa-f], [0-9A-f], or even just [0-f], if that is not
considered too unreadable.

Neither expression is equivalent to begin with.

Right. [0-f] is not because it includes a bunch of other characters and
[0-9A-f] is not because it includes "G-Z".

You don't say!
I'm concerned with using /[\dA-F]/i

I am not, under the stated conditions. But these conditions may not apply
here.
So the following should be good:
/[\dA-Fa-f]/
[\dA-F]/i
/[0-9A-Fa-f]

You appear to have no idea what you are talking about.
I think we're both know what a JSONString can and cannot contain

No, "we're" don't.
JSONString can't contain the unescaped character ".

And nobody said it could.
JSON.parse(' "\"" ');

Must result in a SyntaxError.

Of course, but that is well beside the point.

"\""

e.g. in an HTTP response message body is with absolute certainty JSON text,
and so must pass validation. You are clinging to ECMAScript string literals
instead, where further escaping of JSON strings is necessary.
Correction: "must not".


OK, so we agree on that.

Quite the contrary. Instead of proving me wrong, you have thus confirmed my
argument that /"[^\\"...]*"/ is insufficient to match a JSON string.
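
An added illustration of that insufficiency: against the JSON text "a\"b",
the simple character class treats the escaped quote as a delimiter.

var weak   = /"[^"\\\n\r]*"/,
    strict = /"(?:[^"\\\x00-\x1F]|\\["\\\/bfnrt]|\\u[0-9A-Fa-f]{4})*"/;

var jsonText = '"a\\"b"';   // the six characters  "a\"b"

weak.exec(jsonText)[0];     // '"b"'      -- matched the wrong quotes
strict.exec(jsonText)[0];   // '"a\\"b"'  -- the whole string token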


PointedEars
 

Garrett Smith

You need a dosage of common sense instead. Or a course in theoretical
computer science.


No, it would make sense to implement what was specified, and work around the
borken implementations that way until they no longer needed that workaround,
and report the bug so that it is fixed.

Hold your breath, Thomas. It'll surely be fixed soon.
Quite the contrary. Instead of proving me wrong, you have thus confirmed my
argument that /"[^\\"...]*"/ is insufficient to match a JSON string.

We seem to have different goals.

My goal is to develop a solution that can be used to evaluate a JSON
response. I'm going to continue with that goal.

Garrett
 
