Johannes Baagoe :
Ry Nohryb :
How would you write that test ?
In the specific case involved, `decrypt` returns a String, whose
charCodeAt method returns Numbers in [0, 65535] for any i between
0 and the length of the String minus 1. (In most other languages,
one would rather count bytes than unsigned 16-bit char codes.)
So we initialise an Array (let us call it `counts`) of 65536 zeros,
and loop over the decrypted String counting character codes :
var counts = new Array(65536);              // one bucket per char code
for (var i = 0; i < 65536; i++) counts[i] = 0;
for (var i = plain.length - 1; i >= 0; i--) {
  counts[plain.charCodeAt(i)]++;
}
(In real applications, this would be done incrementally, as the
decryption progresses. One may also stop before the decryption has
finished, once a substring large enough for the test has been
decrypted.)
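For instance, assuming the decryption routine hands back successive
chunks of plaintext as Strings (the chunk interface and the `tally`
name are only illustrative, not part of the actual code), one could
reuse the same `counts` array and keep a running total of the
characters seen so far :

var length = 0;                        // characters counted so far

function tally(chunk) {                // call on each decrypted chunk
  length += chunk.length;
  for (var i = chunk.length - 1; i >= 0; i--) {
    counts[chunk.charCodeAt(i)]++;
  }
}

Once `length` is deemed large enough, the statistic below can be
computed from the partial counts, using `length` instead of
`plain.length`.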
If the count distribution is close to uniform, each count will be
approximately equal to the length divided by 65536. Let us call that
value `expected`, and calculate Pearson's chi-square statistic :
var expected = plain.length / 65536;
var chi2 = 0;
for (var i = 0; i < 65536; i++) {
  var diff = counts[i] - expected;
  chi2 += diff * diff / expected;
}
In the case of an unsuccessful decryption, the result will be
reasonably close to 65536. On successful decryption, it will be *much*
higher, say, above 70000 - the precise threshold that should ring
a bell depends on how much time you are prepared to spend in human
inspection of false positives vs. how much you want to be sure not to
miss the real solution. It also depends on the encoding : for ASCII,
you may be quite sure that all the counts above 127 will be 0, and
choose a correspondingly very high test value - or a simpler test.
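Such a simpler test could be as crude as rejecting a candidate as soon
as a single code above 127 turns up - a sketch, assuming the plaintext
is known to be pure ASCII :

function looksLikeAscii(plain) {
  for (var i = plain.length - 1; i >= 0; i--) {
    if (plain.charCodeAt(i) > 127) {
      return false;    // one non-ASCII char code is enough to reject
    }
  }
  return true;         // worth a closer look (or a chi-square pass)
}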
(A chi2 much *lower* than 65536, say, less than 60000, is either
a bug or a miracle.)
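To sum up those thresholds in code (the 60000 and 70000 figures are
only the illustrative values above, to be tuned to your own tolerance
for false positives) :

function classify(chi2) {
  if (chi2 < 60000) {
    return "bug (or miracle)";     // far below the expected ~65536
  }
  if (chi2 < 70000) {
    return "probably garbage";     // close to 65536 : looks uniform
  }
  return "candidate plaintext";    // much higher : worth human inspection
}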
Scrupulous minds will note that Pearson's test may not be appropriate,
if for nothing else, then because `expected` will be less than 5 for
short plaintexts (a 100 000 character plaintext, for instance, gives
an `expected` of about 1.5). In actual practice, it doesn't matter -
the test discriminates quite nicely, and it is faster than more
theoretically reputable ones.