Ian Woods said:
This is true for every PRNG since they're deterministic. The question is
how hard is it given any subsequence of output given the algorithm - you
only need to determine the seed at any single point in the sequence to know
the whole sequence. Unless you're doing some behind-the-scenes magic with
the seed, I don't see how throwing away 'n' numbers helps you. Could you
explain further?
Certainly. For one, the seed you supply RC4 is not simply the initial
state. Instead, there is a procedure (in Ben's code, the logic resides in
the function prng_seed) to convert that initial seed into an initial state.
This procedure is known to produce biases in the initial state, and these
biases produce biases in the initial output -- not very strong ones,
granted, but strong enough to be cryptographically interesting. When you
run the procedure to generate outputs (prng_get_octet in Ben's code), this
tends to smear out the biases, and thus the artifacts are limited to the
first couple initial outputs.
In addition, things get more interesting if you allow someone to view the
initial output of multiple related seeds. For example, if you give an
attacker the first octet output from the following seeds:
00 00 00 XXXXXXXX
00 00 01 XXXXXXXX
00 00 02 XXXXXXXX
etc...
then with enough of these outputs, the attacker can deduce values of the
common XXXXXXXX portion. Real protocols have been attacked because of this
observation, and this has made the cryptographical community quite wary of
the initial output of RC4.
Now, if you are just looking at creating a statistically good looking random
number generator, the biases from the first point are probably too weak to
really worry about, and the second point about related seeds is irrelevent.
However, cryptographically speaking, they are valid concerns.
(If you need references to any of the above, just ask)