Alex said:
fumanchu said:
I'm surprised no one has pursued a course of subtraction rather than
division. Say you want 10 numbers:
>>> import random
>>> s = 1.0
>>> n = []
>>> for x in xrange(9):
...     value = random.random() * s
...     n.append(value)
...     s -= value
...
>>> n.append(s)   # the leftover makes the ten values sum to exactly 1
>>> n
[0.7279111122901516, 0.082128708606867745, 0.0080516733577621798,
0.12122060245902817, 0.0034460458833209676, 0.0021046234724371184,
0.054109424914363845, 0.00035750970249204185, 0.00051175075536832372,
0.00015854855820800087]
>>> sum(n)
1.0
Either:
1) Just because they're *ordered* doesn't mean they're not *random*,
or
2) You all now know why I'm not a mathematician.
It seems to me that the only constraint on the randomness of my results
is the OP's constraint: that they sum to 1. I'd be fascinated to learn
if and why that wouldn't work.
n[0] is uniformly distributed between 0 and 1; n[1] is not -- not sure
how to characterize its distribution, but it's vastly skewed to favor
smaller values -- and further n[x] values for x>1 are progressively more
and more skewed similarly.
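[Aside, not from the original post: a quick empirical check of that skew.
n[1] is u2 * (1 - u1) for independent uniforms u1, u2, i.e. a product of
two independent uniforms, whose density is -ln(t) on (0, 1); about 60% of
its mass falls below 0.25, versus 25% for a uniform draw.]

import random

N = 100000
# n[1] = second draw, scaled by whatever the first draw left over
samples = [random.random() * (1.0 - random.random()) for _ in xrange(N)]
print sum(1 for t in samples if t < 0.25) / float(N)    # ~0.60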
Such total disuniformity, where the very distribution of each value is
skewed by the preceding one, may still be "random" for some sufficiently
vague meaning of "random", but my intuition suggests it's unlikely to
prove satisfactory for the OP's purposes.
[digression]
All of this discussion about whether the distribution of values is uniform or
not doesn't mean much until one has defined "uniformity," or equivalently,
"distance" in the space we're talking about. In this case, we're talking about
the unit n-simplex space (SS^n) which has elements S=(s_1, s_2, ... s_n) where
sum(S) = 1 and s_i >= 0. I favor the Aitchison distance:
import numpy as np

def aitchison_distance(x, y):
    """ Compute the Aitchison distance between two vectors in simplex space.
    """
    lx = np.log(x)
    ly = np.log(y)
    lgx = np.mean(lx, axis=-1)
    lgy = np.mean(ly, axis=-1)
    diff = (lx - lgx) - (ly - lgy)
    return np.sqrt(np.sum(diff * diff))
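[Aside, not from the original post: a quick usage sketch with made-up
compositions, previewing the boundary behavior noted next.]

x = np.array([0.4, 0.4, 0.2])
y = np.array([0.398, 0.4, 0.202])
print aitchison_distance(x, y)                            # small, ~0.01
print aitchison_distance(x, np.array([1e-9, 0.5, 0.5]))   # huge, ~17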
Note that zeros yield infinities, so the borders of the unit simplex are
infinitely far from points in the interior. Consequently,
generating "uniform" random samples from this space is as impractical as it is
to draw "uniform" random samples from the entire infinite real number line. It's
also not very interesting. However, one can transform SS^n into RR^(n-1) and
back again, so drawing numbers from a multivariate normal of whatever mean and
covariance you like will give you "nice" simplicial data and quite possibly even
realistic data, too, depending on your model. I like using the isometric
log-ratio transform ("ilr" transform) for this.
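[Aside, not from the original post: a minimal sketch of that ilr recipe,
assuming a Helmert-type orthonormal basis; any orthonormal basis of the
zero-sum hyperplane would do, and the helper names here are mine.]

import numpy as np

def helmert_basis(n):
    # Rows form an orthonormal basis of the zero-sum hyperplane in RR^n,
    # which is where centered log-ratio (clr) vectors live.
    V = np.zeros((n - 1, n))
    for i in xrange(1, n):
        V[i - 1, :i] = 1.0 / np.sqrt(i * (i + 1))
        V[i - 1, i] = -i / np.sqrt(i * (i + 1))
    return V

def ilr(x):
    # Isometric log-ratio transform: SS^n -> RR^(n-1).
    lx = np.log(np.asarray(x, dtype=float))
    return np.dot(helmert_basis(len(x)), lx - lx.mean())

def ilr_inv(z):
    # Inverse transform: RR^(n-1) -> SS^n.
    clr = np.dot(helmert_basis(len(z) + 1).T, np.asarray(z, dtype=float))
    w = np.exp(clr)
    return w / w.sum()

# Draw "nice" 4-part compositions: sample a multivariate normal in RR^3
# (pick mean and covariance to taste) and map back onto the simplex.
z = np.random.multivariate_normal(np.zeros(3), np.eye(3))
p = ilr_inv(z)
print p, p.sum()    # p is on the simplex; the sum is 1.0 up to rounding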
Good Google search terms: "compositional data", Aitchison
But of course, for the OP's purpose of creating synthetic Markov chain
transition tables, generating some random vectors uniformly on [0, 1)^n and
normalizing them to sum to 1 works a treat. Don't bother with anything else.
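[Aside, not from the original post: a minimal sketch of that
normalize-and-done recipe; the function name is mine.]

import numpy as np

def random_transition_matrix(n):
    # Each row: n uniform draws on [0, 1), rescaled to sum to 1.
    m = np.random.uniform(0.0, 1.0, size=(n, n))
    return m / m.sum(axis=1)[:, np.newaxis]

P = random_transition_matrix(4)
print P.sum(axis=1)     # every row of the table sums to 1.0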
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco