Is this secure?

M

Michael Rudolf

Am 24.02.2010 18:23, schrieb mk:
Even then I'm not getting completely uniform distribution for some reason:
d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z. You might want to skip the values 0-22
to achieve a truly uniform distribution.

FYI: Electronic Cash PINs in europe (dont know about the rest of the
world) were computed the same way (random hexdigit and just mod it when
it's too large) leading to a high probability that your first digit was
a 1 :)

Regards,
Michael
 
S

Steve Holden

mk said:
The stuff about converting 4 random bytes to a decimal string and then
peeling off 2 digits at a time is pretty awful, and notice that since
2**32 is 4294967296, in the cases where you get 10 digits, the first
2-digit pair is never higher than 42.

Yikes! I didn't think about that. This is probably where (some part of)
probability skewing comes from.

Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for them,
so they will not have to remember and type them in. So I have little in
the way of limitations of password length - even though in *some* cases
somebody might have to (or be ignorant enough) to retype the password
instead of pasting it in.

In that case the "diceware" approach is not necessary, even though I
will certainly remember this approach for a case when users will have to
remember & type the passwords in.

The main application will access the data using HTTP (probably), so the
main point is that an attacker is not able to guess passwords using
brute force.

Using A-z with 10-char password seems to provide 3 orders of magnitude
more combinations than a-z:
95367431640625L

Even though I'm not sure it is worth it, assuming 1000 brute-force
guesses per second (which over the web would amount pretty much to DOS),
this would take # days:
1103789L

Even then I'm not getting completely uniform distribution for some reason:

d 39411
l 39376
f 39288
a 39275
s 39225
r 39172
p 39159
t 39073
k 39071
u 39064
e 39005
o 39005
n 38995
j 38993
h 38975
q 38958
c 38938
b 38906
g 38894
i 38847
m 38819
v 38712
z 35321
y 35228
w 35189
x 35075

Code:

import operator

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

if __name__ == "__main__":
chardict = {}
for i in range(100000):
w = gen_rand_word(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count
I'd write your code something like this:

nletters = 5

def randomword(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a')+ord(c)%26) for c in f.read(n)])

print randomword(nletters)

Aw shucks when will I learn to do the stuff in 3 lines well instead of
20, poorly. :-/
When you've got as much experience as Paul?

regards
Steve
 
M

mk

When you've got as much experience as Paul?

And how much experience does Paul have? (this is mostly not a facile
question)

For my part, my more serious effort (on and off) with programming in
Python is under a year.

Regards,
mk
 
M

mk

The reason is 256 % 26 != 0
256 mod 26 equals 22, thus your code is hitting a-v about 10% (256/26 is
approx. 10) more often than w-z.

<Barbie voice>writing secure code is hard...

I'm going to switch to PHP: Python world wouldn't lose much, but PHP
would gain a lot.
You might want to skip the values 0-22
to achieve a truly uniform distribution.

Hmm perhaps you meant to skip values over 256 - 22 ? Bc I'm getting this
(reduced the run to 1000 generated strings):

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)
if ord(x) > 22])


z 3609
b 3608
s 3567
e 3559
j 3556
r 3555
g 3548
p 3540
m 3538
q 3532
h 3528
y 3526
v 3524
i 3500
x 3496
c 3488
k 3488
l 3487
u 3487
a 3469
o 3465
d 3455
t 3439
f 3437
n 3417
w 3175

While with this:

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n)
if ord(x) < 235])

a 3852
w 3630
s 3623
v 3582
y 3569
p 3568
c 3558
k 3558
b 3556
r 3553
x 3546
m 3534
n 3522
o 3515
h 3510
d 3505
u 3487
t 3486
i 3482
f 3477
e 3474
g 3460
q 3453
l 3437
z 3386
j 3382

1. I'm systematically getting 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.

FYI: Electronic Cash PINs in europe (dont know about the rest of the
world) were computed the same way (random hexdigit and just mod it when
it's too large) leading to a high probability that your first digit was
a 1 :)

Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)

Regards,
mk
 
R

Robert Kern

While with this:

def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
< 235])

a 3852 ....

1. I'm systematically getting 'a' outlier: have no idea why for now.

2. This is somewhat better (except 'a') but still not uniform.

I will repeat my advice to just use random.SystemRandom.choice() instead of
trying to interpret the bytes from /dev/urandom directly.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
P

Paul Rubin

Robert Kern said:
I will repeat my advice to just use random.SystemRandom.choice()
instead of trying to interpret the bytes from /dev/urandom directly.

SystemRandom is something pretty new so I wasn't aware of it. But
yeah, if I were thinking more clearly I would have suggested os.urandom
instead of opening /dev/urandom.
 
M

mk

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.

Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?

Regards,
mk
 
M

mk

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.

Out of curiosity:

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

if __name__ == "__main__":
chardict = {}
for i in range(10000):
## w = gen_rand_word(10)
w = gen_rand_string(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count


s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.

Regards,
mk
 
R

Robert Kern

Oh I hear you -- for production use I would (will) certainly consider
this. However, now I'm interested in the problem itself: why is the damn
distribution not uniform?

You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
M

Michael Rudolf

Am 24.02.2010 19:35, schrieb mk:
<Barbie voice>writing secure code is hard...

So true. That's why one should stick to standard libs when it comes to
crypto or security in general. It's just to easy to mess it up. Just ask
Debian about whether touching OpenSSL was a good idea ;)

Hmm perhaps you meant to skip values over 256 - 22 ?

That's the same thing as x mod y equals x+N*y mod y for every natural N.
def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)

Off-by-one-error: you're skipping len(range(22))==23 hits.
OK, I just see that I wrote misleading 0-22 while I meant range(22).

While with this:
def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)
< 235])

Same off-by-one.
Schadenfreude is deriving joy from others' misfortunes; what is the
German word, if any, for deriving solace from others' misfortunes? ;-)

Well - "Schadenfreude" *is* in fact a german word :)
"Schaden" is the event or result of misfortune, "Freude" is joy.


Well, I really think that you should use repeated Random.choice on an
alphabet.
Or Random.Systemrandom.choice if you don't trust the PRNG.

Regards,
Michael
 
P

Paul Rubin

mk said:
So I have little in the way of limitations of password length ...>
The main application will access the data using HTTP (probably), so
the main point is that an attacker is not able to guess passwords
using brute force.

If it's HTTP instead of HTTPS and you're sending the password in the
clear, then a serious attacker can simply eavesdrop the connection and
pick up the password. Again, if the application is a web forum or
something like that, the security requirements probably aren't terribly
high. If it's (say) a financial application with potentially motivated
attackers, you've got to be a lot more careful than I think you're being
right now, and you should really get security specialists involved.
Using A-z with 10-char password seems to provide 3 orders of magnitude
more combinations than a-z:

Yes, 2**10 = 1024 so (57/25)**10 is a little more than that.
Even then I'm not getting completely uniform distribution for some reason:

Exact equality of the counts would be surprising and a sign that
something was wrong with the generation process. It would be like
flipping a coin 10000 times and getting exactly 5000 heads. The
binomial distribution tells you that the number should be close to 5000,
but that it's unlikely to be -exactly- 5000.

Also, as Michael Rudolf mentioned, getting a letter by taking n%26 where
n is drawn uniformly from [0..255] doesn't give a uniform distribution
because 256 is not a multiple of 26. I had thought about making an
adjustment for that when I posted, but it didn't seem worth cluttering
up the code. Uniformity for its own sake doesn't gain you anything;
what matters is entropy. If you compute the entropy difference between
the slightly nonuniform distribution and a uniform one, it's very small.

To get a more uniform distribution I usually just take a larger n,
rather than conditionalizing the draws. For example, in the
diceware-like code I posted, I read 10 random bytes (giving a uniform
random number on [0..2**80]) from urandom for each word. That is still
not perfectly uniform, but it's closer to the point where the difference
would be very hard to detect.
Aw shucks when will I learn to do the stuff in 3 lines well instead of
20, poorly. :-/

Well, that's partly a matter of practice, but I'll mention one way I
simplified the code, which was by reading more bytes from /dev/urandom
than was really necessary. I read one byte for each random letter
(i.e. throwing away about 3 random bits for each letter) while you tried
to encode the urandom data cleverly and map 4 random bytes to 5
alphabetic letters. /dev/urandom uses a cryptographic PRNG and it's
pretty fast, so reading a few extra bytes from it to simplify your code
doesn't really cost you anything.
 
R

Robert Kern

I will repeat my advice to just use random.SystemRandom.choice() instead
of trying to interpret the bytes from /dev/urandom directly.

Out of curiosity:

def gen_rand_string(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

if __name__ == "__main__":
chardict = {}
for i in range(10000):
## w = gen_rand_word(10)
w = gen_rand_string(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
for char, count in counts:
print char, count


s 3966
d 3912
g 3909
h 3905
a 3901
u 3900
q 3891
m 3888
k 3884
b 3878
x 3875
v 3867
w 3864
y 3851
l 3825
z 3821
c 3819
e 3819
r 3816
n 3808
o 3797
f 3795
t 3784
p 3765
j 3730
i 3704

Better, although still not perfect.

This distribution is well within expectations.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
M

mk

You want "< 234", not "< 235". (234 % 26 == 0), so you get some extra 'a's.

Right, this explains the 'a' outlier. Fixed. But still:

import operator
import os
import random
import math

def rand_str_custom(n):
s = os.urandom(n)
return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) <
234])

def count_chars(chardict, word):
for c in word:
try:
chardict[c] += 1
except KeyError:
chardict[c] = 0

def rand_str_SystemRandom_seeding(length):
seed = os.urandom(32)
random.seed(seed)
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def rand_str_SystemRandom_noseeding(length):
prng = random.SystemRandom()
chars = []
for i in range(length):
chars.append(prng.choice('abcdefghijklmnopqrstuvwxyz'))
return ''.join(chars)

def sd(x):
sd.sum += x
sd.sum2 += x*x
sd.n += 1.0
sum, sum2, n = sd.sum, sd.sum2, sd.n
return math.sqrt(sum2/n - sum*sum/n/n)

def gen_rand_with_fun(fun):
print fun.__name__
chardict = {}
for i in range(10000):
w = fun(10)
count_chars(chardict, w)
counts = list(chardict.items())
counts.sort(key = operator.itemgetter(1), reverse = True)
nums = [c[1] for c in counts]
sd.sum = sd.sum2 = sd.n = 0
mean = (1.0*sum(nums))/len(nums)
stddev = map(sd, nums)[-1]
print 'mean', mean, 'std dev', stddev
for char, count in counts:
print char, count, '%.2f' % ((count - mean)/stddev), 'std devs
away from mean'

if __name__ == "__main__":
gen_rand_with_fun(rand_str_SystemRandom_seeding)
print
gen_rand_with_fun(rand_str_SystemRandom_noseeding)
print
gen_rand_with_fun(rand_str_custom)




rand_str_SystemRandom_seeding
mean 3845.15384615 std dev 46.2016419186
l 3926 1.75 std devs away from mean
y 3916 1.53 std devs away from mean
d 3909 1.38 std devs away from mean
a 3898 1.14 std devs away from mean
p 3898 1.14 std devs away from mean
c 3889 0.95 std devs away from mean
u 3884 0.84 std devs away from mean
j 3873 0.60 std devs away from mean
n 3873 0.60 std devs away from mean
w 3866 0.45 std devs away from mean
x 3863 0.39 std devs away from mean
r 3855 0.21 std devs away from mean
m 3852 0.15 std devs away from mean
b 3841 -0.09 std devs away from mean
t 3835 -0.22 std devs away from mean
o 3829 -0.35 std devs away from mean
k 3827 -0.39 std devs away from mean
i 3821 -0.52 std devs away from mean
s 3812 -0.72 std devs away from mean
q 3806 -0.85 std devs away from mean
v 3803 -0.91 std devs away from mean
g 3799 -1.00 std devs away from mean
h 3793 -1.13 std devs away from mean
e 3782 -1.37 std devs away from mean
f 3766 -1.71 std devs away from mean
z 3758 -1.89 std devs away from mean

rand_str_SystemRandom_noseeding
mean 3845.15384615 std dev 55.670522726
i 3961 2.08 std devs away from mean
r 3911 1.18 std devs away from mean
e 3910 1.16 std devs away from mean
m 3905 1.08 std devs away from mean
a 3901 1.00 std devs away from mean
u 3893 0.86 std devs away from mean
t 3882 0.66 std devs away from mean
w 3872 0.48 std devs away from mean
s 3870 0.45 std devs away from mean
c 3868 0.41 std devs away from mean
n 3866 0.37 std devs away from mean
q 3865 0.36 std devs away from mean
k 3863 0.32 std devs away from mean
y 3848 0.05 std devs away from mean
j 3836 -0.16 std devs away from mean
v 3830 -0.27 std devs away from mean
f 3829 -0.29 std devs away from mean
z 3829 -0.29 std devs away from mean
g 3827 -0.33 std devs away from mean
l 3818 -0.49 std devs away from mean
b 3803 -0.76 std devs away from mean
d 3803 -0.76 std devs away from mean
p 3756 -1.60 std devs away from mean
x 3755 -1.62 std devs away from mean
h 3744 -1.82 std devs away from mean
o 3729 -2.09 std devs away from mean

rand_str_custom
mean 3517.15384615 std dev 40.7541336343
i 3586 1.69 std devs away from mean
a 3578 1.49 std devs away from mean
e 3575 1.42 std devs away from mean
m 3570 1.30 std devs away from mean
q 3562 1.10 std devs away from mean
c 3555 0.93 std devs away from mean
g 3552 0.86 std devs away from mean
w 3542 0.61 std devs away from mean
p 3536 0.46 std devs away from mean
x 3533 0.39 std devs away from mean
s 3528 0.27 std devs away from mean
o 3524 0.17 std devs away from mean
d 3516 -0.03 std devs away from mean
t 3515 -0.05 std devs away from mean
h 3511 -0.15 std devs away from mean
v 3502 -0.37 std devs away from mean
z 3502 -0.37 std devs away from mean
b 3500 -0.42 std devs away from mean
f 3496 -0.52 std devs away from mean
u 3492 -0.62 std devs away from mean
l 3486 -0.76 std devs away from mean
r 3478 -0.96 std devs away from mean
n 3476 -1.01 std devs away from mean
j 3451 -1.62 std devs away from mean
k 3450 -1.65 std devs away from mean
y 3430 -2.14 std devs away from mean


It would appear that SystemRandom().choice is indeed best (in terms of
how much the counts stray from mean in std devs), but only after seeding
it with os.urandom.


Regards,
mk
 
M

mk

So true. That's why one should stick to standard libs when it comes to
crypto or security in general. It's just to easy to mess it up. Just ask
Debian about whether touching OpenSSL was a good idea ;)

That was brain-dead hiccup, for crying out loud how could they do smth
so stupid.
def gen_rand_word(n):
with open('/dev/urandom') as f:
return ''.join([chr(ord('a') + ord(x) % 26) for x in f.read(n) if ord(x)

Off-by-one-error: you're skipping len(range(22))==23 hits.

Argh, it's late here.
Well, I really think that you should use repeated Random.choice on an
alphabet.
Or Random.Systemrandom.choice if you don't trust the PRNG.

I just posted a comparison with calculating std deviations for various
methods - using os.urandom, SystemRandom.choice with seeding and without
seeding.

They all seem to have slightly different distributions.

Regards,
mk
 
P

Paul Rubin

mk said:
def rand_str_custom(n):
s = os.urandom(n)
return ''.join([chr(ord('a') + ord(x) % 26) for x in s if ord(x) < 234])

Note that simply throws away some of the chars. You have to replace
them, not throw them away.
rand_str_SystemRandom_seeding
mean 3845.15384615 std dev 46.2016419186
l 3926 1.75 std devs away from mean
y 3916 1.53 std devs away from mean
....

What do you think you're measuring here? Yes, if you're doing 1000's of
draws from a distribution, you'd expect a few of them to be 1.75 sigma
from the mean. Since there are 26 letters, you'd expect a multinomial
distribution which you can test for with the multinomial test or some
approximation from the article:

http://en.wikipedia.org/wiki/Multinomial_test

I wish I knew more statistics than I do, since there is probably some
more familiar statistical test (e.g. the T-test) that you can use as the
number of trials gets large, since each bin of the multinomial
distribution should eventually start to look like a normal distribution
due to the central limit theorem. Others here know a lot more about
this stuff than I do, and can probably give better advice.

Anyway though, the output of os.urandom should be extremely hard to
distinguish from real randomness (that's the whole point of a
cryptographic PRNG).
 
R

Robert Kern

It would appear that SystemRandom().choice is indeed best (in terms of
how much the counts stray from mean in std devs), but only after seeding
it with os.urandom.

Calling random.seed() does not affect SystemRandom() whatsoever. You are getting
perfectly acceptable distributions for all three variants.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
M

Michael Rudolf

Am 24.02.2010 21:06, schrieb mk:
I just posted a comparison with calculating std deviations for various
methods - using os.urandom, SystemRandom.choice with seeding and without
seeding.

I saw them
They all seem to have slightly different distributions.

No they don't. Just run those tests again and you will see that you
cannot put them in any order or behaviour. They are all correct now,
except that you cannot seed SystemRandom, as it is *not* a PRNG (at
least here, it is a wrapper for /dev/random)

Regards,
Michael
 
S

Steven D'Aprano

Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for them,
so they will not have to remember and type them in.

So to break your application's security model, all somebody has to do is
use their PC and they have full access to their account?

Or get hold of the copy and paste buffer?

Or the application's config files?


So I have little in
the way of limitations of password length - even though in *some* cases
somebody might have to (or be ignorant enough) to retype the password
instead of pasting it in.

Or your users might be sensible enough to not trust a role-your-own
security model, and prefer to memorize the password than to trust that
nobody will get access to their PC.


The main application will access the data using HTTP (probably), so the
main point is that an attacker is not able to guess passwords using
brute force.

And why would they bother doing that when they can sniff the wire and get
the passwords in plain text? You should assume your attackers are
*smarter* than you, not trust them to be foolish.
 
P

Paul Rubin

mk said:
Anyway, the passwords for authorized users will be copied and pasted
from email into in the application GUI which will remember it for
them, so they will not have to remember and type them in.

It occurs to me that you don't even need to mess with letters in that
case:

password = os.urandom(5).encode('hex')

will generate a string of 10 hex digits that you can give to the user.
(That is for Python 2.x but I think it might be broken in Python 3).

It might be helpful if you could say what your application does, or
anyway give an idea of what its actual security requirements are.
Generating and emailing someone a random password is a fairly standard
method for (e.g.) web forums to verify that the person has supplied a
working email address, basically as a first level spam filter. Your
scheme is probably ok for that. If you're doing something with more
demanding security requirements, then as mentioned before, there is a
whole lot of stuff you have to pay attention to, and focusing narrowly
on password generation isn't that useful.
 
M

mk

So to break your application's security model, all somebody has to do is
use their PC and they have full access to their account?

Or get hold of the copy and paste buffer?

Or the application's config files?

Yes. There's no way around this, short of forcing them to use hardware
key, which is an overkill for this application.
Or your users might be sensible enough to not trust a role-your-own
security model, and prefer to memorize the password than to trust that
nobody will get access to their PC.

The app is not that critical, it's about quarterly subscription to the
service, and the users will be able to reset the password anyway. If it
were that critical, I'd use the hardware keys; if hardware keys are not
used, once somebody gains an (unconstrained) access to the user's PC,
there's not much that app developer can do. I've read somewhere a
warning from PuTTY developer that even though the key is (normally)
protected by the passphrase, losing even an encrypted key is quite
likely to lead to its compromise. There's even some software for that on
the net:

http://www.neophob.com/serendipity/index.php?/archives/127-PuTTY-Private-Key-cracker.html

And why would they bother doing that when they can sniff the wire and get
the passwords in plain text? You should assume your attackers are
*smarter* than you, not trust them to be foolish.

I should have written HTTPS.

Regards,
mk
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,176
Messages
2,570,950
Members
47,500
Latest member
ArianneJsb

Latest Threads

Top