encryption with python

jlocc

Hi!

I was wondering if someone can recommend a good encryption algorithm
written in Python. My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

It would be great if there exists a library already written to do this,
and if there is, can somebody please point me to it??

Thanks in advance,
J
 
Michael J. Fromberger

I recommend you investigate PyCrypto:
http://www.amk.ca/python/code/crypto
http://sourceforge.net/projects/pycrypto

Cheers,
-M
 
Steve M

My goal is to combine two different numbers and
encrypt them to create a new number that can't be traced back to the
originals.

Here's one:
def encrypt(x, y):
    """Return a number that combines x and y but cannot be traced back
    to them."""
    return x + y
 
ncf

Steve said:
encrypt them to create a new number that can't be traced back to the
originals.

Here's one:
def encrypt(x, y):
    """Return a number that combines x and y but cannot be traced back
    to them."""
    return x + y

Or you can use sha1 so you can't do basic checks to find out. :)
It seems to me like he's trying to do some DH-like thing, so yeah, he
might rather want a hash.

**** UNTESTED ****

import hashlib

def encrypt(x, y):
    """Return a number that combines x and y but cannot be traced back
    to them. Number returned is in range(2**24)."""
    def _dosha(v):
        # SHA-1 of the string form of v, as a hex string
        return hashlib.sha1(str(v).encode()).hexdigest()
    # 6 hex digits of the combined hash = 24 bits
    return int(_dosha(_dosha(x) + _dosha(y))[5:11], 16)
 
James Stroud

This is either a very simple or a very open-ended question you have asked. Do
you want to be able to recover the original numbers arbitrarily from the
combination? What properties do you want the combination to have? Do you want
to take the combination and a number and see if the number is in the
combination without revealing any other constituent numbers? Do you want to
be able to provide any arbitrary number of the combination and recover all or
some subset of the constituent numbers (depending on the supplied number)?

What do you want to do with the combination and the individual numbers?

James

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/
 
jlocc

Basically I would like to combine a social security number (9 digits)
and a birth date (8 digits, could be padded to be 9) and obtain a new
'student number'. It would be better if the original numbers can't be
traced back; they will be kept in a database anyway. Hope this is a
bit more specific, thanks!!!
 
Robert Kern

Why don't you assign an arbitrary ID number to each student that is
entirely unrelated to sensitive information (except via the database
which is hopefully secure)?

--
Robert Kern
(e-mail address removed)

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter
 
Paul Rubin

Why do you want to include the birth date, given that the SSN will
already be unique? It won't be a big obstacle to brute forcing the
SSN out of a keyless hash, since knowing the student's year of
graduation will in most cases be enough to narrow his or her DOB down
to a few hundred possibilities.

How many digits can the student number have? What happens if two
different students get assigned the same number?

If you have a secure database where the actual DOB and SSN are held,
why not just have it issue a student ID number at the time the DOB/SSN
row is added?

I'm feeling that you're working in a subtle and tricky area without
really knowing what you're doing, and that people's privacy is at
risk. Most of the good answers to your question are going to begin
with "choose a random string K that you're able to keep secret through
the entire lifetime of the whole system". The security of your system
will rest on being able to keep K secret against determined attackers.
You then have a key management problem, which has to be handled
through careful procedures and possibly special hardware, not by an
algorithm.

Please get a copy of the book "Security Engineering", by Ross
Anderson, to get an idea of what you're getting into.
 
James Stroud

Then your best bet is to take a reasonable number of bits from an sha hash.
But you do not need pycrypto for this. The previous answer by "ncf" is good,
but use the standard library and take 9 digits to lessen probability for
clashes

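import hashlib

def encrypt(x, y):
    def _dosha(v):
        # SHA-1 of the string form of v, as a hex string
        return hashlib.sha1(str(v).encode()).hexdigest()
    # 8 hex digits of the combined hash = 32 bits
    return int(_dosha(_dosha(x) + _dosha(y))[5:13], 16)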


Example:

py> encrypt(843921299,20050906)
522277004

Each student ID should be unique until you get a really big class. If your
class might grow to several million, consider taking more bits of the hash.

Also, as long as you remember the function, you can get back the student ID
from the birthday and SS, in case they drop out and re-enroll next year.

James

 
James Stroud

Also, I should note that the sha function will, to the limits of anyone's
ability to analyze it, decouple the information from the hash. So, to be
careful, you should keep the algorithm to generate the IDs secret. The
advantage of creating an ID from info in this way is that the ID is ("should
be") unique and unchanging. The disadvantage is that you have to keep the
algorithm secret. Because by knowing it, people could generate IDs from
birthdays with only 10**9 calculations (possible 9 digit SS numbers) and
match them to the IDs. All they need to do this is to ask someone what their
birthday is and try SS#s until they get the corresponding ID.

You could keep the algorithm encrypted and decrypt it temporarily to generate
a new ID. Or, you could memorize it and type it in at the beginning of the
semester and generate the IDs for that semester. You might also have to do
this if you lose the IDs somehow.

But beware of the "rubber hose cryptanalytic attack". This is where an
adversary beats you with a rubber hose then asks you for the ID generation
algorithm (or key to the encrypted version). They then check your algorithm
against a known birthday-SS#-ID triplet. If you lied, they repeat until they
verify your algorithm. This has historically been a very successful attack.

James

 
Paul Rubin

James Stroud said:
Then your best bet is to take a reasonable number of bits from an sha hash.
But you do not need pycrypto for this. The previous answer by "ncf" is good,
but use the standard library and take 9 digits to lessen probability for
clashes

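import hashlib

def encrypt(x, y):
    def _dosha(v):
        # SHA-1 of the string form of v, as a hex string
        return hashlib.sha1(str(v).encode()).hexdigest()
    # 8 hex digits of the combined hash = 32 bits
    return int(_dosha(_dosha(x) + _dosha(y))[5:13], 16)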
...
Each student ID should be unique until you get a really big class. If your
class might grow to several million, consider taking more bits of the hash.

Please don't give advice like this unless you know what you're doing.
You're taking 8 hex digits and turning them into an integer. That
means you'll probably have a collision after around 65,000 id's, not
several million. "Probably" means > 50%. You'll have a significant
chance (say more than 1%) of collision after maybe 10,000.
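
These figures can be checked with the usual birthday approximation, p = 1 - exp(-n(n-1)/(2N)), where N = 16**8 is the number of values 8 hex digits can take. A minimal sketch follows; the function name is made up for illustration, and it assumes the IDs behave like uniformly random draws. It gives a bit over 1% at 10,000 IDs, and the better-than-even point lands somewhat above 65,000, nearer 77,000, which is the same order of magnitude and nowhere near several million.

```python
import math

def collision_probability(n_ids: int, id_space: int) -> float:
    """Approximate chance that at least two of n_ids random IDs coincide."""
    return 1.0 - math.exp(-n_ids * (n_ids - 1) / (2.0 * id_space))

ID_SPACE = 16 ** 8  # 8 hex digits give 2**32 possible values

for n in (10_000, 65_000, 78_000, 1_000_000):
    print(n, round(collision_probability(n, ID_SPACE), 4))
```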

Also, if you know the student's graduation year, in most cases there
are just a few hundred likely birthdates for that student, so by brute
force search you can crunch the output of your function to a fairly
small number of DOB/SSN combinations.

The only approach that makes sense is for the secure database to
assign arbitrary numbers that aren't algorithmically related to any
sensitive data. Answers involving encryption will need to use either
large ID numbers or secret keys, both of which will cause hassles.
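
As a rough sketch of what "arbitrary numbers" could look like: draw each ID from a cryptographic random source and reject any value already issued. The function name and the in-memory set are hypothetical stand-ins; in a real system the uniqueness check would be a constraint inside the secure database itself.

```python
import secrets

def new_student_id(existing_ids: set, digits: int = 9) -> int:
    """Draw a random ID of the given length, unrelated to any student data."""
    low, high = 10 ** (digits - 1), 10 ** digits
    while True:
        candidate = low + secrets.randbelow(high - low)
        # Stand-in for a UNIQUE constraint in the database.
        if candidate not in existing_ids:
            existing_ids.add(candidate)
            return candidate

issued = set()
ids = [new_student_id(issued) for _ in range(1000)]
```

Nothing about the SSN or DOB goes into the number, so there is nothing to brute-force out of it.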
 
Steve Holden

Paul said:
The only approach that makes sense is for the secure database to
assign arbitrary numbers that aren't algorithmically related to any
sensitive data. Answers involving encryption will need to use either
large ID numbers or secret keys, both of which will cause hassles.

This is indubitably true. There's absolutely no excuse for making the
primary key a function of the data that record contains, as doing so
will assist any cryptanalytical attacks.

regards
Steve
 
Steven D'Aprano

Also, I should note that the sha function will, to the limits of anyone's
ability to analyze it, decouple the information from the hash. So, to be
careful, you should keep the algorithm to generate the IDs secret.

Security by obscurity is very little security at all. If there is any
motive at all to reverse-engineer the algorithm, people will reverse
engineer the algorithm. Keeping a weak algorithm secret does not make it
strong.
 
Steven D'Aprano

There are "one-way" encryption functions where the result can't easily be
traced back to the input, but why do you need the input anyway? Here is my
quick-and-dirty student ID algorithm:

last_number_used = 123 # or some other appropriate value

def make_studentID():
    global last_number_used
    last_number_used = last_number_used + 1
    return last_number_used

For a real application, I'd check the database to see if the number has
already been used before returning the number. Also, if you need more
than four digits in your IDs, I'd add a checksum to the end so you can
detect many typos and avoid much embarrassment.
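
The post does not say which checksum to use. One common choice, assumed here purely for illustration, is a Luhn check digit appended to the ID; it catches every single-digit typo and most swaps of adjacent digits. The helper names below are made up.

```python
def luhn_check_digit(number: int) -> int:
    """Return the Luhn check digit for a number that has no check digit yet."""
    total = 0
    # Walk the digits right to left, doubling every second one,
    # starting with the rightmost.
    for position, char in enumerate(reversed(str(number))):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10

def add_checksum(number: int) -> int:
    """Append the check digit to the end of the ID."""
    return number * 10 + luhn_check_digit(number)

def is_valid(student_id: int) -> bool:
    """Check that the last digit matches the rest of the ID."""
    payload, check = divmod(student_id, 10)
    return luhn_check_digit(payload) == check
```

A data-entry form can then call is_valid() and reject a mistyped ID before it ever reaches the database.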

Since the ID is entirely arbitrary (a function of the order in which students are
entered into the database) no attacker can regenerate their SSN from their
student ID. At worst, an attacker might be able to work out roughly what
day they were added to the database. Big deal. And if that is a problem,
you might do something like this:

import random

last_number_used = 12345
usable_IDs = []

def make_studentID():
    global last_number_used
    global usable_IDs
    if not usable_IDs:
        # generate another batch of IDs in random order
        usable_IDs = list(range(last_number_used, last_number_used + 1000))
        random.shuffle(usable_IDs)
        last_number_used += 1000
    return usable_IDs.pop()

In a real application you would need to store the global variables in a
database, otherwise each time you reload the Python script you start
generating the same IDs over and over again.
 
Mike Meyer

Steven D'Aprano said:
last_number_used = 123 # or some other appropriate value

def make_studentID():
    global last_number_used
    last_number_used = last_number_used + 1
    return last_number_used

For a real application, I'd check the database to see if the number has
already been used before returning the number. Also, if you need more
than four digits in your IDs, I'd add a checksum to the end so you can
detect many typos and avoid much embarrassment. [...]
In a real application you would need to store the global variables in a
database, otherwise each time you reload the Python script you start
generating the same IDs over and over again.

For real applications (ignoring your theoretical need to generate the
numbers in a random order) I'd not only store the number in the
database - I'd let the database generate it. Most have some form of
counter that does exactly what you want without needing to keep track
of it and check the database for consistency.
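
As one possible illustration, here is how that looks with SQLite from the standard library. The table and column names are invented for the example, and other databases spell the counter differently (sequences, identity columns and so on).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE students ("
    " student_id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL)"
)

def add_student(name: str) -> int:
    """Insert a row and return the ID the database assigned to it."""
    cursor = conn.execute("INSERT INTO students (name) VALUES (?)", (name,))
    conn.commit()
    return cursor.lastrowid

first = add_student("Alice")
second = add_student("Bob")
```

The application never picks the number itself, so there is no global counter to persist and no race between two scripts handing out the same ID.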

<mike
 
Kirk Job Sluder

Steven D'Aprano said:
There are "one-way" encryption functions where the result can't easily be
traced back to the input, but why do you need the input anyway?

Well, there is a form of security design that involves one-way
encryption of confidential information. You might want to be able to
search on SSN, but not have the actual SSN stored in the database. So,
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"

Don't think it applies in this case, but might in some other cases.
 
Paul Rubin

Kirk Job Sluder said:
Well, there is a form of security design that involves one-way
encryption of confidential information. You might want to be able to
search on SSN, but not have the actual SSN stored in the database. So,
you are prepared to deal with the inevitable, "I lost my
password/student ID, can you still look up my records?"

The minute you provide a way to do that without secret keys, you have
a security hole. SSN's are 9 digits which means there are 1 billion
of them. If there are 100,000 hashed SSN's in the database, the
attacker (since this is clpy) can read them all into a Python dict.
S/he then starts generating SSN's at random and hashing them and
checking whether those hashes appear in the dict. Doing stuff like
iterated hashes to slow the attacker down doesn't help that much: the
attacker needs to hash only 10,000 or so SSN's to be likely to hit one
that's in the dict. If the attacker can hash all 10**9 SSN's, which
isn't all that terribly many, every SSN in the database spills.
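
To make the arithmetic concrete: with 10**5 stored hashes out of 10**9 possible SSNs, each random guess hits with probability 10**-4, so about 10,000 guesses are needed per hit on average. The toy sketch below shrinks the keyspace to 5-digit numbers so that it runs instantly; the names and sizes are made up, and SHA-256 merely stands in for whatever unkeyed hash is used.

```python
import hashlib
import random

def unkeyed_hash(number: int) -> str:
    """Hash a number with no secret key involved."""
    return hashlib.sha256(str(number).encode("ascii")).hexdigest()

# Toy sizes: 5-digit "SSNs" instead of 9-digit ones.
KEYSPACE = 10 ** 5
rng = random.Random(0)  # fixed seed, for illustration only
enrolled = set(rng.sample(range(KEYSPACE), 100))
stored_hashes = {unkeyed_hash(n) for n in enrolled}  # all the database holds

# Hash every possible value and keep the ones whose hash is stored.
recovered = {n for n in range(KEYSPACE) if unkeyed_hash(n) in stored_hashes}

def expected_guesses_per_hit(keyspace: int, stored: int) -> float:
    """Average number of random guesses before one matches a stored hash."""
    return keyspace / stored
```

Every enrolled number comes back out, which is the whole point: the hash hides nothing when the input space is this small.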

Bottom line: to keep confidential stuff secure, you need actual security.
 
Kirk Job Sluder

Paul Rubin said:
The minute you provide a way to do that without secret keys, you have
a security hole.

Providing any kind of access to data involves creating a security hole.
This is the biggest flaw in most discussions of computer security. Too
much of it depends on everyone remembering (and using) unique
cryptographically strong keys.

You have a client on the phone who needs access to information, but has
forgotten or lost the 10-digit unique ID and the PIN you gave them two
years ago. How do you provide that client with the information he or
she needs? This is the kind of dilemma that one-way encryption is
designed to make a tiny bit safer.

SSNs + some other secret (such as mother's maiden name) is certainly
crappy security. However, I don't think we are going to see widespread
adoption of anything better in the near future.

But even if we go with "more secure" authentication tokens, there is usually
no reason to store the authentication token in plaintext.

Paul Rubin said:
SSN's are 9 digits which means there are 1 billion
of them. If there are 100,000 hashed SSN's in the database, the
attacker (since this is clpy) can read them all into a Python dict.
S/he then starts generating SSN's at random and hashing them and
checking whether those hashes appear in the dict. Doing stuff like
iterated hashes to slow the attacker down doesn't help that much: the
attacker needs to hash only 10,000 or so SSN's to be likely to hit one
that's in the dict. If the attacker can hash all 10**9 SSN's, which
isn't all that terribly many, every SSN in the database spills.

Of course, an additional step I didn't mention was that in actual
practice the SSNs would be hashed with a strong random secret key. But
from my point of view, the possibility for dictionary attacks is pretty
much unavoidable as long as we are dealing just with memorized tokens.

We've been bitching, whining and moaning about the small keyspace and
poor quality of what users are willing to memorize for 20 years. We can
complain about it for the next 10 which is about how long it will take
for any kind of alternative to be adopted. I still think that one-way
hashing of authentication "secrets" is better than plain-text storage.

Paul Rubin said:
Bottom line: to keep confidential stuff secure, you need actual security.

The only way to keep confidential stuff secure is to shred it, burn it,
and grind the ashes.

I think the fundamental problem is that most customers don't want
actual security. They want to be able to get their information by
calling a phone number and saying a few words/phrases they memorized in
childhood. Given the current market, it seems to be cheaper to deal
with breaks after the fact than to expect more from customers.
 
Paul Rubin

Kirk Job Sluder said:
You have a client on the phone who needs access to information, but has
forgotten or lost the 10-digit unique ID and the PIN you gave them two
years ago. How do you provide that client with the information he or
she needs? This is the kind of dilemma that one-way encryption is
designed to make a tiny bit safer.

You need secret keys then, and you need to secure them. If you have a
secure secret key K, you can store something like HMAC(K, SSN) and
that is pretty safe from offline attacks.
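
A minimal sketch of the HMAC(K, SSN) idea using the standard library. The function name is invented, HMAC-SHA256 is an assumed choice of MAC, and generating K inline is only for the demo; how K is stored and protected is the hard part and is not shown.

```python
import hashlib
import hmac
import secrets

def ssn_lookup_token(key: bytes, ssn: str) -> str:
    """Return HMAC-SHA256(key, ssn) as hex, to be stored in place of the SSN."""
    return hmac.new(key, ssn.encode("ascii"), hashlib.sha256).hexdigest()

# Hypothetical key; in practice it lives outside the database.
K = secrets.token_bytes(32)

stored = ssn_lookup_token(K, "123456789")      # written at enrollment
candidate = ssn_lookup_token(K, "123456789")   # recomputed at lookup time
match = hmac.compare_digest(stored, candidate)
```

Without K, an attacker holding only the stored tokens cannot run the hash-every-SSN search offline.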

Kirk Job Sluder said:
Of course, an additional step I didn't mention was that in actual
practice the SSNs would be hashed with a strong random secret key.

But now you have to maintain that secret key and its secrecy, which is
not a trivial task. It's not an unsolvable problem but you can't
handwave it.

We're told there is already a secure database in the picture
somewhere, or at least one that inescapably contains cleartext SSN's,
so that's the system that should assign the ID numbers and handle
SSN-based queries.

Kirk Job Sluder said:
I think the fundamental problem is that most customers don't
want actual security. They want to be able to get their information
by calling a phone number and saying a few words/phrases they
memorized in childhood.

A voice exemplar stored at enrollment time plus a question or two like
"what classes did you take last term" could easily give a pretty good
clue that the person saying the words/phrases is the legitimate
student.

Kirk Job Sluder said:
Given the current market, it seems to be
cheaper to deal with breaks after the fact than to expect more from
customers.

Customers legitimately want actual security without having to care how
hash functions work, just like they want safe transportation without
having to care about how jet engine turbopumps work. Air travel is
pretty safe because if the airline fails to maintain the turbopumps
and a plane goes down, there is hell to pay. There is huge legal and
financial incentive for travel vendors (airlines) to not cut corners
with airplane safety. But vendors who deploy incompetently designed
IT systems full of confidential data resulting in massive privacy
breaches face no liability at all.

There is no financial incentive for them to do it right, so they
instead spend the money on more marketing or on executive massages or
whatever, and supply lousy security. THAT is the fundamental problem.
 
