MD5 module Pythonicity

Leandro Lameiro

Hi folks

Recently I was discussing Python's ease of use with a friend, and it
really is good at this. This friend needed to calculate the MD5 hash of
some files and was telling me about the md5 module. The way he described
it, and the way it is described in the Python docs, the method for
calculating hashes did not seem very Pythonic to me, although it was
certainly very simple and easy.

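Here is the method (taken from the official Python documentation):
>>> import md5
>>> m = md5.new()
>>> m.update("Nobody inspects")
>>> m.update(" the spammish repetition")
>>> m.digest()
'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'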
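The md5 module discussed in this thread is Python 2 only: it was deprecated in Python 2.5 in favour of hashlib and removed in Python 3. A minimal sketch of the same documentation example with hashlib.md5, which keeps the update()/digest() interface but takes bytes rather than str:

```python
import hashlib

# Feed the data in two pieces, as the md5 documentation example does.
m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")
incremental = m.digest()

# Hashing the whole byte string in one call gives the same 16-byte digest.
one_shot = hashlib.md5(b"Nobody inspects the spammish repetition").digest()

print(incremental == one_shot)  # True
print(m.hexdigest())            # bb649c83dd1ea5c9d9dec9a18df0ffe9
```

Because update() calls simply concatenate their input, feeding a file piece by piece produces the same digest as hashing its full contents.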

The idea for files is: open the file, read it in small chunks, call
update() for each chunk, and when you have finished reading the file,
call digest(). Well, OK, it is very simple and easy.
But wouldn't it be more Pythonic if there were something like
md5.calculate_from_file("file")?
That way you wouldn't have to split the file yourself (the proposed
function would do that for you), and the code would be a lot more
readable:

or something like this. (Maybe calculate_from_file could take an open
file object instead of a file name.)
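A minimal sketch of the proposed function, assuming modern Python and hashlib. The name calculate_from_file comes from the post and does not exist in any standard module, and the 64 KiB chunk size is an arbitrary assumption. It accepts either a path or a file object opened in binary mode, covering both variants suggested above:

```python
import hashlib
import os

CHUNK_SIZE = 64 * 1024  # bytes per update() call; an arbitrary choice


def calculate_from_file(source, chunk_size=CHUNK_SIZE):
    """Return the MD5 hex digest of a file.

    Hypothetical helper (not part of hashlib): `source` may be a path or
    a file object already opened in binary mode.
    """
    if isinstance(source, (str, bytes, os.PathLike)):
        # Given a path: open the file ourselves and make sure it is closed.
        with open(source, "rb") as f:
            return calculate_from_file(f, chunk_size)

    m = hashlib.md5()
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty bytes object means end of file
            break
        m.update(chunk)
    return m.hexdigest()
```

Usage is then a single call such as `calculate_from_file("file")`. For what it's worth, Python 3.11 later added `hashlib.file_digest(fileobj, "md5")`, which does the chunked reading for a file object opened in binary mode.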

One alternative, also shown in the documentation, is to do everything at once:

Well, OK, this one is a bit more readable (though still not as good as
I think it could be), but it has the disadvantage of having to load the
WHOLE file into memory.
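For comparison, a sketch of the "everything at once" style applied to a file, using hashlib and pathlib (the function name md5_whole_file is hypothetical). It is the shortest version, and it makes the memory cost explicit:

```python
import hashlib
from pathlib import Path


def md5_whole_file(path):
    """Return the MD5 hex digest of a file by hashing its contents in one go.

    read_bytes() loads the entire file into a single bytes object, so peak
    memory use grows with the size of the file.
    """
    return hashlib.md5(Path(path).read_bytes()).hexdigest()
```

A chunked update() loop produces the same digest while holding only one chunk in memory at a time, which is why it is the usual choice for large files.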

What's wrong with having a function like the one I described, one that
would split the file for you, feed md5.update and, when it is done,
return the digest?
It is easier, doesn't require creating MD5 objects, works well on both
small and big files, and makes the code more readable and simpler.
Also, calculating the MD5 of a file seems to be a common enough task to
belong in the library (at least on GNU/Linux, there is a command just
for this: md5sum).
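As a rough illustration of how little code an md5sum-like tool needs, here is a sketch that prints one line per file in the usual GNU md5sum layout (digest, two spaces, file name). The script and the md5sum_line name are hypothetical, and it does not implement md5sum options such as --check:

```python
import hashlib
import sys


def md5sum_line(path, chunk_size=64 * 1024):
    """Return one line in the style of GNU md5sum output: '<hex>  <name>'."""
    m = hashlib.md5()
    with open(path, "rb") as f:
        # iter() keeps calling f.read(chunk_size) until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            m.update(chunk)
    # GNU md5sum separates digest and name with two spaces in its default mode.
    return f"{m.hexdigest()}  {path}"


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(md5sum_line(name))
```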

"Although practicality beats purity."
"Readability counts."
"Beautiful is better than ugly."

Have I got the wrong definition of "Pythonic"?
 

Mike Meyer

Leandro Lameiro said:
> What's wrong with having a function like the one I described, one that
> would split the file for you, feed md5.update and, when it is done,
> return the digest?

Nothing in particular; it's just a trivial thing to write. If you add
every useful utility function to the standard library, you wind up
with a multi-thousand-page library reference. The line has to be
drawn somewhere.
> It is easier, doesn't require creating MD5 objects, works well on both
> small and big files, and makes the code more readable and simpler.
> Also, calculating the MD5 of a file seems to be a common enough task to
> belong in the library (at least on GNU/Linux, there is a command just
> for this: md5sum).

Wanting to sum a file at the command line isn't that uncommon, so a
utility makes some sense. But how often does a program need the md5
sum of a file? Especially compared to how often it wants to take the
md5 sum of some string that isn't in a file?

It might be worth adding, except that md5 isn't really trusted
anymore; you really want to be using sha1 (and you presumably have a
sha1sum utility; FreeBSD has md5 and sha1 commands). But the sha
module doesn't have a file handler either.
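Since Python 2.5, hashlib has replaced both the md5 and sha modules, and hashlib.new() selects an algorithm by name, so one helper can serve both. A sketch under that assumption; file_hexdigest is a hypothetical name, and the caller chooses the algorithm (any name hashlib.new() accepts, such as "md5", "sha1" or "sha256"):

```python
import hashlib


def file_hexdigest(path, algorithm, chunk_size=64 * 1024):
    """Hex digest of a file for any algorithm name hashlib.new() accepts.

    Hypothetical helper: one function covers md5, sha1, sha256, ...
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Switching from MD5 to SHA-1 is then a change of one argument, for example `file_hexdigest("file", "sha1")`.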

You might try posting a patch to SourceForge, and see if it gets
accepted.

<mike
 
