MD5 module Pythonicity

Leandro Lameiro · Oct 15, 2005

Hi folks

Recently I was discussing Python's ease of use with a friend, and it
really is good at this. This friend needed to calculate the MD5 hash
of some files and was telling me about the md5 module. From the way he
described it, and the way it is described in the Python docs, the
method for calculating hashes did not seem very pythonic to me, though
it is certainly very simple and easy:

The method is (taken from the official Python documentation):
'\xbbd\x9c\x83\xdd\x1e\xa5\xc9\xd9\xde\xc9\xa1\x8d\xf0\xff\xe9'
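For reference, a minimal sketch of the incremental interface being discussed. It assumes a current Python, where the md5 module of this thread has been superseded by hashlib; the input string is the one used in the standard documentation's example, and it should yield the digest quoted above.

```python
import hashlib

# Create an empty MD5 object, then feed it data piece by piece.
# (Assumption: hashlib.md5 stands in for the old md5.new() of this thread.)
m = hashlib.md5()
m.update(b"Nobody inspects")
m.update(b" the spammish repetition")

# digest() returns the 16 raw bytes; hexdigest() returns the same value as hex text.
digest = m.digest()
hex_digest = m.hexdigest()
print(repr(digest))
print(hex_digest)
```

Feeding the data in two update calls gives the same result as hashing the concatenated string in one call, which is what makes chunked processing of files possible.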

The idea for using this on files is: open the file, read it in small
chunks, call update for each one, and when you are done reading the
file, call digest. Well, OK, it is very simple and easy.
But wouldn't it be more pythonic if there were some kind of
md5.calculate_from_file("file") ?!
This way, you wouldn't have to split the file yourself (the proposed
function would do this for you) and the code would be a lot more
readable: a single call to md5.calculate_from_file, or something like
this. (Maybe passing the open file object to calculate_from_file
instead of the file name.)
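A rough sketch of what the proposed helper might look like. The name calculate_from_file comes from the post, but the function itself is hypothetical and not part of any standard module; the chunk_size parameter and its default are assumptions, and hashlib is used in place of the old md5 module.

```python
import hashlib


def calculate_from_file(source, chunk_size=64 * 1024):
    """Hypothetical helper: return the MD5 hex digest of a file.

    `source` may be a file name (str) or a file object opened in binary mode.
    """
    if isinstance(source, str):
        # A file name was given: open it ourselves and recurse with the file object.
        with open(source, "rb") as f:
            return calculate_from_file(f, chunk_size)

    md5 = hashlib.md5()
    # iter() with a sentinel keeps calling read() until it returns b"" (end of file),
    # so only one chunk is held in memory at a time.
    for chunk in iter(lambda: source.read(chunk_size), b""):
        md5.update(chunk)
    return md5.hexdigest()
```

With something like this, the calling code reduces to a single line such as `print(calculate_from_file("file"))`, which is the readability gain the post is asking for.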

One alternative also shown in the documentation is to do everything at once:

Well, OK, this one is a bit more readable (though not as good as I
think it could be), but it has the disadvantage of having to load the
WHOLE file into memory.
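To make the trade-off concrete, here is a small sketch of the do-everything-at-once style, assuming the modern hashlib module rather than the md5 module of the time. The function name md5_all_at_once is made up for illustration.

```python
import hashlib


def md5_all_at_once(path):
    """Hypothetical one-shot helper: hash a file by reading it in a single call."""
    with open(path, "rb") as f:
        # read() with no argument returns the entire file as one bytes object,
        # so memory use grows with the size of the file.
        return hashlib.md5(f.read()).hexdigest()
```

This is short and readable, but for a file of several gigabytes the whole content sits in memory before hashing even starts.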

What's wrong with having a function like the one I described, one that
would split the file for you, feed md5.update and, when it is done,
return the digest?
It is easier, doesn't require creating MD5 objects, works well on
small and big files, and makes the code simpler and more readable. Also,
calculating the MD5 of files seems to be a common enough task to be put
in the library (well, at least on GNU/Linux we have a command just for
this: md5sum).

"Although practicality beats purity."
"Readability counts."
"Beautiful is better than ugly."

Have I got the wrong definition of "Pythonic"?

Mike Meyer · Oct 15, 2005

Leandro Lameiro said:
What's wrong with having a function like the one I described, one that
would split the file for you, feed md5.update and, when it is done,
return the digest?

Nothing in particular; it's just a trivial thing to write. If you add
every useful utility function to the standard library, you wind up
with multi-thousand-page library documentation. The line has to be
drawn somewhere.

Leandro Lameiro said:
It is easier, doesn't require creating MD5 objects, works well on
small and big files, and makes the code simpler and more readable. Also,
calculating the MD5 of files seems to be a common enough task to be put
in the library (well, at least on GNU/Linux we have a command just for
this: md5sum).

Wanting to sum a file at the command line isn't that uncommon, so a
utility makes some sense. But how often does a program need the md5
sum of a file? Especially compared to how often it wants to take the
md5 sum of some string that isn't in a file?

It might be worth adding. Except that md5 isn't really trusted
anymore; you really want to be using sha1 (and you presumably have a
sha1sum utility; FreeBSD has md5 and sha1 commands). But the sha
module doesn't have a file handler either.
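Since neither module has a file handler, one way to sketch it is as a single helper that takes the algorithm by name. This assumes the modern hashlib module, whose hashlib.new() accepts names such as "md5" and "sha1"; the function name hash_file and its parameters are hypothetical.

```python
import hashlib


def hash_file(path, algorithm="sha1", chunk_size=64 * 1024):
    """Hypothetical helper: hex digest of a file using any hashlib algorithm."""
    # hashlib.new() looks the algorithm up by name, so one helper covers md5, sha1, etc.
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Called as `hash_file("file")` it gives the SHA-1 of the file, and `hash_file("file", "md5")` gives the MD5, so the same few lines serve both modules discussed here.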

You might try posting a patch to SourceForge and see if it gets
accepted.

<mike
