Which is more Pythonic? (was: Detecting Binary content in files)

John Posner · Apr 1, 2009

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
         if b > 127
         or b in [ "\r", "\n", "\t" ] ]

mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
                    block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

(Note: Dennis's statement converts a string into a list; mine does not.)

---

binary = (float(len(mrkrs)) / len(block)) > 0.30

binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

-John


bieffe62 · Apr 1, 2009

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
         if b > 127
         or b in [ "\r", "\n", "\t" ] ]

mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
                    block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

Never tested my 'pythonicity', but I would do:

def test(b): return b > 127 or b in r"\r\n\t"
mrkrs = filter(test, block)

Note: before I started studying Haskell, I would probably have used the
list comprehension. I still can't stand anonymous functions, though.

(Note: Dennis's statement converts a string into a list; mine does not.)

---

binary = (float(len(mrkrs)) / len(block)) > 0.30

binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

I believe now one should do (at least on new code):

from __future__ import division  # not needed for Python 3.0
binary = (len(mrkrs) / len(block)) > 0.30

In the past I often used the * 1.0 trick, but I nevertheless believe
that an explicit cast is better.
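
A small sketch of the difference between the two division operators, written for Python 3. The counts are made up for illustration; on Python 2 the same behaviour would need `from __future__ import division` at the top of the module.

```python
marker_count = 3    # hypothetical number of suspicious characters
block_length = 10   # hypothetical block size

ratio = marker_count / block_length          # true division: a float
floor_ratio = marker_count // block_length   # floor division: an int

is_binary = ratio > 0.30
print(ratio, floor_ratio, is_binary)
```

With true division in effect, neither `float()` nor the `1.0 *` trick is needed to get a fractional ratio.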

-John

Ciao

John Posner · Apr 1, 2009

mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"),
                    block)
Oops! Clearly,

b in "\r\n\t"

... is preferable to ...

b in list("\r\n\t")

You do *not* want to use a raw string here:
6
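
The lone 6 above is presumably the length of the raw string, though the expression that produced it did not survive. A quick check, as a sketch with made-up variable names, of why the raw string fails the membership test:

```python
raw = r"\r\n\t"      # backslashes are kept literally: six characters
cooked = "\r\n\t"    # escape sequences are interpreted: three characters

print(len(raw), len(cooked))   # 6 3
print("\n" in cooked)          # True
print("\n" in raw)             # False: there is no newline in the raw string
print("n" in raw)              # True: but the plain letter n is in it
```

So `b in r"\r\n\t"` matches a backslash or the letters r, n and t, and never matches a carriage return, newline or tab.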


John Machin · Apr 1, 2009

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
         if b > 127
         or b in [ "\r", "\n", "\t" ] ]

I'd worry about "correct" before "Pythonic" ... see my responses to
Dennis in the original thread.
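
The original thread is not reproduced here, so the following is only a guess at the kind of problem meant. One thing that looks suspect in the quoted statement is that it compares a one-character string with the integer 127. On Python 3 that is an outright error; a sketch, with `sample_char` as a made-up name:

```python
sample_char = "\xe9"   # hypothetical one-character string, code point 233

try:
    sample_char > 127           # str compared with int
    comparison_allowed = True
except TypeError:
    comparison_allowed = False  # Python 3 refuses to order str and int

# Comparing the code point instead is well defined.
is_high = ord(sample_char) > 127

print(comparison_allowed, is_high)
```

On Python 2 the comparison is permitted, but the result is decided by the types of the operands rather than by the character code, so it does not test what it appears to test.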

mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
                    block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

Try this on and see if it fits:

num_bin_chars = sum(b > "\x7f" or b < "\x20" and b not in "\r\n\t"
                    for b in block)

(Note: Dennis's statement converts a string into a list; mine does not.)

What is list("\r\n\t") doing, if it's not (needlessly) converting a
string into a list?

---

binary = (float(len(mrkrs)) / len(block)) > 0.30

binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

num_bin_chars > 0.30 * len(block)

(no mucking about with float() or 1.0, and it doesn't blow up on a
zero-length block)
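
Putting the two suggestions together, as a sketch rather than the poster's exact code: the helper name `looks_binary` and the `threshold` parameter are made up here, and `block` is assumed to be a `str` of one-byte characters as in the snippets above.

```python
def looks_binary(block, threshold=0.30):
    """Return True if more than `threshold` of the characters look non-textual."""
    num_bin_chars = sum(
        b > "\x7f" or (b < "\x20" and b not in "\r\n\t")
        for b in block
    )
    # Multiplying instead of dividing avoids float() and ZeroDivisionError.
    return num_bin_chars > threshold * len(block)


print(looks_binary("plain text\r\n"))      # False
print(looks_binary("\x00\x01\x02\xffab"))  # True
print(looks_binary(""))                    # False
```

The parentheses around the `and` clause do not change the meaning, since `and` already binds tighter than `or`; they only make the grouping visible.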

Cheers,
John

Terry Reedy · Apr 2, 2009

John said:
Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
         if b > 127
         or b in [ "\r", "\n", "\t" ] ]


I'd worry about "correct" before "Pythonic" ... see my responses to
Dennis in the original thread.

mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
                    block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)


Comprehensions combine map and filter and somewhat, to some people,
replace both. Tastes vary.

If one has a filter function f already, filter(f, seq) may be faster than
(i for i in seq if f(i)). If one does not, (i for i in seq if <expression
involving i>) will probably be faster than filter(lambda i: <expression
involving i>, seq), as it avoids a function call by using inlined
expression code.

So either can be more Pythonic, depending on the context.
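
To make the spellings concrete, a sketch with a made-up predicate `is_high` and made-up sample data. On Python 3 `filter` returns a lazy iterator, so it is wrapped in `list` for the comparison. No timings are claimed here; the speed remarks above would need measuring with `timeit` on the interpreter in question.

```python
def is_high(ch):
    """Hypothetical predicate: True for characters above 7-bit ASCII."""
    return ch > "\x7f"


block = "abc\xe9\xff"   # sample data, made up for illustration

via_filter = list(filter(is_high, block))                 # existing function
via_lambda = list(filter(lambda ch: ch > "\x7f", block))  # one lambda call per item
via_comprehension = [ch for ch in block if ch > "\x7f"]   # expression inlined

print(via_filter == via_lambda == via_comprehension)
```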

Try this on and see if it fits:

num_bin_chars = sum(b > "\x7f" or b < "\x20" and b not in "\r\n\t"
                    for b in block)

However, for just counting, this is even better -- and most Pythonic!
In fact, being able to count the number of True values in a stream of
True and False by summation is part of the justification of bool being a
subclass of int.
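
A minimal illustration of that point, with a made-up list of flags:

```python
flags = [True, False, True, True]

print(issubclass(bool, int))   # True
print(True + True)             # 2
print(sum(flags))              # 3
```

Because True counts as 1 and False as 0, summing a stream of comparison results gives the number of items that satisfied the test.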

What is list("\r\n\t") doing, if it's not (needlessly) converting a
string into a list?

num_bin_chars > 0.30 * len(block)

(no mucking about with float() or 1.0, and it doesn't blow up on a
zero-length block)

Nice point!

Terry Jan Reedy
