Which is more Pythonic? (was: Detecting Binary content in files)

John Posner

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
if b > 127
or b in [ "\r", "\n", "\t" ] ]

mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)


(Note: Dennis's statement converts a string into a list; mine does not.)

---

binary = (float(len(mrkrs)) / len(block)) > 0.30

binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

-John





 
bieffe62

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

   mrkrs = [b for b in block
     if b > 127
       or b in [ "\r", "\n", "\t" ]       ]

   mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
block)
   mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

Never tested my 'pythonicity', but I would do:

def test(b): return b > 127 or b in "\r\n\t"
mrkrs = filter( test, block )

Note: before starting to study Haskell, I would probably have used the
list comprehension. Still can't stand anonymous functions though.


(Note: Dennis's statement converts a string into a list; mine does not.)

---

   binary = (float(len(mrkrs)) / len(block)) > 0.30

   binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

I believe now one should do (at least on new code):

from __future__ import division # not needed for python 3.0
binary = (len(mrkrs) / len(block)) > 0.30

In the past, I often used the * 1.0 trick, but I nevertheless believe
that an explicit cast is better.
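
As a small illustration of the division point above, here is a sketch that assumes Python 3, where / is already true division (on Python 2 the __future__ import quoted above would be needed for the same result). The sample counts are made up for the example.

```python
# Assumption: Python 3, where "/" is true division and "//" is floor division.
# The counts below are hypothetical sample values, not data from the thread.
num_markers = 3
block_len = 8

ratio = num_markers / block_len      # true division keeps the fraction
floored = num_markers // block_len   # floor division discards it

is_binary = ratio > 0.30
would_be_binary_with_floor = floored > 0.30
```

With floor division the ratio collapses to zero, so the threshold test could never succeed for any block with fewer markers than bytes.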


Ciao
 
John Machin

Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

   mrkrs = [b for b in block
     if b > 127
       or b in [ "\r", "\n", "\t" ]       ]

I'd worry about "correct" before "Pythonic" ... see my responses to
Dennis in the original thread.
   mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
block)
   mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

Try this on and see if it fits:

num_bin_chars = sum(b > "\x7f" or b < "\x20" and b not in "\r\n\t" for
b in block)
(Note: Dennis's statement converts a string into a list; mine does not.)

What is list("\r\n\t") doing, if it's not (needlessly) converting a
string into a list?
---

   binary = (float(len(mrkrs)) / len(block)) > 0.30

   binary_alt = 1.0 * len(mrkrs) / len(block) > 0.30

num_bin_chars > 0.30 * len(block)

(no mucking about with float() or 1.0, and it doesn't blow up on a
zero-length block)

Cheers,
John
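
A minimal sketch of the approach in the post above, assuming Python 3 and a bytes block (so iterating yields ints rather than one-character strings). The function name looks_binary and the threshold parameter are hypothetical, introduced only for this illustration.

```python
def looks_binary(block, threshold=0.30):
    """Return True if more than `threshold` of the bytes look non-textual.

    Assumes `block` is a bytes object (Python 3), so each `b` is an int.
    """
    allowed = b"\r\n\t"
    # High-bit bytes, or control bytes other than CR, LF and TAB, count as binary.
    num_bin_chars = sum(
        b > 0x7F or (b < 0x20 and b not in allowed) for b in block
    )
    # Multiplying instead of dividing avoids ZeroDivisionError on an empty block.
    return num_bin_chars > threshold * len(block)
```

For an empty block the comparison becomes 0 > 0.0, which is simply False, so no special case is needed.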
 
Terry Reedy

John said:
Dennis Lee Bieber presented a code snippet with two consecutive statements
that made me think, "I'd code this differently". So just for fun ... is
Dennis's original statement or my "_alt" statement more idiomatically
Pythonic? Are there even more Pythonic alternative codings?

mrkrs = [b for b in block
if b > 127
or b in [ "\r", "\n", "\t" ] ]

I'd worry about "correct" before "Pythonic" ... see my responses to
Dennis in the original thread.
mrkrs_alt1 = filter(lambda b: b > 127 or b in [ "\r", "\n", "\t" ],
block)
mrkrs_alt2 = filter(lambda b: b > 127 or b in list("\r\n\t"), block)

Comprehensions combine map and filter and, for some people, largely
replace both. Tastes vary.

If one has a filter function f already, filter(f, seq) may be faster than
(i for i in seq if f(i)). If one does not, (i for i in seq if <expression
involving i>) will probably be faster than filter(lambda i: <expression
involving i>, seq) as it avoids a function call, using inlined
expression code.

So either can be more Pythonic, depending on the context.
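
To make the two forms concrete, here is a sketch assuming Python 3 and a bytes block. The predicate is copied from the thread only to compare the spellings, not as an endorsed binary test, and the name is_marker and the sample block are hypothetical.

```python
def is_marker(b):
    # Predicate taken from the thread; b is an int when iterating over bytes.
    return b > 127 or b in b"\r\n\t"

block = b"ab\tc\xff\n"  # made-up sample data

# A named function already exists, so filter() reads naturally.
# In Python 3 filter() returns an iterator, hence the list() call.
via_filter = list(filter(is_marker, block))

# No function needed: the test is inlined in the comprehension.
via_comprehension = [b for b in block if b > 127 or b in b"\r\n\t"]
```

Both spellings select the same elements; which one is faster on a given interpreter is something to measure (for example with the timeit module) rather than assume.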
Try this on and see if it fits:

num_bin_chars = sum(b > "\x7f" or b < "\x20" and b not in "\r\n\t" for
b in block)

However, for just counting, this is even better -- and most Pythonic!
In fact, being able to count the number of True values in a stream of
True and False by summation is part of the justification of bool being a
subclass of int.
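
A tiny sketch of the bool-as-int point, with a made-up list of flags:

```python
# bool is a subclass of int, so True adds as 1 and False as 0.
flags = [True, False, True, True]  # hypothetical sample values

bool_is_int = issubclass(bool, int)
count = sum(flags)  # number of True values in the list
```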
What is list("\r\n\t") doing, if it's not (needlessly) converting a
string into a list?


num_bin_chars > 0.30 * len(block)

(no mucking about with float() or 1.0, and it doesn't blow up on a
zero-length block)

Nice point!

Terry Jan Reedy
 
