Anybody who can claim (without a smilie) that "sum is redundant with
reduce" has clearly lost all touch with reality. "sum" accounts for
about 80-90% of the uses of reduce found on large existing codebases,
and even more when only _good_ uses of reduce are considered. "sum"
is way faster than "reduce(operator.add, ..." -- and the advantage in
speed *AND* simplicity grows even more when one considers how often
that typical reduce is miscoded as "reduce(lambda x, y: x+y, ..." or
"reduce(int.__add__, ..." and the like.
"Add up this bunch of numbers" is a VERY frequent task in many kinds of
programs. reduce LOOKS like a decent solution to that task, but it just
*ISN'T*. Consider the simple, obvious, elementary solution:
[alex@lancelot test]$ timeit.py -c -s'import operator' 'x=0' 'for i in
xrange(999): x+=i'
1000 loops, best of 3: 290 usec per loop
versus progressively-fancier reduce-based ones:
[alex@lancelot test]$ timeit.py -c -s'import operator' 'reduce(operator.add,
xrange(999))'
1000 loops, best of 3: 300 usec per loop
[alex@lancelot test]$ timeit.py -c -s'import operator' 'reduce(int.__add__,
xrange(999))'
1000 loops, best of 3: 540 usec per loop
[alex@lancelot test]$ timeit.py -c -s'import operator' 'reduce(lambda x,y:
x+y, xrange(999))'
1000 loops, best of 3: 660 usec per loop
The complications of going to a higher-order function and learning about
module operator "pay off" in a SLOWDOWN compared to elementary code...!
The "fancy" ways reduce is often coded slow things down further by about
two times. This is clearly unacceptable. And the right solution is,
quite obviously:
[alex@lancelot test]$ timeit.py -c 'sum(xrange(999))'
10000 loops, best of 3: 129 usec per loop
giving speed, simplicity, and code that is clean, high-level, concise, and
immediately obvious to any reader or maintainer.
The paladins of reduce don't help their case by posting reduce "use
cases" that are complicated and rarely optimal -- e.g., a reduce-coded
search for the longest string in a sequence. In 2.4, the much simpler
expression:
list.sorted(seq, key=len)[-1]
is about 3 times faster on a typical long seq (seq=map(str, xrange(2000))).
We still don't have (in the current pre-alpha 2.4) the "one obvious way",
max(seq, key=len), because max & other functions haven't yet been upgraded
to take the optional key= parameter as lists' sort and sorted methods do,
but I trust they will be.
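Just to make the hoped-for semantics concrete, here's a minimal sketch
of what such a key=-aware max might look like -- the name max_with_key
is purely hypothetical, for illustration only, not anything that exists
in 2.3 or the 2.4 pre-alpha:

def max_with_key(seq, key):
    # hypothetical sketch: plays the role of the hoped-for max(seq, key=key)
    seq = iter(seq)
    best = seq.next()   # empty seq raises StopIteration (real max: ValueError)
    bestk = key(best)
    for item in seq:
        k = key(item)
        if k > bestk:   # strictly greater, so the first maximum wins ties
            best, bestk = item, k
    return best

With it, max_with_key(seq, len) would do the job of max(seq, key=len).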
And if one doesn't care for the concision of these new idioms, the most
elementary programming is still best:
[alex@lancelot test]$ timeit.py -c -s'xs=map(str,xrange(2000))' -s'''
def longest(seq):
    seq = iter(seq)
    best = seq.next()
    blen = len(best)
    for item in seq:
        ilen = len(item)
        if ilen > blen:
            best = item
            blen = ilen
    return best
''' 'longest(xs)'
1000 loops, best of 3: 980 usec per loop
versus:
[alex@lancelot test]$ timeit.py -c -s'xs=map(str,xrange(2000))' -s'''
def longer(x, y):
    if len(y) > len(x): return y
    else: return x
def longest(seq):
    return reduce(longer, seq)
''' 'longest(xs)'
100 loops, best of 3: 2.4e+03 usec per loop
Once again, reduce proves to be an "attractive nuisance": it "looks" kind
of cool, but in fact it "repays" attention and study by _slowing things
down_ even in comparison to simple, elementary programming. Higher order
programming, such as the "max(seq, key=len)" which I hope will make it
into 2.4, is of course way simpler, faster, and more concise -- but the
"neither fish nor fowl" level at which most uses of reduce fall just
doesn't cut it, IMHO.
And btw, if one wants concision rather than speed, even in 2.3, the
typical DSU-like idiom:
def longest(seq):
    aux = [ (len(x), -i, x) for i, x in enumerate(seq) ]
    return max(aux)[-1]
still handily beats the reduce-coded alternative (even though
having the key= parameter instead of explicit decoration will
of course make it even smoother, simpler, and faster, in 2.4).
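For instance, the -i in the decoration makes ties resolve to the
earliest item, just as the reduce-coded longer does by returning x
on ties:

>>> longest(['no', 'way', 'yes', 'hm'])
'way'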
Oh, guys...
def ireduce(func, seq):
    seq = iter(seq)
    try:
        i = seq.next()
        while True:
            i = func(i, seq.next())
    except StopIteration:
        return i
I have some trouble with this snippet's indentation (have you used
tabs...?), but, apart from that, this is surely a close equivalent
to reduce (it differs from reduce when seq is empty, raising an
UnboundLocalError rather than a TypeError, etc, but such trifles
could of course be easily adjusted). It IS, however, marginally
slower than reduce. I clock 'reduce(lambda x, y: y, xs)' at 1110
usec, but 'ireduce(lambda x, y: y, xs)' at 1270 usec,
when "xs=map(str, xrange(2000))". [[ Now, of course, reduce's fans
will claim that this 15%-or-so slowdown destroys ireduce's usefulness,
while at the same time claiming that the many-times speedups versus
reduce that I have repeatedly shown for many other idioms, both lower
and higher level, don't matter -- just watch it, it'll be fun
]]
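And since such trifles could indeed be easily adjusted, here's one
possible adjustment -- my own sketch, not the original poster's code --
that restores reduce's TypeError on an empty sequence:

def ireduce(func, seq):
    seq = iter(seq)
    try:
        i = seq.next()
    except StopIteration:
        # mimic reduce's error for an empty seq and no initializer
        raise TypeError, "ireduce() of empty sequence with no initial value"
    for item in seq:
        i = func(i, item)
    return i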
def cond(boolean, iftrue, iffalse):
    if boolean: return iftrue
    else: return iffalse
heh -- trying to sneak in a ternary, are we?-)
>>> ireduce(lambda x, y: cond(len(x) == 5, x, y),
...         "give peace a chance!".split(' '))
'peace'
Unfortunately, what this suggests to me is that this ireduce (not
surprisingly, as it has basically reduce's interface) and cond (ditto)
encourage obscure and best-avoided idioms.
For the task "get the first item in seq that has a length of 5", both
the simplest, lowest-level coding:
def firstlen5(seq):
    for item in seq:
        if len(item) == 5: return item
    else: raise ValueError, "No item has length 5"
or the concise, higher-level coding
[ x for x in seq if len(x)==5 ][0]
or better in 2.4
( x for x in seq if len(x)==5 ).next()
appear far preferable to me. I suspect they'd also be way faster, but I'm
getting a bit tired of posting oodles of measurements that are just going
to get ignored by the frantic fans of reduce, since if they got off their
high horse and just looked at reality they'd be left with no case;-).
Can't we all just....get along?
Sure we can -- unless you count 'reduce' itself as a part of 'us'.
Alex