reduce() anomaly?

  • Thread starter Stephen C. Waterbury
  • Start date
A

Alex Martelli

Douglas Alan wrote:
...
Well, that's the argument you seem to be making -- that reduce() is
superfluous because a sum() and max() that work on sequences were
added to the language.

"added"? 'max' worked that way since well before I met Python. But
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

According to Alex Martelli, max() and min() are likely to be extended
in this fashion. Why not sum() next?

I was just presenting my personal opinion regarding the fact that
the max and min functions should have an optional key= argument,
simply because the .sort method now (2.4) has one too. If I implied
that anything I feel should be done is automatically "likely" to be
accepted by Guido, I apologize for mis-communicating; he throws stones
at far more of my ideas than he embraces.

Well, perhaps you can explain your confusion to me? What could
possibly be unintuitive about a function that is just like sum(), yet
it allows you to specify the addition operation that you want to use?

Addition, as you have remarked elsewhere, is intrinsically suitable
for being applied to multiple operands; one is therefore naturally
disposed to think in terms of "add them all up", "sum them all", and
the like, NOT in terms of "iteratively perform
total = operator.add(total, item)
for all items in the sequence of numbers". But, reduce as it exists
in Python today cannot be clearly defined EXCEPT by such an iteration
with some generic "callable which accepts two arguments" specified
in lieu of the operator.add. Given this total generality, even though
hardly ever does it make sense to USE it, reduce becomes quite complex.

You've even argued in this thread, incorrectly, that reduce could be
eliminated if only all binary operators were able to accept arbitrary
numbers of arguments. This, plus your insistence that 'reduce' is
"just like" APL's / (which does NOT accept arbitrary functions on its
left -- just specific operators such as +), indicate a _misconception_
of reduce on your part. I'm sure you don't hold it consciously, but
these miscommunications indicate that even you, reduce's paladin, do
NOT properly grasp reduce "intuitively".

Having (e.g.) add accept a sequence argument (like max does), or, for
explicitness and potentially performance, grow an add.reduce attribute
just like in Numeric, would give no particular problems. I'd still
want (just like Numeric does) to have sum as a name for add.reduce,
because it's by far the most common case and avoids getting into the
issue of what the (expletive delete) does "reducing" have to do with
anything that "add.reduce" does (it's really a curve ball compared
with the many meanings of "reduce" in English, after all). But, most
importantly, such a design would avoid the veritable traps to which
the current, too-general 'reduce' subjects the poor learner who's
quite reasonably going to believe that all that generality MUST be
good for something if it's been designed into a built-in. We've seen
quite a few such inappropriate uses on this thread, after all.

As I said, I'm not particularly good at guessing what Guido will or
won't bless, but I suspect that a patch to the operator module to
grow appropriate attributes on its functions, or even just make them
accept multiple arguments, might be well-received (to avoid wasting
work one might want a PEP or prePEP about that first, of course).

Of course, you can get the same effect by defining a class with your
own special __add__ operator, and then encapsulating all the objects
you want to add in this new class, and then using sum(), but that's a
rather high overhead way to accomplish the same thing that reduce()
lets you do very easily.

In practice, I judge it quite unlikely that people will go out of their
way to abuse sum as I've seen them abuse reduce -- reduce lets you
abuse it very easily, as you mention, while sum needs you to really
jump through hoops to do so.


Alex
 
D

Dang Griffith

I would say that if you didn't get introduced to at least the concept
of functional programming and a hint of how it works, then something
was seriously wrong with your CS education.
|>oug

I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)?
From the conversation, it seems that right or wrong, your education
exceeds those in this group. Where should I send my children
such that they receive a better CS education that I?
--dang
 
T

Terry Reedy

Douglas Alan said:
Yes, but they are still two *different* ways to to get to that result.
Starting with a and adding b to it, is not the same thing as starting
with b and adding a to it.

I disagree. Number addition (like several other functions) is a bag
(multiset) reduction operator that reduces a (possibly empty)
collection of similar items to one of the same type. Abstractly, the
reduction is simultaneous. One appropriate notation for such
operators is a simple labelled loop (in 2D) containing the items to be
collected together. In linear text, such loops can be represented by
matched symbols, such as ( ), [ ], { }, which then represent the
intersection of a closed loop with a 'too-narrow' text band (which
thereby chops off the top and bottom of the loop). When operands
scattered in the loop interior are projected down to a 1+D band for
text representation, they gain an ordering that is an artifact of that
limited representation.

(+ item item ...) and sum(item, item, ...) are both labelled loops
with items arbitrarily but necessarily ordered for linear text
representation.
It is only the commutative law of arithmetic,
as any good second grade student can tell you,
that guarantees that the result will be the same.

Well, second graders are mislead by the representation of an n-ary bag
operation with infix notation that implies binarity and order
relevance. Learning about reduce should help one unlearn these
implications ;-).

Even with infix notation, there is a choice between making the default
interpretation be order relevance or order irrelavance. If
mathematicians had made the second choice, therr would be no need for
a 'commutative law' One would say that a+b == b+a because that is the
way it normally is, while a-b != b-a because of the OrderRelevance Law
of subtraction.

< On the other hand, not all mathematical groups are albelian,
< and consequently, a + b != b + a for all mathematical groups.

Yes, there are order-relevant binary-only operations. Number addition
is not one of them.

[Having written the above, I am less enthralled with the overloading
of '+' and operator.add to also mean ordered concatenation of
sequences. The fact that operator.add becomes O(n**2) (usually
unnecessarily) instead of O(n) for such overloadings, to the point
that sum() specially excludes such overloadings for strings,
reinforces this doubt. Perhaps there should be at least be an
operator.cat that would generalize ''.join(strseq)]

Terry J. Reedy
 
A

Alex Martelli

Douglas said:
Your claim is silly. sum() is not *way* simpler than reduce(), and
anyone can be explained reduce() in 10 seconds: "reduce() is just like

As you hedged elsewhere, anyone can be EXPLAINED it, but not anyone
will UNDERSTAND the explanation instantly. Now THAT hedging is way
silly, and of course Shakespeare did it best:

"""
Glendower: I can call spirits from the vasty deep.

Hotspur: Why, so can I, or so can any man; But will they come when you do
call for them?
"""
sum(), only with reduce() you can specify whatever addition function
you would like."

This explanation is misleading by overqualification (that "addition" is
the overqualification). A non-overqualified explanation would have to
mention that ANY callable that can accept two arguments, and return a
result suitable to be its first argument next, can be passed; and it would
have to touch on what happens when the sequence is empty, and the optional
third argument. Yes, you COULD cram it into 10 seconds -- people won't
UNDERSTAND what you're saying, of course, but clearly you don't care
about such trifles, as you have abundantly shown.

The claim, made by somebody else, that _every_ CS 101 course teaches the
functionality of 'reduce' is not just false, but utterly absurd:
'reduce', 'foldl', and similar higher-order functions, were not taught to
me back when _I_ took my first university exam in CS [it used Fortran as
te main language],

Then you weren't taught Computer Science -- you were taught Fortran
programming. Computer Science teaches general concepts, not specific
languages.

CS teaches general concepts and in all decent CS 101 course it starts
doing so with concrete examples (because THAT is how most people learn
best), ideally with an executable computer language so you can TRY things.
It is quite likely that higher-order functions won't be taught yet in
CS 101, or that only a very few specific cases will be shown and reduce
(or foldr or however you want to call it) will not be one of them.

Your false and utterly absurd claim about the content of _every_ CS 101
course has been totally demolished by now -- by many people, including
CS majors. I'm sure you'd be happy to "undo" it, but you've invested
so much emotional energy on it and made it a lynchpin of your argument
in favour of 'reduce' that I understand that's just too hard. Still,
by standing on such rotten foundations, your argument is reduced (heh)
to near-absurdity...

they were not taught to my son in _his_ equivalent course [it used
Pascal], and are not going to be taught to my daughter in _her_
equivalent course [it uses C].

Then your children were done a great diservice by receiving a poor
education. (Assuming that is that they wanted to learn Computer
Science, and not Programming in Pascal or Programming in C.)

They wanted to learn enough computer science to get on with their
real interests in life -- telecom engineering and financial economics
respectively. They may well take second or third courses on CS issues
in the future. But within the confines of a Bachelor (3-years degree)
in such disciplines, heavily different from computer science but able to
use several CS conceptual and practical tools effectively for other
purposes, I think they were served perfectly well by "Foundations
of Informatics" (aka CS 101 -- we don't number things like that here,
and "Informatics" is strongly preferred over "CS", and in particular
it's the preference of local computer scientists) covering a lot of
important concepts AND practical exercises, but not ALL higher-order
functions. Personally I think that they wasted effort in teaching
about pointers, but I guess that's debatable.

Whoever said anything about "maximizing generality"? If one's mantra

Your only argument FOR reduce is its maximal generality (and the falsehood
that it's taught in every CS 101 course).
elegant features understood by anyone who has studied Computer Science

If a CS 101 course (passed with top marks) counts as having "studied
Computer Science", then 'reduce' is not such a feature. It's not very
elegant either, btw, but that's a subjective opinion -- the fact that
CS 101 courses mostly do NOT teach reduce/foldl is a FACT. So, by
this reasoning of yours, but based on realities rather than falsehoods,
reduce should go away.
*utterly* wrong manner. You should be assuming that your audience are
the smart people that they are, rather than the idiots you are
assuming them to be.

They're smart _and pragmatical_, NOT interested in (very arguable...)
elegance _for its own sake_, for the most part. So, they ask for GOOD
use cases for reduce, if they're taught it -- and... none can be found
throughout the Standard Python library and everywhere else I've checked
into real programs (written by me or my clients, or downloaded from
the net). Every single use of reduce I've found would be faster,
cleaner, simpler with sum, an explicit loop, or other ways yet.

It is not true these principles are rare among computer languages --
they are quite common. Most such language (like most computer
languages in general) just never obtained any wide usage.

URLs please (or non-URL references if you must). I had never seen
these two principles written down for a general-purpose language
before I met C -- and haven't, since, except for Python. I'll be
quite interested in being able to quote other languages explicitly
adopting them as design criteria.

place at the right time, it had a lightweight implementation, was

The lightweight implementation clearly follows from the two
principles above -- C had very-lightweight, fast, portable
compilers early on for just the same reason.


Alex
 
E

Emile van Sebille

Alex Martelli:
... no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

Well, you must admit that writing pentultimate(a) has it all over
a.sorted()[-2] ;-)
 
D

Dang Griffith

I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)?
From the conversation, it seems that right or wrong, your education
exceeds those in this group. Where should I send my children
such that they receive a better CS education that I?
--dang
Nevermind--I saw more posts and summarily reached a conclusion,
reducing the alternatives to one.
--dang
 
T

Terry Reedy

Alex Martelli said:
A functional module would be neat.

Good. I wrongly thought you might be in the abolish camp.
A great way to enhance the chance that there will be one
would be starting one today (e.g. on sourceforge),
ideally with both pure-Python and C-helped (or pyrex, etc)
implementations,
and get it reasonably-widely used, debugged, optimized.

Interesting idea. Maybe I will do something with it.
Also, I advocate that 3.0 should have a module or package (call it
"legacy" for example) such that, if you started your program with
some statement such as:

from legacy import *

compatibility with Python 2.4 or thereabouts would be repaired as
much as feasible, to ease running legacy code, and to the expense
of performance, 'neatness' and all such other great things if needed
(e.g., no "repaired" versions or anything -- just compatibility).

Running old code in new interpreters seems not to be much of a problem
yet because of the current policy of not deleting features or (with
pretty minor or obscure exceptions) changing semantics. But if any
deprecations take effect before 3.0, legacy could also be added
before.

I think there should also be an official 'retroxy' module to help run
(a feasible subset of) pythonx.y code in older interpreters. A
retro23 module would define True, False, sum, enumerate and perhaps
other stuff. Accompanying instructions could discuss which older
versions a new module like sets will work with. I am sure this has
been invented privately several times.
One reasonably popular counterproposal to that was to have it as

from __past__ import *

by analogy with today's "from __future__".

which Guido has strongly rejected since it would mean keeping old
stuff in the interpreter forever -- unless it were a synonym for
'import legacy' -- in which case your spelling is better.
I'd also like to make it
easy to get this functionality with a commandline switch, like is
the case today with -Q specifically for _division_ legacy issues.

-L to mean 'import legacy' on startup should be possible.
But mostly, each time I mention that on python-dev, I'm rebuffed with
remarks about such issues being way premature today. Somebody's
proposed starting a new list specifically about 3.0, to make sure
remarks and suggestions for it made today are not lost about more
day-to-day python-dev traffic, but I don't think anything's been
done about that yet.

Perhaps Guido is trying to avoid pressure to start before he and the
PSF are ready.

I personally wish 3.0 were out now or the next release. That is what
I more or less expected 2+ years ago when I suggested that the
division switch be delayed until 3.0 -- or rather, suggested that the
first release after 2 years make the switch *and* be called 3.0. I
was not asking to have to import the new division rules for years and
years;-).

Terry
 
T

Thomas Heller

Alex Martelli said:
Terry Reedy wrote:
...

A functional module would be neat. A great way to enhance the chance
that there will be one would be starting one today (e.g. on sourceforge),
ideally with both pure-Python and C-helped (or pyrex, etc) implementations,
and get it reasonably-widely used, debugged, optimized.

I would suggest: don't move the implementation to C too fast or too much
of the code. With pypy on the horizon, this may never be needed (?).

Thomas
 
T

Terry Reedy

Alex Martelli said:
I'd rather put map/filter/reduce in a 'legacy' module -- warts and
all, e.g. map's horrible habit of None-padding shorter sequences,
EEK -- and have a designed-from-scratch functional module without
the warts.

What a liberating thought. I sometimes forget that the only thing I
am really stuck with when programming in Python is the core syntax (in
the Ref Manual) and that I am free to wrap or replace anything else
(in the Lib Manual). For a project I hope to start soon, I have been
worried about how to work around or explain away the deficiencies of
Python's of reduce(). Instead, I should just write the version I want
for my purposes (especially since speed does not matter). So your
voluminous writing on this has benefited at least one person.
Probably, yes. What functions, as opposed to types, do you
think should absolutely be in the builtins rather than in a separate
module, and (unless it's obvious) why?

I have thought some about this also. Type objects could be put/left
in types, but as constructors, they are too useful and basic not to
have immediately available (unless builtins was reduced to nearly
nothing).
Of course __import__ had better stay or we won't
be able to get anything imported, for example:).

Does it have to be in builtins for import statements to work?
Direct use of __import__ seems relatively rare.
The attribute-related functions hasattr, getattr,
setattr, and delattr, seem very fundamental
(but I'd want the same status for the item-functions
currently over in operator),
as well as
(at least some of them -- delattr isn't -- but I do think that trying to
discriminate too finely would make things too hard to learn)
very frequently used in code.

I had to re-lineate this to parse apart the three thoughts. A static
frequency-of-use analysis for some large code collection would be
interesting. I might write something eventually.
I group this with type constructor objects, even though not one
itself
unless made a method, keep in
pow [for the crucial 3-arguments case]
very specialized and rarely used? move to math as pow3
range (or some preferable variant that returns an iterator)
how about iter(3), etc, currently an arg type error?
think of 3 as the sentinal value.
is not typing '0,' worth the wartiness of optional first arg?
and zip seem pretty fundamental
agree except for pow

Terry J. Reedy
 
A

Andrew Dalke

Alex:
Anyway, computational scientists using Python should be using Numeric
(if they aren't, they're sadly misguided).

Or they are like me and work in a subfield where (specialized)
database searches and graph theory is more useful. It was strange
at SciPy with all those people using NumPy and doing CFD and
our little group of computational life sciences people being rather
bored.
Indeed, that's why 'sum' is a commonly used word in English -- exactly
because nobody's every DOING anything like that -- while, of course,
'reduce' is totally obvious -- why, _anybody_ who's ever taken
Chemistry 101 knows it means "to remove oxygen"!

While in Chemistry 102 they learn that it means to add electrons
and can occur in a non-oxygen environment ;)

http://reference.allrefer.com/encyclopedia/O/oxidreduc.html

Andrew
(e-mail address removed)
 
T

Terry Reedy

Alex Martelli said:
Dave Brueck wrote:
...
results = [ func(x) for x in sequence ]
... instead of ...
results = sequence.map(func) ??
Because I find the first much more readable>
I entirely agree with both points.

For this pair, I like the second better. Different aesthetics.
They're even clearer when the contrast is between, e.g.:
results = [ x+23 for x in sequence ]
and:
results = sequence.map(lambda x: x+23)
where using the HOF approach forces you
to understand (and read) lambda too.

Here I might take the first. 'lambda' is something I feed 'stuck'
with.

Would the hypothetical
results = sequence.map(func x: x+23)
be any better?
How about a very hypothetical (post ``==repr deprecation)
results = sequence..map(`x+23`)
?

Terry J. Reedy
 
T

Terry Reedy

Alex Martelli said:
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

By my definition of "more general", a function that returns all order
statistics of a list is trivially more general than one that returns
any fewer, including just one. It conveys a superset of the
infomation conveyed by less general functions and answers a superset
of the questions they answer. If you add a control parameter
specifying which order statistics to return, then less general
functions have a restriced domain.

So I have no idea your intent in appearing to challenge this and how
you think it contributes to your overall argument. Is your idea of
'more general' so different?

Terry J. Reedy
 
A

Andrew Dalke

Alex Martelli:
"added"? 'max' worked that way since well before I met Python. But
to be consistent with your other arguments, no doubt you'd argue for
a .sort() followed by [-1] as "more general" than max...

Even better (tongue-in-cheek) would be a function to get the
t largest (or s largest to t largest) without necessarily comparing
between elements whose values are guaranteed out of range

Just looked through Knuth for that. There are various ways
listed, but no neat name to use for that function. :)
Look under 'select k largest' and 'select kth largest'. Most
of the hits are in the exercises. First one is for Hoare, and
there's a reference to Dodgson (Carroll) and lawn tennis.

Andrew
(e-mail address removed)
 
D

Douglas Alan

I've been lurking this thread for a while, but missed the beginning.
Can I ask where you got your CS education (school and year)? From
the conversation, it seems that right or wrong, your education
exceeds those in this group. Where should I send my children such
that they receive a better CS education that I?

I took some sort of one-day class at the public library on BASIC when
I was about 12. Then later in high school, I was handed a book on APL
by a friend's father. Then my mother sent me off to Harvard Summer
School, where I first took an official CS-101 class (and a class on
Astronomy), and then I went to MIT for undergraduate school, where I
also took their CS-101.

So, in a way, I had four different CS-101's, and in *all* of them I
was exposed to concepts of functional programming, except for the
BASIC case.

Despite me having studied at Harvard and MIT, it is my belief that you
don't have to have attended either of these places to receive a decent
CS education. You just have to be taught by people who don't think
that something as simple as reduce() is beyond your abilities. In
fact, perhaps it was in a small part because I was introduced to
concepts such as reduce() at a young age that I was able to become
smart enough to get into such a fine school. Would you deny such an
opportunity to those who learn to program in Python?

|>oug
 
D

Dave Brueck

You just have to be taught by people who don't think
that something as simple as reduce() is beyond your abilities.

Nobody in this entirely too-long thread has asserted such a thing. You have
inferred from people saying that it's non-obvious or less readable than other
constructs that they mean it's difficult to teach or that people are too stupid
to learn it. Nothing could be farther from the truth.

It's not that reduce is hard to learn - it's just that there are simpler and
clearer constructs that do the same thing and are more readable to most people.
That's it!
fact, perhaps it was in a small part because I was introduced to
concepts such as reduce() at a young age that I was able to become
smart enough to get into such a fine school. Would you deny such an
opportunity to those who learn to program in Python?

<retching noises> Ok, I'm done with this thread. Have fun,
Dave
 
D

Douglas Alan

Nobody in this entirely too-long thread has asserted such a thing.

There *have* been postings that have asserted that reduce() is
difficult to learn. But it is not. I learned it when I was in high
school with no trouble and no background. I've already posted a
perfectly good description of reduce() that anyone should be able to
learn in under a minute. And if despite this, someone finds reduce()
difficult to learn, then perhaps this is particularly the kind of
thing they that *should* be learning, so that such things will no
longer be difficult for them.
You have inferred from people saying that it's non-obvious or less
readable than other constructs that they mean it's difficult to
teach or that people are too stupid to learn it. Nothing could be
farther from the truth.

Neither is sum() obvious until you know what it does. Perhaps it is
it just a synonym for add? Or maybe it produces summaries of some
sort. What happens if you give it an empty list? In the time that
you can answer these questions for yourself, you can have figured out
the more general tool, reduce(), and once you have, then it is just as
obvious and readable as sum(), only it has other uses as well.
It's not that reduce is hard to learn - it's just that there are
simpler and clearer constructs that do the same thing and are more
readable to most people. That's it!

Well, bah! There are precious few constructs in this world that are
clearer and more readable than

reduce(add, seq)

|>oug
 
R

Rainer Deyke

Douglas said:
Q: How do I sum a sequence of numbers?

In three lines:

x = 0
for element in seq:
x += element

In two lines:
A: from operator import add
reduce(add, seq)

Unless seq is empty, in which case you need this:

from operator import add
reduce(add, seq, 0)

In one line:

x = sum(seq)


And that is why 'sum' is a worthwhile part of the Python standard library
and 'reduce' is not: using 'sum' is significantly shorter than using a for
loop, while using 'reduce' is not.
 
B

BW Glitch

Douglas said:
If there's a problem with people not understaning how to sum numbers
with reduce(), then the problem is with the documentation, not with
reduce() and the documentation should be fixed. It is quite easy to
make this fix. Here it is:

You are assuming way too much. Documentation is meant to clarify what a
function does, not a tutorial on what a function is. That's why your
CS101 professor should have taught you that the name of the function
should state what the function does. sum() states clearly that it is
meant to sum something (a connection is made with numbers, because
that's what most people associate numbers). reduce(), OTOH, states
nothing, it reduces. But what? Why should I care for a function that
isn't clear?
FAQ
---
Q: How do I sum a sequence of numbers?

A: from operator import add
reduce(add, seq)

Problem solved.

A: Guru.

Q: How a user that reads the manual is called?

;)
If a microbiologist cannot understand the above, they have no business
being a microbiologist.

Why should he? Because he could care less about a function called
reduce()? Because he just want to do his job, regardless on how a CS
expert does it? Because he didn't learned LISP as his first language?
I learned reduce() in high school, and it
didn't take me any longer than the 60 seconds I claim it will take
anyone with a modicum of intelligence.

Please, get your fact straights.

Do you know that people learn in different ways? Just because you
learned something one way doesn't mean everyone else should learn it
(and understand it) the same way you do.
If both the mantras cause a language to have general features
replaced with a larger number of specialized features that accomplish
less, then both mantras are bad.

Mantras are cute names for design decisions (I know, mantras doesn't
mean that, but in these context). When designing something, certain
compromises must be made. Guido van Rossum (GvR or BDFL) made some
decisions early on and have stayed with Python ever since. Good or bad,
we have to stick with the decisions. Why? Because each decision
represents a solution from many to a single problem. If we start making
exceptions to the decision, the decision should be rethought. But more
importantly, if it works, don't fix it.

Unless you go by the engineering mantra...
There are often multiple different ways to add together the same data
types, and you wouldn't want to have to define a new class for each
way of adding. For instance, you wouldn't want to have to define a
new integer class to support modular arithmetic -- you just want to
use a different addition operation.

Define a class then. What you are describing is a specific case which
most people don't face in everyday problems. If you have a set of tools,
try to use the best combination for the given tool. If all you have is a
hammer, everything start looking like a nail. ;) If you have to program
in Java, everything start looking like a class.
Because anyone who has studied CS, which should include a significant
percentage of programmers, will know instantly where to look for the
summing function if it is called reduce(), but they won't necessarily
know to look for sum(), since languages don't generally have a
function called sum(). And everyone else will not know to look for
either, so they might as well learn a more powerful concept in the
extra 30 seconds it will take them.

False. Programmers come from many sources. Restraining yourself to CS
people is bad enough. As Alex Martelli mentioned in another post of this
thread, power should be demonstrable. It's quite amusing that no one has
showed an example of the power of reduce() (by itself, not its side
effects).

[snip]
Did I say that?

You didn't. But I infere that from what you've said in this thread.
Yes, but they are still two *different* ways to to get to that result.
Starting with a and adding b to it, is not the same thing as starting
with b and adding a to it. It is only the commutative law of
arithmetic, as any good second grade student can tell you, that
guarantees that the result will be the same. On the other hand, not
all mathematical groups are albelian, and consequently, a + b != b + a
for all mathematical groups.

This might be the case IF we were talking about Matlab/Octave, which are
languages design towards mathematics.

I don't know what you are getting at about "optimization". Reduce()
exists for notational convenience--i.e., for certain tasks it is easer
to read, write, and understand code written using reduce() than it
would be for the corresponding loop--and understanding it is no more
difficult than understanding that a summing function might let you
specify the addition operation that you'd like to use, since that's
all that reduce() is!

Optimization was from somenone else in the thread whom I might
misunderstood.

It's still not obvious. You need to understand 1) whattf does reduce()
do and how, 2) what does the function passed does and 3) what the array
is about.
It was obvious to me when I was self-taught and I taught myself APL in
high-school. It also seemed obvious enough to all the physicists who
used APL at the lab where I was allowed to hang out to teach myself
APL.

Well, I learned TI-BASIC, C, C++ and then went on to learn Scheme,
Prolog and Python. Most of these during the last 6 years. Have I missed
something because I didn't know how to use reduce()? No. And more
importantly, what is a real life scenario where I should go with
reduce() instead of a loop? From what you've said, the only reason to
use reduce() is to show mere mortals that I know how to use reduce().
Besides that, Martelli have pointed that it provides no advantage what
so ever. And readability suffers because of the high abuse of this function.

--
Andres Rosado

-----BEGIN TF FAN CODE BLOCK-----
G+++ G1 G2+ BW++++ MW++ BM+ Rid+ Arm-- FR+ FW-
#3 D+ ADA N++ W OQP MUSH- BC- CN++ OM P75
-----END TF FAN CODE BLOCK-----

A plague upon all their houses!
-- Scourge (BW)
 
D

Douglas Alan

Alex Martelli said:
But to be consistent with your other arguments, no doubt you'd argue
for a .sort() followed by [-1] as "more general" than max...

That would have a larger big O time growth value -- most likely
O(n * log n) vs. O(n), for reasonable implemenations. And while I
wouldn't sweat a factor of 2 for a feature of a RAD or scripting
language, I would be more concerned about moving to a larger big O
value.
Addition, as you have remarked elsewhere, is intrinsically suitable
for being applied to multiple operands; one is therefore naturally
disposed to think in terms of "add them all up", "sum them all", and
the like, NOT in terms of "iteratively perform
total = operator.add(total, item)
for all items in the sequence of numbers". But, reduce as it exists
in Python today cannot be clearly defined EXCEPT by such an
iteration with some generic "callable which accepts two arguments"
specified in lieu of the operator.add. Given this total generality,
even though hardly ever does it make sense to USE it, reduce becomes
quite complex.

Your proposed extension to max() and min() has all the same problems.
But reasonable programmers don't abuse this generality, and so there
is little reason for the reasonable programmer to devote any
head-space to it. A programming language should be made to make life
as easy as possible for the reasonable programmer. Not to assume that
all programmers are latent unreasonable programmers and try to force
this urge to be stiffled. Don't take my word for it -- ask Paul
Graham. I believe he was even invited to give the Keynote Address at
a recent PyCon.

A tutorial for new programmers doesn't have to go into any of the
nuances of how one might abuse reduce() because that's not what
tutorials are for. It can merely say that reduce() is for applying a
binary operation such as "+" or "*" to a sequence of numbers, give a
couple examples on how to do it, and leave it at that. If the student
comes to a point where they want or need to know more details, they
can check the reference manual or experiment with it.
You've even argued in this thread, incorrectly, that reduce could be
eliminated if only all binary operators were able to accept arbitrary
numbers of arguments. This, plus your insistence that 'reduce' is
"just like" APL's / (which does NOT accept arbitrary functions on its
left -- just specific operators such as +), indicate a _misconception_
of reduce on your part. I'm sure you don't hold it consciously, but
these miscommunications indicate that even you, reduce's paladin, do
NOT properly grasp reduce "intuitively".

Just what is it that I don't grasp again? I think my position is
clear: I have no intention to abuse reduce(), so I don't worry myself
with ways in which I might be tempted to.
Having (e.g.) add accept a sequence argument (like max does), or, for
explicitness and potentially performance, grow an add.reduce attribute
just like in Numeric, would give no particular problems. I'd still
want (just like Numeric does) to have sum as a name for add.reduce,
because it's by far the most common case

So, now you *do* want multiple obviously right ways to do the same
thing?
and avoids getting into the issue of what the (expletive delete)
does "reducing" have to do with anything that "add.reduce" does
(it's really a curve ball compared with the many meanings of
"reduce" in English, after all).

The English world "reduce" certainly has multiple meanings, but so
does "sum". I can be a noun or a verb, for instance. It can mean
"summary" or "gist" in addition to addition. It also can be confusing
by appearing to be just a synonym for "add". Now people might have
trouble remember what the difference between sum() and add() is.

In Computer Science, however, "reduce" typically only has one meaning
when provided as a function in a language, and programmers might as
well learn that sooner than later.
But, most importantly, such a design would avoid the veritable traps
to which the current, too-general 'reduce' subjects the poor learner
who's quite reasonably going to believe that all that generality
MUST be good for something if it's been designed into a built-in.
We've seen quite a few such inappropriate uses on this thread, after
all.

That's very easy to fix:

FAQ
---
Q. Should I ever pass a function with side effects into reduce() or
map()?

A. No.

(Unless the side-effect is just printing out debugging information,
or saving away statistics, or something like that.)

|>oug
 
A

Andrew Dalke

Douglas Alan
Describing reduce() in 10 seconds is utterly trivial to anyone with an
IQ above 100, whether or not they have ever used sum():

"To add a sequence of numbers together:

reduce(add, seq)

To multiply a sequence of numbers together:

reduce(mul, seq) ...
Any two-argument function can be used in place of add, mul, sub, or
div and you'll get the appropriate result. Other interesting
examples are left as an exercise for the reader."

This isn't enough to describe what 'reduce' does. For
example, after reading this someone could think that
reduce(add, seq) is a macros which expands to
seq[0] + seq[1] + seq[2] + ...
and that reduce(mul, seq) is the same as
seq[0] * seq[1] * seq[2] + ...

For all the examples you gave, this is correct. However,
reduce(pow, seq) is not the same as
seq[0] ** seq[1] ** seq[2] + ...
because the order of operations is different.
import operator
reduce(operator.pow, [2, 3, 4]) 4096
2**3**4 2417851639229258349412352L
(2**3)**4 4096

I would argue that a "better" way to describe 'reduce'
is by defining it through its implementation, as

def reduce(f, seq):
val = seq[0]
for x in seq[1:]:
val = f(val, x)
return val

or by explaining it as
reduce(f, (a, b)) is the same as f(a, b)
reduce(f, (a, b, c)) is the same as f(f(a, b), c)
reduce(f, (a, b, c, d)) is the same as f(f(f(a, b), c), d)
and reduce(f, seq) is the same as
f(...f(f(f(f(seq[0], seq[1]), seq[2]), seq[3]), seq[4]), ....seq[n-1])

As I pointed out, any approach assumes the student knows
what it means to pass a function around. This is not
something that a beginning programmer (the proverbial CS
101 student) necessarily knows or is taught in that course.
(I know that you'll disagree with that statement.)

As others have pointed out, you must at that time explain
when/why 'reduce' is useful, otherwise it is not going to be
remembered, or at best put away on the mental top-shelf
only to be pulled out on rare occasion. For that matter, are
you prepared to answer the question 'why is it called "reduce"?' ?

You've already given an example of when it's useful. You
said that you use it all the time, as in

def longer(x, y):
if len(y) > len(x): return y
else: return x

def longest(seq):
return reduce(longer, seq)

It took me a while but I figured out why it works and
why it's clever. It must have taken so long because of
my substandard CS education. It appears though that
many others in the Python world have suffered from a
poor CS education because there are only six uses of
'reduce' in the top-level of the standard library, (And
several are rather confusing.)

Of those, cvs.py has a tricky way to write
sum( (x[1] for x in items) )
and difflib.py has a way to write
sum( (x[-1] for x in self.get_matching_blocks()) )
and profile.py could also be written as the simpler
def get_time_timer(timer=timer, sum=sum):
return sum(timer())
so 1/2 of those reduce cases are disguised 'sum's.

The other three are in csv.py and look something like
quotechar = reduce(lambda a, b, quotes = quotes:
(quotes[a] > quotes) and a or b,
quotes.keys())
and are identical in intent to your use case -- select the
object from a list which maximizes some f(obj).

This suggests the usefulness of a new function or two,
perhaps in itertools, which applies a function to a list
of values and returns the first object which has the
maximum value, as in

def imax(seq, f = lambda x: x):
seq = iter(seq)
obj = seq.next()
maxobj, maxval = obj, f(obj)
while 1:
try:
obj = seq.next()
except StopIteration:
return maxobj
val = f(obj)
if val > maxval:
maxobj, maxval = obj, val

and an equivalent imin. Note that this is "better" than
your code because f(x) is only computed once per
object or N times total, while you compute it 2*(N-1)
times. In some cases the f(x) may be the most
expensive term in the calculation.

Then your 'longest' could be implemented as

def longest(seq):
return imax(seq, len)

and the example code from csv can be rewritten as
the simpler (IMO, of course):

quotechar = imax(quotes.keys(),
lambda a, quotes = quotes: quotes[a])

I tried looking for places in the std. lib which are better
expressed as a 'reduce' but that was a hard search.
Instead, I looked for places which compute a sum by
using an explicit loop, to see if they are better written in
some alternate form. I looked for places which used the
words 'sum', 'total', or 'tot', as well as places where
there was a '= 0' followed by a '+' within the next few
lines. There were very few, and the paucity suggests
that 'sum' isn't needed all that often. Then again, I'm
not one who suggested that that be a builtin function ;)

In fact, I only found three places.

Here's the two most relevant, from tarfile.py:
chk = 256
for c in buf[:148]: chk += ord(c)
for c in buf[156:]: chk += ord(c)

I wondered what it would look like using some of
the (all too many -- what happened to "should only be
one obvious way"!) variants:

# Using 'reduce'
chk = 256
chk = reduce(lambda x, y: x + ord(y), buf[:148], chk)
chk = reduce(lambda x, y: x + ord(y), buf[156:], chk)

# using the new 'sum' and a map
chk = 256
chk += sum(map(ord, buf[:148])
chk += sum(map(ord, buf[156:])

# using sum and list comprehensions
chk = 256
chk += sum([ord(c) for c in buf[:148])
chk += sum([ord(c) for c in buf[156:])

# using sum and the proposed generator expression
chk = 256
chk += sum( (ord(c) for c in buf[:148]) )
chk += sum( (ord(c) for c in buf[156:]) )

Personally, I think the existing code is the clearest
of the bunch, and the functional forms are too complicated.

In summary:
- I disagree with you that your text is the clearest way
to explain the 'reduce' function, since it is open to an
alternate interpretation and it doesn't help the people who
learn by understanding how things are implemented either
in code or in math.

- The use case you gave suggests that an alternate function,
which I named 'imax' (and 'imin'), is more useful, in part
because it does fewer operations

- In any case, for Python code the 'reduce' function is very
rarely used, so should not be something frequently pulled
from a Pythonista's toolbox and not taught in an intro. CS
course based on Python.

Andrew
(e-mail address removed)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,172
Messages
2,570,934
Members
47,477
Latest member
ColumbusMa

Latest Threads

Top