max(), sum(), next()

B

bearophileHUGS

castironpi:
For max and min, why can't you just add your argument to the set
itself?

Sometimes that can be done, but in many other situations it's less
easy, like in the example I have shown in my first post:

max((fun(x) for x in iterable if predicate(x)))

There are some ways to add the max there, for example using an
itertools.chain to chan the default value to the end of the iterable,
but most of the time I just write a for loop.

Bye,
bearophile
 
M

Mensanator

You can't conclude the behaviour of the one from the behaviour
of the other, because the two situations have nothing at all in
common.

I wouldn't say they have nothing in common. After all, neither []
nor [None,None] contain any integers, yet summing the first gives
us an integer result whereas summing the second does not. Yes, for
different unrelated reasons, but sometimes reasons aren't as important
as results.
I'm not following that. Are you saying a query that returns no
records doesn't have a specific field containg a Null so there
are no Nulls to poison the sum? ...tap...tap...tap. Ok, I can see
that,
Exactly.

but you don't get 0 either.

That's because the SQL sum() has a special case for "no rows
returned".  A *different* special case than the one that taint's
the sum when encountering a NULL.  It does the equivalent of

    if len(rows_returned) == 0:
        # Special case for no rows returned
        return NULL
    total = 0
    for row in rows_returned:
        value = row[column]
        if value is NULL:
            # Special case for encountering a NULL value
            return NULL
        total += value
    return total

Two different special cases for the two different situations.  If
you were to remove the special case for no rows returned, you
would get zero when the SELECT statement finds no rows, but the
sum would still be tainted when a NULL value is encountered..

The definition of sum in mathematics *does* do away with that
special case.  The sum of zero terms is zero.  And the Python
sum() function follows the mathematics definition in this
respect, not the SQL definition.

Too bad. I brought this up because I use Python a lot with
database work and rarely for proving theorms in ZFC.

Guess I have to work around it. Just one more 'gotcha' to
keep track of.
You can argue that Python sum() should have special cased the
empty sequence.  

I did.
It's not an illogical stance to take.  

I didn't think so.
It's just
a totally different issue from encountering a non-numeric element
in the sequence.  

Ok, I was paying more attention to the outcome.
In some cases it might actually make sense to
treat the empty sequence as an error, but just ignore non-numeric
elements (i.e, treat them as if they were zero).  

Ouch. Sounds like Excel and we don't want to go there.
 
M

Manu Hack

David C. Ullrich:


What do you think about my idea of adding that 'default' argument to
the max()/min() functions?

Bye,
bearophile

For max and min, why can't you just add your argument to the set
itself?

The reason max([]) is undefined is that max( S ) is in S.

It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.

It doesn't make sense to me. What do you set x to?
 
K

Ken Starks

David said:
Mensanator said:
(e-mail address removed) wrote:
Empty Python lists [] don't know the type of the items it will
contain, so this sounds strange:
sum([])
0
Because that [] may be an empty sequence of someobject:
You are right in that sum could be used to sum arbitrary objects.
However, in 99.99% of the cases, you will be summing numerical values.
When adding real numbers, the neutral element is zero. ( X + 0 = X) It
is very logical to return zero for empty sequences.
No it isn't. Nothing is not 0, check with MS-Access, for instance:

Null + 1 returns Null. Any arithmetic expression involving a
Null evaluates to Null. Adding something to an unknown returns
an unknown, as it should.

It is a logical fallacy to equate unknown with 0.

Which has nothing to do with the "right" value for an
empty sum. If they hear about what you said here in
sci.math they're gonna kick you out - what do you
imagine the universally accepted value of \sum_{j=1}^0
is?

For example, the water table elevation in ft above Mean Sea Level
is WTE = TopOfCasing - DepthToWater.

TopOfCasing is usually known and constant (until resurveyed).
But DepthToWater may or may not exist for a given event (well
may be covered with fire ants, for example).

Now, if you equate Null with 0, then the WTE calculation says
the water table elevation is flush with the top of the well,
falsely implying that the site is underwater.

And, since this particular site is on the Mississippi River,
it sometimes IS underwater, but this is NEVER determined by
water table elevations, which, due to the CORRECT treatment
of Nulls by Access, never returns FALSE calculations.
0

is a bug, just as it's a bug in Excel to evaluate blank cells
as 0. It should return None or throw an exception like sum([None,1])
does.
Same way, if we would have a prod() function, it should return one for
empty sequences because X*1 = X. The neutral element for this operation
is one.

Of course this is not good for summing other types of objects. But how
clumsy would it be to use

sum( L +[0] )

or

if L:
value = sum(L)
else:
value = 0

instead of sum(L).

Once again, this is what sum() is used for in most cases, so this
behavior is the "expected" one.

Another argument to convince you: the sum() function in SQL for empty
row sets returns zero in most relational databases.

But of course it could have been implemented in a different way... I
believe that there have been excessive discussions about this decision,
and the current implementation is very good, if not the best.

Best,

Laszlo

I suppose the following is accepted by statisticians. Here,
for reference, here is the what the 'R' statistic package
says on the subject 9if you type 'help(sum)'


<quote>
Sum of Vector Elements
Description
sum returns the sum of all the values present in its arguments.

Usage
sum(..., na.rm = FALSE)

Arguments
.... numeric or complex or logical vectors.
na.rm logical. Should missing values be removed?

Details
This is a generic function: methods can be defined for it directly or
via the Summary group generic. For this to work properly, the arguments
.... should be unnamed, and dispatch is on the first argument.

If na.rm is FALSE an NA value in any of the arguments will cause a value
of NA to be returned, otherwise NA values are ignored.

Logical true values are regarded as one, false values as zero. For
historical reasons, NULL is accepted and treated as if it were integer(0).

Value
The sum. If all of ... are of type integer or logical, then the sum is
integer, and in that case the result will be NA (with a warning) if
integer overflow occurs. Otherwise it is a length-one numeric or complex
vector.
NB: the sum of an empty set is zero, by definition.

References
Becker, R. A., Chambers, J. M. and Wilks, A. R. (1988) The New S
Language. Wadsworth & Brooks/Cole.
 
K

Ken Starks

David said:
I don't see why you feel the two should act the same.
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.

And both for good reason:

(i) If A and B are disjoint sets we certainly want to
have sum(A union B) = sum(A) + sum(B). This requires
sum(empty set) = 0.

(ii) If A is a subset of B then we should have
max(A) <= max(B). This requires that max(empty set)
be something that's smaller than everything else.
So we give up on that.

Do we give up? Really ?

From wikipedia: http://en.wikipedia.org/wiki/Empty_set
(Uses wikipedia's LaTeX notation -- I hope those interested
are OK with that )

<quote>
Mathematics

[edit] Extended real numbers

Since the empty set has no members, when it is considered as a subset of
any ordered set, then any member of that set will be an upper bound and
lower bound for the empty set. For example, when considered as a subset
of the real numbers, with its usual ordering, represented by the real
number line, every real number is both an upper and lower bound for the
empty set.[3] When considered as a subset of the extended reals formed
by adding two "numbers" or "points" to the real numbers, namely negative
infinity, denoted -\infty\!\,, which is defined to be less than every
other extended real number, and positive infinity, denoted +\infty\!\,,
which is defined to be greater than every other extended real number, then:

\sup\varnothing=\min(\{-\infty, +\infty \} \cup \mathbb{R})=-\infty,

and

\inf\varnothing=\max(\{-\infty, +\infty \} \cup \mathbb{R})=+\infty.

That is, the least upper bound (sup or supremum) of the empty set is
negative infinity, while the greatest lower bound (inf or infimum) is
positive infinity. By analogy with the above, in the domain of the
extended reals, negative infinity is the identity element for the
maximum and supremum operators, while positive infinity is the identity
element for minimum and infimum.
 
B

bearophileHUGS

David C. Ullrich:
I didn't mention what's below because it doesn't seem
likely that saying max([]) = -infinity and
min([]) = +infinity is going to make the OP happy...

Well, it sounds cute having Neginfinite and Infinite as built-int
objects that can be compared to any other type and are < of or > of
everything else but themselves. Probably they can be useful as
sentinels, but in Python I nearly never use sentinels anymore, and
they can probably give some other problems...

Bye,
bearophile
 
M

Mensanator

        As a by-stander... let the DBMS do its work, don't try to make
Python do what DBMS SQL does...

Sure, and in most cases I use Visual Basic for Applications
when I need functionality I can't get directly from SQL.

But anybody who's used VBA with Access must know what a PITA
it is. And even when you get it working, you sometimes wish you
hadn't. I have a Mann-Kendall trend analysis that must be done
quarterly on over 150 combinations of well:analyte. It takes
over 6 hours to process this (and I don't know how much is due to
VBA, Access, server, network, etc.). It's something I'd love to
try in Python (if I can find the time to translate it).

But I'm wary of things that Python might do (such as return 0
when summing an empty list) that SQL/VBA does not.
 
K

Ken Starks

David said:
Ken Starks said:
Do we give up? Really ?

Erm, thanks. I was aware of all that below. If we're
being technical what's below is talking about the sup
and inf, which are not the same as max and min. More
relevant to the present context, I didn't mention what's
below because it doesn't seem likely that saying max([])
= -infinity and min([]) = +infinity is going to make the
OP happy...

Of course you were aware, I have seen enough of your posts
to know that. And I agree that, whatever Wikipedia seems to
imply, max and supremum should be distiguished.

It was your prelude, "At least in mathematics ..." that
made me prick up my ears. So I couldn't resist responding,
without _any_ malice I assure you.

Cheers,
Ken.
 
S

Steven D'Aprano

Think about all the previously elected female or black presidents of the
US. Which one was the tallest?

I know the answer to that one:

All of them!
 
M

Manu Hack

On Sep 4, 2:42 pm, (e-mail address removed) wrote:
David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?

For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S.

It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.

It doesn't make sense to me. What do you set x to?

For all x.

But then how can you conclude sum([]) = 0 from there? It's way far
from obvious.
 
S

Steven D'Aprano

The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.

It doesn't make sense to me. What do you set x to?

For all x.

But then how can you conclude sum([]) = 0 from there? It's way far from
obvious.

I think Castironpi's reasoning is to imagine taking sum([x])-x for *any*
possible x (where subtraction and addition is defined). Naturally you
always get 0.

Now replace x by *nothing at all* and you get:

sum([]) "subtract nothing at all" = 0

I think that this is a reasonable way to *informally* think about the
question, but it's not mathematically sound, because if you replace x
with "nothing at all" you either get:

sum([]) - = 0

which is invalid (only one operand to the subtraction operator), or you
get:

sum([0]) - 0 = 0

which doesn't involve an empty list. What castironpi seems to be doing is
replacing "nothing at all" with, er, nothing at all in one place, and
zero in the other. And that's what makes it unsound and only suitable as
an informal argument.

[The rest of this is (mostly) aimed at Mensanator, so others can stop
reading if they like.]

Fundamentally, the abstract function "sum" and the concrete Python
implementation of sum() are both human constructs. It's not like there is
some pure Platonic[1] "Ideal Sum" floating in space that we can refer to.
Somewhere, sometime, some mathematician had to *define* sum(), and other
mathematicians had to agree to use the same definition.

They could have decided that sum must take at least two arguments,
because addition requires two arguments and it's meaningless to talk
about adding a single number without talking about adding it to something
else. But they didn't. Similarly, they might have decided that sum must
take at least one argument, and therefore prohibit sum([]), but they
didn't: it's more useful for sum of the empty list to give zero than it
is for it to be an error. As I mentioned earlier, mathematicians are
nothing if not pragmatists.






[1] Or was it Aristotle who believed in Ideal Forms? No, I'm sure it was
Plato.
 
M

Manu Hack

The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.

It doesn't make sense to me. What do you set x to?

For all x.

But then how can you conclude sum([]) = 0 from there? It's way far from
obvious.

I think Castironpi's reasoning is to imagine taking sum([x])-x for *any*
possible x (where subtraction and addition is defined). Naturally you
always get 0.

Now replace x by *nothing at all* and you get:

sum([]) "subtract nothing at all" = 0

I think that this is a reasonable way to *informally* think about the
question, but it's not mathematically sound, because if you replace x
with "nothing at all" you either get:

sum([]) - = 0

which is invalid (only one operand to the subtraction operator), or you
get:

sum([0]) - 0 = 0

which doesn't involve an empty list. What castironpi seems to be doing is
replacing "nothing at all" with, er, nothing at all in one place, and
zero in the other. And that's what makes it unsound and only suitable as
an informal argument.

Actually it's even more natural to state sum([x]) = x, and this way
you can never conclude that sum([]) = 0 from there.
 
C

castironpi

On Sep 4, 2:42 pm, (e-mail address removed) wrote:
David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?
Bye,
bearophile
For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S.
It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
It doesn't make sense to me.  What do you set x to?
For all x.

But then how can you conclude sum([]) = 0 from there?  It's way far
from obvious.

You can define sum([a1,a2,...,aN]) recursively as
sum([a1,a2,...a(N-1)])+aN. Call the sum sum([a1,a2,...,aN]) "X", then
subtract aN.

sum([a1,a2,...a(N-1)])+aN=X
sum([a1,a2,...a(N-1)])+aN-aN=X-aN

For N=2, we have:

sum([a1,a2])=X
sum([a1,a2])-a2=X-a2
sum([a1,a2])-a2-a1=X-a2-a1

Since X= a1+ a2, replace X.

sum([a1,a2])-a2-a1=(a1+a2)-a2-a1

Or,

sum([a1,a2])-a2-a1=0

Apply the recursive definition:

sum([a1])+a2-a2-a1=0

And again:

sum([])+a1+a2-a2-a1=0

And we have:

sum([])=0.
 
S

Steven D'Aprano

Actually it's even more natural to state sum([x]) = x, and this way you
can never conclude that sum([]) = 0 from there.

But what you can say is that for any list L, sum(L) = sum(L + [0]).

Therefore sum([]) = sum([] +[0]) = 0
 
M

Manu Hack

On Sep 4, 2:42 pm, (e-mail address removed) wrote:
David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?

For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S.
It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
It doesn't make sense to me. What do you set x to?
For all x.

But then how can you conclude sum([]) = 0 from there? It's way far
from obvious.

You can define sum([a1,a2,...,aN]) recursively as
sum([a1,a2,...a(N-1)])+aN. Call the sum sum([a1,a2,...,aN]) "X", then
subtract aN.

sum([a1,a2,...a(N-1)])+aN=X
sum([a1,a2,...a(N-1)])+aN-aN=X-aN

For N=2, we have:

sum([a1,a2])=X
sum([a1,a2])-a2=X-a2
sum([a1,a2])-a2-a1=X-a2-a1

Since X= a1+ a2, replace X.

sum([a1,a2])-a2-a1=(a1+a2)-a2-a1

Or,

sum([a1,a2])-a2-a1=0

Apply the recursive definition:

sum([a1])+a2-a2-a1=0

And again:

sum([])+a1+a2-a2-a1=0

And we have:

sum([])=0.

It makes more sense now, I just wanted to point out that only with
sum([x]) = x, you can't get sum([]) = 0.
 
K

Ken Starks

castironpi said:
On Sep 4, 2:42 pm, (e-mail address removed) wrote:
David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?
Bye,
bearophile
For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S.
It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
It doesn't make sense to me. What do you set x to?
For all x.
But then how can you conclude sum([]) = 0 from there? It's way far
from obvious.

You can define sum([a1,a2,...,aN]) recursively as
sum([a1,a2,...a(N-1)])+aN. Call the sum sum([a1,a2,...,aN]) "X", then
subtract aN.

sum([a1,a2,...a(N-1)])+aN=X
sum([a1,a2,...a(N-1)])+aN-aN=X-aN

For N=2, we have:

sum([a1,a2])=X
sum([a1,a2])-a2=X-a2
sum([a1,a2])-a2-a1=X-a2-a1

Since X= a1+ a2, replace X.

sum([a1,a2])-a2-a1=(a1+a2)-a2-a1

Or,

sum([a1,a2])-a2-a1=0

Apply the recursive definition:

sum([a1])+a2-a2-a1=0

And again:

sum([])+a1+a2-a2-a1=0

And we have:

sum([])=0.

This is not necessarily so.

The flaw is that you provide a recursive definition with no start value,
which is to say it is not a recursive definition at all.

A recursive definition should be (for lists where elements
can be added, and ignoring pythonic negative indexing):

Define 'sum(L)' by
a. sum(L[0]) = L[0]
b. sum(L[0:i]) = sum(L[0:i-1]) + L ... if i > 0

From this you can prove the reverse recursion
sum{L[0:k]) = sum(L[0:k+1]) - L[k+1]
__only__ if k >= 0

It says nothing about the empty list.

You could add, as part of the definition, that sum{[]) = 0, or any other
value.

A rather different approach, not quite simple recursion, would be to
start with

A. a slicing axiom, something like:

for all non-negative integers, a,b,c with a <=b <= c:

sum(L[a:c]) = sum(L[a:b]) + sum(L[b:c])

B. a singleton axiom:

for all integers a where L[a] exists:
sum(L[a:a]) = L[a]




2a. sum{
 
K

Ken Starks

castironpi said:
On Sep 4, 2:42 pm, (e-mail address removed) wrote:
David C. Ullrich:
At least in mathematics, the sum of the elements of
the empty set _is_ 0, while the maximum element of the
empty set is undefined.
What do you think about my idea of adding that 'default' argument to
the max()/min() functions?
Bye,
bearophile
For max and min, why can't you just add your argument to the set
itself?
The reason max([]) is undefined is that max( S ) is in S.
It makes sense.
The reason sum([]) is 0 is that sum( [ x ] ) - x = 0.
It doesn't make sense to me. What do you set x to?
For all x.
But then how can you conclude sum([]) = 0 from there? It's way far
from obvious.

You can define sum([a1,a2,...,aN]) recursively as
sum([a1,a2,...a(N-1)])+aN. Call the sum sum([a1,a2,...,aN]) "X", then
subtract aN.

sum([a1,a2,...a(N-1)])+aN=X
sum([a1,a2,...a(N-1)])+aN-aN=X-aN

For N=2, we have:

sum([a1,a2])=X
sum([a1,a2])-a2=X-a2
sum([a1,a2])-a2-a1=X-a2-a1

Since X= a1+ a2, replace X.

sum([a1,a2])-a2-a1=(a1+a2)-a2-a1

Or,

sum([a1,a2])-a2-a1=0

Apply the recursive definition:

sum([a1])+a2-a2-a1=0

And again:

sum([])+a1+a2-a2-a1=0

And we have:

sum([])=0.

This is not necessarily so.

The flaw is that you provide a recursive definition with no start value,
which is to say it is not a recursive definition at all.

A recursive definition should be (for lists where elements
can be added, and ignoring pythonic negative indexing):

Define 'sum(L)' by
a. sum(L[0:1]) = L[0]
b. sum(L[0:i]) = sum(L[0:i-1]) + L ... if i > 1

From this you can prove the reverse recursion
sum{L[0:k]) = sum(L[0:k+1]) - L[k+1]
__only__ if k >= 0

It says nothing about the empty list.

You could add, as part of the definition, that sum{[]) = 0, or any other
value.

A rather different approach, not quite simple recursion, would be to
start with

A. a slicing axiom, something like:

for all non-negative integers, a,b,c with a <=b <= c:

sum(L[a:c]) = sum(L[a:b]) + sum(L[b:c])

B. a singleton axiom:

for all integers a where L[a] exists:
sum(L[a:a]) = L[a]




2a. sum{
 
M

Mel

Steven said:
Actually it's even more natural to state sum([x]) = x, and this way you
can never conclude that sum([]) = 0 from there.

But what you can say is that for any list L, sum(L) = sum(L + [0]).

Therefore sum([]) = sum([] +[0]) = 0

Yep. The way it is preserves the distributive property

sum(a+b) = sum(a) + sum(b)

This would matter in cases like (untested code..)

suvsales = sum (sum (s.price for s in d.sales if s.class='suv') for d in
districts)


Mel.
 
M

Mensanator

[The rest of this is (mostly) aimed at Mensanator,

Ok, I see where you're coming from.
Fundamentally, the abstract function "sum" and the concrete Python
implementation of sum() are both human constructs. It's not like there is
some pure Platonic[1] "Ideal Sum" floating in space that we can refer to.
Somewhere, sometime, some mathematician had to *define* sum(), and other
mathematicians had to agree to use the same definition.

They could have decided that sum must take at least two arguments,
because addition requires two arguments and it's meaningless to talk
about adding a single number without talking about adding it to something
else. But they didn't.

Ok. But the problem is they DID in SQL: x + Null = Null.

Earlier, you said that an empty box contains 0 widgets.
Fine, empty means 0. But Null doesn't mean empty. Say
your widget supplier just delivers a box and you haven't
opened it yet. Is the box likely to be empty? Probably
not, or they wouldn't have shipped it. In this case,
Null means "unknown", not 0. The number of widgets you
have on hand is Null (unknown) because inventory + Null = Null.

SQL will correctly tell you that the amount on hand is unknown,
whereas Python will tell you the amount on hand is inventory,
which is incorrect.
Similarly, they might have decided that sum must
take at least one argument, and therefore prohibit sum([]), but they
didn't: it's more useful for sum of the empty list to give zero than it
is for it to be an error. As I mentioned earlier, mathematicians are
nothing if not pragmatists.

Here's a real world example (no ivory tower stuff):

An oil refinery client has just excavated a big pile of
dirt to lay a new pipeline. Due to the volume of the
pipe, there's dirt left over. Ideally, the client
would like to use that dirt as landfill (free), but it
must be tested for HAPS (by summing the concentrations of
organic constituents) to see whether it is considered
hazardous waste, it which cas it must be taken off site
and incinerated (costly).

In MOST cases, a HAPS sum of 0 would be illegal because
0's generally cannot be reported in analytical tests,
you can't report a result less than it's legal reporting
limit. If ALL the consituents were undetected, the sum
should be that of the sum of the reporting limits, thus,
it cannot be 0.

Can't I just use a sum of 0 to tell me when data is missing?
No, because in some cases the reporting limit of undetected
compounds is set to 0.

In which case, a 0 HAPS score means we can confidently
reccomend that the dirt is clean and can be freely reused.

But if the analysis information is missing (hasn'r arrived
yet or still pending validation) we WANT the result to be
UNKNOWN so that we don't reccomend to the client that he take
an illegal course of action.

In this case, SQL does the correct thing and Python would
return a false result.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,999
Messages
2,570,243
Members
46,836
Latest member
login dogas

Latest Threads

Top