sum for sequences?

Steven D'Aprano

You're exceptionally good at (probably deliberately) mis-interpreting
what people write.

I cannot read your mind, I can only interpret the words you choose to
write. You said

See, I think the very existence of math.fsum() already violates "there
should be one obvious way to do it."
[end quote]


If sum satisfies the existence of one obvious way, how does math.fsum
violate it? sum exists, and is obvious, regardless of whatever other
solutions exist as well.
 
P

Patrick Maupin

Because sum() is the obvious way to sum floats; now the existence of
math.fsum() means there are TWO obvious ways to sum floats. Is that
really that hard to understand? How can you misconstrue this so badly
that you write something that can be (easily) interpreted to mean that
you think that I think that once math.fsum() exists, sum() doesn't
even exist any more????
 
Steve Howell

You don't define symmetry. You don't even give a sensible example of
symmetry. Consequently I reject your argument that because sum is the
obvious way to sum a lot of integers, "symmetry" implies that it should
be the obvious way to concatenate a lot of lists.

You are not rejecting my argument; you are rejecting an improper
paraphrase of my argument.

My argument was that repeated use of "+" is spelled "sum" for
integers, so it's natural to expect the same name for repeated use of
"+" on lists. Python already allows for this symmetry, just SLOWLY.
You are correct that building intermediate lists isn't *compulsory*,
there are alternatives, but the alternatives themselves have costs.
Complexity itself is a cost. sum currently has nice simple semantics,
which means you can reason about it: sum(sequence, start) is the same as

total = start
for item in sequence:
    total = total + item
return total

I could just as reasonably expect these semantics:

total = start
for item in sequence:
    total += item
return total

Python does not contradict my expectations here:
>>> start = []
>>> x = sum([], start)
>>> x.append(1)
>>> start
[1]
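The two loops above can be written as small runnable functions so the semantics can be compared directly. This is a sketch only: `sum_plus` and `sum_iadd` are hypothetical names, not part of Python. One practical difference shows up with a mutable start value: the `+=` version extends the caller's list in place and returns that same object, while repeated `+` leaves it untouched.

```python
def sum_plus(sequence, start):
    # Hypothetical helper: the "total = total + item" semantics.
    total = start
    for item in sequence:
        total = total + item  # builds a new object on every step
    return total


def sum_iadd(sequence, start):
    # Hypothetical helper: the "total += item" semantics.
    total = start
    for item in sequence:
        total += item  # for lists this extends `start` itself
    return total


start = [0]
result = sum_plus([[1], [2]], start)
print(result, start)  # [0, 1, 2] [0]

start = [0]
result = sum_iadd([[1], [2]], start)
print(result, start, result is start)  # [0, 1, 2] [0, 1, 2] True
```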
You don't have to care what the items in sequence are, you don't have to
make assumptions about what methods sequence and start have (beyond
supporting iteration and addition).

The only additional assumption I'm making is that Python can take
advantage of in-place addition, which is easy to introspect.
Adding special cases to sum means it
becomes more complex and harder to reason about. If you pass some other
sequence type in the middle of a bunch of lists, what will happen? Will
sum suddenly break, or perhaps continue to work but inefficiently?

This is mostly a red herring, as I would tend to use sum() on
sequences of homogeneous types.

Python already gives me the power to shoot myself in the foot for
strings.
>>> list = [1, 2]
>>> list += "foo"
>>> list
[1, 2, 'f', 'o', 'o']
>>> lst = [1,2]
>>> lst.extend('foo')
>>> lst
[1, 2, 'f', 'o', 'o']
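The asymmetry behind this transcript can be checked directly. In the sketch below (variable names are illustrative), binary `+` between a list and a str raises TypeError, while `+=` on the same list accepts any iterable and behaves like `extend`.

```python
lst = [1, 2]
try:
    lst = lst + "foo"  # binary + between a list and a str is rejected
except TypeError as exc:
    plus_error = str(exc)
else:
    plus_error = None
print(plus_error)

lst += "foo"  # in-place += accepts any iterable, like lst.extend("foo")
print(lst)  # [1, 2, 'f', 'o', 'o']
```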

I'd prefer to get an exception for cases where += would do the same.
>>> start = []
>>> bogus_example = [[1, 2], None, [3]]
>>> for item in bogus_example: start += item
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'NoneType' object is not iterable


You still need to ask these questions with existing sum, but it is
comparatively easy to answer them: you only need to consider how the
alternative behaves when added to a list. You don't have to think about
the technicalities of the sum algorithm itself -- sometimes it calls +,
sometimes extend, sometimes +=, sometimes something else

I would expect sum() to support the same contract as +=, which already
works for numerics (so no backward incompatibility), and which already
works for lists. For custom-designed classes, I would rely on the
promise that augmented assignment falls back to normal methods.
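The fallback rule can be seen with a small class. `OnlyAdd` is a hypothetical class that defines `__add__` but not `__iadd__`; for it, `a += b` rebinds `a` to a new object. A list, which does define `__iadd__`, is extended in place, so every alias sees the change.

```python
class OnlyAdd:
    """Hypothetical class that defines __add__ but not __iadd__."""

    def __init__(self, items):
        self.items = list(items)

    def __add__(self, other):
        return OnlyAdd(self.items + other.items)


a = OnlyAdd([1])
alias = a
a += OnlyAdd([2])  # no __iadd__, so Python falls back to a = a + OnlyAdd([2])
print(a.items, alias.items, a is alias)  # [1, 2] [1] False

x = [1]
y = x
x += [2]  # list defines __iadd__, so the existing object is extended
print(x, y, x is y)  # [1, 2] [1, 2] True
```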
... which of the
various different optimized branches will I fall into this time? Who
knows? sum already has two branches. In my opinion, three branches is one
too many.

As long as it falls into the branch that works, I'm happy. :)
"Aggregating" lists? Not summing them? I think you've just undercut your
argument that sum is the "obvious" way of concatenating lists.

In natural language, we don't talk about "summing" lists, we talk about
joining, concatenating or aggregating them. You have just done it
yourself, and made my point for me.

Nor do you use "chain" or "extend."
And this very thread started because
somebody wanted to know what the equivalent of sum for sequences was.

If sum was the obvious way to concatenate sequences, this thread wouldn't
even exist.

This thread is entitled "sum for sequences." I think you just made my
point.
 
Steven D'Aprano

It's about a lack of surprises. Which, 99% of the time, Python excels
at. This is why many of us program in Python. This is why some of us
who would never use sum() on lists, EVEN IF IT WERE FIXED TO NOT BE SO
OBNOXIOUSLY SLOW, advocate that it, in fact, be fixed to not be so
obnoxiously slow.

As I said, patches are welcome. Personally, I expect that it would be
rejected, but that's not my decision to make, and who knows, perhaps I'm
wrong and you'll have some of the Python-Dev people support your idea.

sum is not designed to work with lists. It happens to work because lists
happen to use + for concatenation, and because it is too much trouble for
too little benefit to explicitly exclude lists in the same way sum
explicitly excludes strings. In the Python philosophy, simplicity of
implementation is a virtue: the code that is not there contributes
exactly no bugs and has precisely no overhead.
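The explicit exclusion of strings is easy to observe. As a sketch (assuming Python 3; the exact wording of the error may differ between versions), sum() refuses a str start value and its error message points at ''.join, while lists pass through the generic + path.

```python
try:
    sum(["a", "b", "c"], "")
except TypeError as exc:
    string_error = str(exc)
print(string_error)  # the message recommends ''.join

print(sum([[1, 2], [3]], []))  # [1, 2, 3]
print("".join(["a", "b", "c"]))  # abc
```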

sum has existed as a Python built-in for many years -- since
Python 2.3, which was released nearly seven years ago. Unlike the serious gotcha of
repeated string concatenation:


# DO NOT DO THIS
result = ""
for s in items:
    result += s


which *does* cause real problems in real code, I don't believe that there
have been any significant problems caused by summing lists of lists. As
problems go, it is such a minor one that it isn't worth this discussion,
let alone fixing it. But if anyone disagrees, this is open source, go
ahead and fix it. You don't need my permission.
 
Steven D'Aprano

On Mar 29, 4:19 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote: [...]
Python is under no compulsion to make "the obvious way" obvious to
anyone except Guido. It's a bonus if it happens to be obvious to
newbies, not a requirement.

And besides, what is "it" you're talking about?

* Adding integers, decimals or fractions, or floats with a low
  requirement for precision and accuracy? Use sum.

* Adding floats with a high requirement for precision and accuracy?
  Use math.fsum.

* Concatenating strings? Use ''.join.

* Joining lists? Use [].extend.

* Iterating over an arbitrary sequence of arbitrary sequences?
  Use itertools.chain.

That's five different "its", and five obvious ways to do them.
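For reference, here is a minimal sketch of the five approaches in the list above, one call each (assuming Python 3; the inputs are made up for illustration):

```python
import itertools
import math

print(sum([1, 2, 3]))  # integers: plain sum
print(math.fsum([0.25, 0.5, 0.125]))  # floats where accuracy matters
print("".join(["abc", "def", "ghi"]))  # strings: str.join

joined = []
for chunk in [[1, 2], [3, 4]]:
    joined.extend(chunk)  # lists: the "[].extend" idiom
print(joined)  # [1, 2, 3, 4]

# arbitrary iterables of arbitrary iterables: lazy chaining
print(list(itertools.chain([1, 2], "ab", (3,))))  # [1, 2, 'a', 'b', 3]
```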
Let's go through them...

"Obvious" doesn't mean you don't have to learn the tools you use. It
doesn't mean that there's no need to think about the requirements of your
problem. It doesn't even mean that the way to do it has to be a built-in
or pre-built solution in the standard library, or that somebody with no
Python experience could intuit the correct function to use based on
nothing more than a good grasp of English.

It certainly doesn't mean that users shouldn't be expected to know how to
import a module:

>>> fsum([1.234534665989, 2.987, 3])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fsum' is not defined

I called it math.fsum every time I referred to it. Did I need to specify
that you have to import the math module first?

* Concatenating strings? Use ''.join.


Common pitfall:
>>> ['abc', 'def', 'ghi'].join()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'

Is it really common?

I've been hanging around this newsgroup for many years now, and I don't
believe I've ever seen anyone confused by this. I've seen plenty of
newbies use repeated string concatenation, but never anyone trying to do
a string join and getting it wrong. If you have any links to threads
showing such confusion, I'd be grateful to see them.

* Joining lists? Use [].extend.

Obvious, yes. Convenient? Not really.
>>> start = []
>>> for list in [[1, 2], [3, 4]]:
...     start.extend(list)
...
>>> start
[1, 2, 3, 4]


Why isn't that convenient? It is an obvious algorithm written in three
short lines. If you need a one-liner, write a function and call it:

concatenate_lists(sequence_of_lists)
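`concatenate_lists` is not a standard function; it is the kind of helper the paragraph above suggests writing. A minimal sketch, assuming the input is an iterable of lists:

```python
def concatenate_lists(sequence_of_lists):
    """Hypothetical helper: join lists into one new list.

    Uses list.extend, so no intermediate lists are built and the
    input lists are left unchanged.
    """
    result = []
    for lst in sequence_of_lists:
        result.extend(lst)
    return result


print(concatenate_lists([[1, 2], [3, 4], []]))  # [1, 2, 3, 4]
```

Because it only iterates over its argument once, it also accepts a generator of lists.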


* Iterating over an arbitrary sequence of arbitrary sequences?
  Use itertools.chain.
>>> group1 = ['al', 'bob']
>>> group2 = ['cal']
>>> groups = [group1, group2]

Obvious if you are Dutch...

Or are familiar with the itertools module and the Pythonic practice of
iterating over lazy sequences. Iterators and itertools are fundamental to
the Pythonic way of doing things.


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'itertools' is not defined

That's the second time you've either mistakenly neglected to import a
module, or deliberately not imported it to make the rhetorical point that
you have to import a module before using it. Yes, you *do* have to import
modules before using them. What's your point? Not everything has to be a
built-in.
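With the import in place, the group example works. Here is a sketch using the thread's group1/group2 data. `chain(*groups)` unpacks the outer list into separate arguments, while `chain.from_iterable` consumes the outer iterable lazily, so it also accepts a generator of groups.

```python
import itertools

group1 = ['al', 'bob']
group2 = ['cal']
groups = [group1, group2]

# chain(*groups) unpacks the outer list into separate arguments.
print(list(itertools.chain(*groups)))  # ['al', 'bob', 'cal']

# chain.from_iterable reads the outer iterable lazily, so a generator works too.
lazy_groups = (g for g in groups)
for name in itertools.chain.from_iterable(lazy_groups):
    print(name)
```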



[...]
Sum is builtin, but you have to import fsum from math and chain from
itertools.

Join is actually a method on strings, not sequences.

Is that supposed to be an argument against them?



[...]
Just commit all that to memory, and enjoy the productivity of using a
high level language! ;)

If you don't know your tools, you will spend your life hammering screws
in with the butt of your saw. It will work, for some definition of work.
Giving saws heavier, stronger handles to make it faster to hammer screws
is not what I consider good design.
 
Paul Rubin

Steven D'Aprano said:
"Obvious" doesn't mean you don't have to learn the tools you use....

Geez you guys, get a room ;-). You're all good programmers with too
much experience to be arguing over stuff this silly.
 
Mel

Patrick said:
Because sum() is the obvious way to sum floats; now the existence of
math.fsum() means there are TWO obvious ways to sum floats. Is that
really that hard to understand? How can you misconstrue this so badly
that you write something that can be (easily) interpreted to mean that
you think that I think that once math.fsum() exists, sum() doesn't
even exist any more????

floats are nasty -- as evidenced by the recent thread on comparing floats for
equality. People use floats when they have to. fsum exists because of
this:

mwilson@tecumseth:~$ python
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41)
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
156.0

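The interactive session above lost its input lines in archiving, so the exact values used there are unknown. As a stand-in with made-up inputs, this sketch shows the kind of difference math.fsum is for: it avoids loss of precision by tracking intermediate partial sums, while plain left-to-right addition rounds after every step. Since CPython 3.12, sum() itself applies compensated summation to floats, so the sketch spells out the naive loop (a hypothetical helper, `naive_sum`) to keep the comparison independent of the Python version.

```python
import math


def naive_sum(values):
    # Hypothetical helper: plain left-to-right float addition,
    # rounding after every step.
    total = 0.0
    for v in values:
        total = total + v
    return total


tenths = [0.1] * 10
print(naive_sum(tenths))  # 0.9999999999999999
print(math.fsum(tenths))  # 1.0

cancel = [1e20, 1.0, -1e20]
print(naive_sum(cancel))  # 0.0: the 1.0 is lost when added to 1e20
print(math.fsum(cancel))  # 1.0
```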

You could generalize sum, but after that, there's a case that even fsum
can't handle:

Traceback (most recent call last):


Mel.
 
Patrick Maupin

floats are nasty -- as evidenced by the recent thread on comparing floats for
equality. People use floats when they have to. fsum exists because of
this:

....

I understand there are technical reasons for why math.fsum() exists.
I still think that whatever math.fsum() does should probably be a part
of sum().
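One consideration when folding fsum's behaviour into sum(): math.fsum converts every item to float, whereas sum() keeps whatever type + produces and works on any type that supports +. A sketch of the difference (assuming Python 3; the inputs are illustrative):

```python
import math
from fractions import Fraction

print(sum([1, 2, 3]), math.fsum([1, 2, 3]))  # 6 6.0

thirds = [Fraction(1, 3)] * 3
print(sum(thirds))  # 1 (exact Fraction arithmetic)
print(type(math.fsum(thirds)).__name__)  # float: items are converted first

try:
    math.fsum([[1], [2]])
except TypeError:
    print("math.fsum rejects lists")
```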

Regards,
Pat
 
Albert van der Horst

On Mar 29, 6:19 pm, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
How does the existence of math.fsum contradict the existence of sum?

To a mathematician, sum(set) suggests that the order of summation
doesn't matter. (So I wouldn't use sum for concatenating lists.)
Strictly speaking, sum() should be used only when operator + is both
associative and commutative.

Now for floating point numbers the order of summation is crucial:
addition is not associative, (a+b)+c != a+(b+c).
So the obvious thing for someone versed in numerical computing
to do is to check whether sum() gives any guarantees about order and
whether there is a special sum() for floating point.
(This is not very realistic, because such a person would have
skimmed the math library a long time ago, but anyway.)
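The point about order can be shown with three floats. A sketch with values chosen for illustration: float addition is commutative but not associative, so the grouping of a left-to-right sum can change the result, while math.fsum returns the same value for either order here.

```python
import math

a, b, c = 0.1, 0.2, 0.3
print((a + b) + c)  # 0.6000000000000001
print(a + (b + c))  # 0.6
print(a + b == b + a)  # True: swapping two operands is harmless

# fsum rounds the exact total once, so the input order does not matter here.
print(math.fsum([a, b, c]), math.fsum([c, b, a]))  # 0.6 0.6
```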

Met vriendelijke groeten,
Albert van der Horst
 
Patrick Maupin

To a mathematician, sum(set) suggests that the order of summation
doesn't matter. (So I wouldn't use sum for concatenating lists.)
Strictly speaking, sum() should be used only when operator + is both
associative and commutative.

That's all well and good, but not every Python user is a
mathematician. As long as Python doesn't surprise mathematicians in a
way that is too negative (I can see the complaint now: "Hey! sum()
kept my lists ordered! I was expecting some randomization!") what is
wrong with it also not surprising the average user in a way that is
too negative?

Regards,
Pat
 
Neil Cerutti

To a mathematician, sum(set) suggests that the order of summation
doesn't matter. (So I wouldn't use sum for concatenating
lists.) Strictly speaking, sum() should be used only when operator
+ is both associative and commutative.

Now for floating point numbers the order of summation is
crucial: addition is not associative, (a+b)+c != a+(b+c). So the
obvious thing for someone versed in numerical computing to do is
to check whether sum() gives any guarantees about order and
whether there is a special sum() for floating point. (This is not
very realistic, because such a person would have skimmed the math
library a long time ago, but anyway.)

I'm convinced by this argument. I just have to be a mathematician
and a computer scientist skilled in numerical computing. No
problem! Just a *few more years* of education and I'll be ready
for summing things in Python. ;)
 
