list.clear() missing?!?


Serge Orlov

Martin said:
Felipe said:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?

In the first benchmark, you need space for two lists: the old one and
the new one; in the other benchmarks you need only a single block of
memory. Drawing conclusions from here gets difficult - you would have to
study the malloc implementation to find out whether it works better in one
case than the other. It could also be an issue of processor cache: one
case may fit into the cache while the other may not.

An addition to the previous message: now I follow you :) There are indeed
two arrays, and the cache does seem to be the second reason for the
slowdown, but iterating backwards is also contributing.
 

Serge Orlov

Felipe said:
Em Qua, 2006-04-12 às 11:36 +1000, Steven D'Aprano escreveu:
Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?

In the first benchmark, you need space for two lists: the old one and
the new one;

Er, what new list? I see only one list, x = range(100000), which is merely
created and then nothing is done to it. Have I missed something?

He's talking about the garbage collector.

To be exact, the reason for the two arrays is timeit.py. It doesn't place
the code to time into a separate namespace but injects it into a for loop,
so the actual code timed is:

for _i in _it:
    x = range(100000)

and that makes two arrays with 100,000 items each exist for a short time,
starting from the second iteration.
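Serge's description of the injected loop can be checked directly: in CPython's timeit module, the timed function is built from a module-level source template (shown here for a modern Python 3; the 2.4 version was essentially the same):

```python
import timeit

# CPython builds the timed function from this source template; the
# statement under test is pasted inside a for loop, so any name it
# binds (like x above) survives into the next iteration.
print(timeit.template)
```

The loop header `for _i in _it:` is visible right in the template, which is why the previous iteration's list is still alive when the next `range()` call runs.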
 

Steven D'Aprano

Em Qua, 2006-04-12 às 11:36 +1000, Steven D'Aprano escreveu:
On Tue, 11 Apr 2006 19:15:18 +0200, Martin v. Löwis wrote:

Felipe Almeida Lessa wrote:
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?

In the first benchmark, you need space for two lists: the old one and
the new one;

Er, what new list? I see only one list, x = range(100000), which is merely
created and then nothing is done to it. Have I missed something?

He's talking about the garbage collector.

To be exact, the reason for the two arrays is timeit.py. It doesn't place
the code to time into a separate namespace but injects it into a for loop,
so the actual code timed is:

for _i in _it:
    x = range(100000)

and that makes two arrays with 100,000 items each exist for a short time,
starting from the second iteration.

But that is precisely the same for the other timeit tests too.

for _i in _it:
    x = range(100000)
    del x[:]

etc.

The question remains -- why does it take longer to do X than it takes to
do X and then Y?
 

Duncan Booth

Steven said:
But that is precisely the same for the other timeit tests too.

for _i in _it:
    x = range(100000)
    del x[:]

x = range(100000):
    Allocate list.
    Allocate ob_item array to hold pointers to 100000 objects.
    Allocate 99900 integer objects.
    Set up the list.
del x[:] calls list_clear, which:
    decrements references to 99900 integer objects, freeing them;
    frees the ob_item array.

.... next time round ...
x = range(100000):
    Allocate a new list.
    Allocate ob_item array (probably picks up the same memory block as
    last time).
    Allocate 99900 integer objects (probably reusing the same memory as
    last time).
    Set up the list.
etc.

The question remains -- why does it take longer to do X than it takes to
do X and then Y?
.... x = range(100000);
    Allocate list.
    Allocate ob_item array to hold pointers to 100000 objects.
    Allocate 99900 integer objects.
    Set up the list.

.... next time round ...
    Allocate another list.
    Allocate a second ob_item array.
    Allocate another 99900 integer objects.
    Set up the list.
    Then delete the original list, decrementing and releasing the original
    integers and freeing the original ob_item array.

This uses twice as much of everything except the actual list object. The
actual work done is the same, but I guess there are likely to be more
cache misses. There is also the question of whether the memory allocator
manages to reuse the recently freed blocks. With one large block I expect
it might well end up reusing it; with two large blocks being freed
alternately it might not manage to reuse either (but that is just a
guess, and maybe system dependent).
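Duncan's point, that the plain rebinding variant keeps two full lists alive at once while the clearing variants never need more than one, can be made visible with tracemalloc (Python 3.4+; the function names below are just for illustration):

```python
import tracemalloc

N = 100_000

def peak_rebinding(iters=5):
    # The old list is still alive while list(range(N)) builds the new
    # one, so the traced peak should approach two lists' worth of memory.
    x = list(range(N))
    tracemalloc.start()
    for _ in range(iters):
        x = list(range(N))
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak

def peak_clear_first(iters=5):
    # Freeing the guts before allocating means only about one list's
    # worth of memory is ever live at once.
    x = list(range(N))
    tracemalloc.start()
    for _ in range(iters):
        del x[:]
        x = list(range(N))
    peak = tracemalloc.get_traced_memory()[1]
    tracemalloc.stop()
    return peak

print(peak_rebinding() > peak_clear_first())  # expect True on CPython
```

On CPython the rebinding peak comes out roughly double the clear-first peak, consistent with the walkthrough above.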
 

John Salerno

Steven said:
But name.clear() meaning "mutate the object referenced by name to the
empty state" is a very natural candidate for a method, and I don't
understand why lists shouldn't have it.

Funny this even comes up, because I was just trying to 'clear' a list
the other day. But it sounds like it's been an issue for a while. :)
 

Steven Bethard

Steven said:
John said:
Thanks guys, your explanations are really helpful. I think what had me
confused at first was my understanding of what L[:] does on either side
of the assignment operator. On the left, it just chooses those elements
and edits them in place; on the right, it makes a copy of that list,
right? (Which I guess is still more or less *doing* the same thing, just
for different purposes)
Interestingly, if it was just a "clear" method nobody would be confused.

Even more importantly, you could say help(list.clear) and learn something
useful, instead of trying help(del) and getting a syntax error.
[snip more reasons to add list.clear()]

I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?

STeVe
 

John Salerno

Steven said:
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?

What are the usual arguments against adding it?
 

Georg Brandl

John said:
What are the usual arguments against adding it?

That there should be one obvious way to do it.

Yes, I know that it can be debated whether "del x[:]" is obvious, and
fortunately I'm not the one to decide <wink>.

Georg
 

Peter Hansen

Georg said:
John said:
What are the usual arguments against adding it?


That there should be one obvious way to do it.

Yes, I know that it can be debated whether "del x[:]" is obvious, and
fortunately I'm not the one to decide <wink>.

It's so "obvious", I couldn't even find it in the tutorial the other day
when this thread came up. It's a wonder I ever learned that one myself,
and I don't know where I saw it. .clear(), on the other hand, would
clearly be easy to add to (and find in) the tutorial... and we'd all save
time answering the question over and over around here (valuable time we
would then be able to use telling people who hadn't, to go and read the
tutorial! :) ).


-Peter
 

Peter Hansen

John said:
What are the usual arguments against adding it?

I think one is that it's very rare to need it. Most of the time, just
rebinding the name to a new empty list is easier, and possibly faster.
The main problem with that approach is that in multithreaded code that
can be a source of subtle bugs. On the other hand, most things can be a
source of subtle bugs in multithreaded code, so maybe that's not a good
reason to add clear().
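The subtle bug Peter alludes to is aliasing: rebinding a name to a fresh list does nothing to other references to the original, while clearing in place is visible through every reference. A minimal sketch (the function names are hypothetical):

```python
def reset_by_rebinding(items):
    items = []          # rebinds the local name only; caller unaffected
    return items

def reset_in_place(items):
    del items[:]        # mutates the caller's list
    return items

data = [1, 2, 3]
alias = data            # a second reference to the same list

reset_by_rebinding(data)
print(data)             # [1, 2, 3] -- the "clear" silently did nothing

reset_in_place(data)
print(data, alias)      # [] [] -- every reference sees the change
```

The same trap applies across threads: a worker holding a reference to the old list never sees the new empty one after a rebind.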

Saving us time answering the question repeatedly (and trying to justify
the current state of affairs) might be justification enough...

-Peter
 

Steven Bethard

John said:
What are the usual arguments against adding it?

I think most of the other folks have already given you the answers, but
the main ones I remember:

(1) There are already at least two ways of spelling it. We don't need
another.

(2) Clearing a list is a pretty uncommon operation. Usually just
creating a new list is fine (and easier).

Disclaimer: I'm happy with adding list.clear(), so it's quite possible
I'm misrepresenting the argument here. I guess that's just all the more
reason to have a PEP for it: to lay out all the pros and cons.

STeVe
 

Guest

Steven said:
$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?
In the first benchmark, you need space for two lists: the old one and
the new one;

Er, what new list? I see only one list, x = range(100000), which is merely
created and then nothing is done to it. Have I missed something?

See Duncan's explanation. This code is run many times, allocating many
lists. However, only two different lists exist at any point in time.

A Python list consists of two memory blocks: the list proper (a few
bytes), plus the "guts", i.e. a variable-sized block of pointers to
the objects. del x[:] frees the guts.

I understood Felipe to be asking, why does it take longer to just create a
list, than it takes to create a list AND then do something to it?

Actually, the same code (deallocation of all integers and the list
blocks) appears in either case. However, in the one case it is triggered
explicitly, and before the new allocation; in the other case, the
allocation of the new objects occurs before the old ones are released.

Regards,
Martin
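Martin's two-block picture is easy to confirm from Python itself: sys.getsizeof reports the list object including its allocated pointer array, so clearing the list shrinks it even though the object survives:

```python
import sys

x = list(range(1000))
before = sys.getsizeof(x)   # list header plus the 1000-slot pointer array
del x[:]                    # frees "the guts" (the pointer array)
after = sys.getsizeof(x)

print(before > after)       # True: only the small fixed header is left
print(x)                    # [] -- same object, now empty
```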
 

Mel Wilson

Ville said:
Fredrik said:
because Python already has a perfectly valid way to clear a list,
perhaps ?

del l[:]


Ok. That's pretty non-obvious but now that I've seen it I'll probably
remember it. I did a stupid "while l: l.pop()" loop myself.

Actually, it's in the Library Reference (that we keep under
our pillows) section 2.3.6.4 Mutable Sequence Types.

Cheers, Mel.
 

Raymond Hettinger

[Steven Bethard]
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?

Something this small doesn't need a PEP. I'll just send a note to
Guido asking for a pronouncement.

Here's a draft list of pros and cons (any changes or suggestions are
welcome):

Pros:
-----

* s.clear() is more obvious in intent

* easier to figure out, look up, and remember than either s[:] = [] or
del s[:]

* parallels the api for dicts, sets, and deques (increasing the
expectation that lists will have it too)

* the existing alternatives are a bit perlish

* the OP is shocked, SHOCKED that Python got by for 16 years without
list.clear()


Cons:
-----

* makes the api fatter (there are already two ways to do it)

* expanding the api makes it more difficult to write classes that can
be polymorphically substituted for lists

* learning slices is basic to the language (this lesson shouldn't be
skipped)

* while there are valid use cases for re-using lists, the technique is
already overused for unsuccessful attempts to micro-optimize (creating
new lists is surprisingly fast)

* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)
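As a historical footnote for readers finding this thread later: list.clear() (along with list.copy()) was eventually added in Python 3.3, with exactly the in-place semantics of del s[:]:

```python
x = list(range(5))
alias = x            # a second reference to the same list

x.clear()            # Python 3.3+; equivalent to del x[:]

print(x)             # []
print(alias)         # [] -- cleared in place, so the alias sees it too
```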
 

Felipe Almeida Lessa

Em Qua, 2006-04-12 às 12:40 -0700, Raymond Hettinger escreveu:
* the existing alternatives are a bit perlish

I love this argument =D! "perlish"... lol...

Cheers,
 

Raymond Hettinger

[Felipe Almeida Lessa]
I love benchmarks, so as I was testing the options, I saw something very
strange:

$ python2.4 -mtimeit 'x = range(100000); '
100 loops, best of 3: 6.7 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x[:]'
100 loops, best of 3: 6.35 msec per loop
$ python2.4 -mtimeit 'x = range(100000); x[:] = []'
100 loops, best of 3: 6.36 msec per loop
$ python2.4 -mtimeit 'x = range(100000); del x'
100 loops, best of 3: 6.46 msec per loop

Why is the first benchmark the slowest? I don't get it... could someone
test this, too?

[Dan Christensen]
I get similar behaviour. No idea why.

It is an effect of the memory allocator and fragmentation. The first
builds up a list with increasingly larger sizes. It periodically
cannot grow in-place because something is in the way (some other
object) so it needs to move its current entries to another, larger
block and grow from there. In contrast, the other variants are reusing
the previously cleared-out large block.

Just for grins, replace the first with:
    'x = None; x = range(100000)'
The assignment to None frees the reference to the previous list and
allows it to be cleared, so that its space is immediately available to
the new list being formed by range().
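Raymond's suggestion can be tried from the timeit API as well as the command line. Exact numbers are machine- and version-dependent, so treat any difference as a hint rather than a proof:

```python
import timeit

# Plain rebinding: the old list is still alive while range() builds
# the new one.
t_plain = timeit.timeit('x = list(range(100000))', number=100)

# Drop the old list first so its space can be reused immediately.
t_none = timeit.timeit('x = None; x = list(range(100000))', number=100)

print(f'plain: {t_plain:.4f}s  drop-first: {t_none:.4f}s')
```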
 

Peter Hansen

Raymond said:
* learning slices is basic to the language (this lesson shouldn't be
skipped)

And yet it doesn't appear to be in the tutorial. I could have missed
it, but I've looked in a number of the obvious places, without actually
going through it (again) from start to finish. Also, googling for
"slice site:docs.python.org", you have to go to the *sixth* entry before
you can find the first mention of "del x[:]" and what it does. I think
given the current docs it's possible to learn all kinds of things about
slicing and still not make the non-intuitive leap that "del x[slice]" is
actually how you spell "delete contents of list in-place".

* while there are valid use cases for re-using lists, the technique is
already overused for unsuccessful attempts to micro-optimize (creating
new lists is surprisingly fast)

Not just valid use-cases, but ones for which "x = []" is entirely buggy,
yet not obviously so, especially to newcomers.

* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)

The first phrase is insulting and untrue, the second merely untrue (as
even your own list of pros and cons shows), and the last completely
valid and highly relevant, yet not overriding.

-Peter
 

Ville Vainio

Raymond said:
* easier to figure-out, look-up, and remember than either s[:]=[] or
del s[:]

Easier is an understatement - it's something you figure out
automatically. When I want to do something w/ an object, looking at its
methods (via code completion) is the very first thing.
* the OP is shocked, SHOCKED that python got by for 16 years without
list.clear()

I'm sure you realize I was being sarcastic...
* learning slices is basic to the language (this lesson shouldn't be
skipped)

Assigning to slices is much less important, and is something I almost
never do (and hence forget).
* the request is inane, the underlying problem is trivial, and the
relevant idiom is fundamental (api expansions should be saved for rich
new functionality and not become cluttered with infrequently used
redundant entries)

I understand that these are the main arguments. However, as it stands
there is no one *obvious* way to clear a list in-place. I agree that
it's rare to even need it, but when you do, it's a bit of a surprise.
 

Alan Morgan

[Steven Bethard]
I think these are all good reasons for adding a clear method, but being
that it has been so hotly contended in the past, I don't think it will
get added without a PEP. Anyone out there willing to take out the best
examples from this thread and turn it into a PEP?

Something this small doesn't need a PEP. I'll just send a note to
Guido asking for a pronouncement.

Here's a draft list of pros and cons (any changes or suggestions are
welcome):

Pros:

Serious question: should it work more like "s = []" or more like
"s[:] = []"? I'm assuming the latter, but the fact that there is a
difference is an argument for not hiding this operation behind some
syntactic sugar.

Alan
 
