preferring [] or () in list of error codes?

M

mh

Is there any reason to prefer one or the other of these statements?

if e.message.code in [25401,25402,25408]:
if e.message.code in (25401,25402,25408):

I'm currently using [], but only coz I think it's prettier
than ().

context: these are database errors and e is database exception,
so there's probably been zillions of instructions and io's
handling that already.

Many TIA!
Mark
 
J

John Machin

Is there any reason to prefer one or the other of these statements?

if e.message.code in [25401,25402,25408]:
if e.message.code in (25401,25402,25408):
From the viewpoint of relative execution speed, in the above case
if it matters at all it matters only on Python 2.4 AFAICT:

| >>> L=lambda x:x in[25401,25402,25408];
T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (25401)
6 LOAD_CONST 2 (25402)
9 LOAD_CONST 3 (25408)
12 BUILD_LIST 3
15 COMPARE_OP 6 (in)
18 RETURN_VALUE
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 4 ((25401, 25402, 25408))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE

Earlier versions build the list or tuple at run time
(as for the list above); later versions detect that
the list can't be mutated and generate the same code
for both the list and tuple.

However there are limits to the analysis that can be
performed e.g. if the list is passed to a function,
pursuit halts at the county line:

[Python 2.6.2]
| >>> F=lambda y,z:y in z;L=lambda x:F(x,[25401,25402,25408]);
T=lambda x:F(x,(25401,25402,25408));import dis;dis.dis(L);dis.dis(T)
1 0 LOAD_GLOBAL 0 (F)
3 LOAD_FAST 0 (x)
6 LOAD_CONST 0 (25401)
9 LOAD_CONST 1 (25402)
12 LOAD_CONST 2 (25408)
15 BUILD_LIST 3
18 CALL_FUNCTION 2
21 RETURN_VALUE
1 0 LOAD_GLOBAL 0 (F)
3 LOAD_FAST 0 (x)
6 LOAD_CONST 3 ((25401, 25402, 25408))
9 CALL_FUNCTION 2
12 RETURN_VALUE

So in general anywhere I had a "list constant" I'd make
it a tuple -- I'm not aware of any way that performance
gets worse by doing that, and it can get better.

Background: I'm supporting packages that run on 2.1 to 2.6
in one case and 2.4 to 2.6 in the other; every little
unobtrusive tweak helps :)

HTH,
John
 
C

Carl Banks

Is there any reason to prefer one or the other of these statements?
        if e.message.code in [25401,25402,25408]:
        if e.message.code in (25401,25402,25408):
I'm currently using [], but only coz I think it's prettier
than ().

Use a list when the semantic meaning of an item doesn't depend on all
the other items: it's “only” a collection of values.

Your list of message codes is a good example: if a value appears at
index 3, that doesn't make it mean something different from the same
value appearing at index 2.

Use a tuple when the semantic meaning of the items are bound together,
and it makes more sense to speak of all the items as a single structured
value.

If you want to go strictly by the book, I would say he ought to be
using a set since his collection of numbers has no meaningful order
nor does it make sense to list any item twice.

I don't think it's very important, however, to stick to rules like
that for objects that don't live for more than a single line of code.


Carl Banks
 
C

Carl Banks

Yes, a set would be best for this specific situation.


It's important to the extent that it's important to express one's
*meaning*. Program code should be written primarily as a means of
communicating with other programmers, and only incidentally for the
computer to execute.

Which is precisely why isn't not very important for an object that
exists for one line. No programmer is ever going to be confused about
the meaning of this:

if a in (1,2,3):


Carl Banks
 
C

Charles Yeomans

Which is precisely why isn't not very important for an object that
exists for one line. No programmer is ever going to be confused about
the meaning of this:

if a in (1,2,3):

Actually, I might be -- I think of a tuple first as a single thing, as
opposed to a list or map, which I see first as a collection of other
things.

Charles Yeomans
 
S

samwyse

Is there any reason to prefer one or the other of these statements?
        if e.message.code in [25401,25402,25408]:
        if e.message.code in (25401,25402,25408):

If you want to go strictly by the book, I would say he ought to be
using a set since his collection of numbers has no meaningful order
nor does it make sense to list any item twice.

As the length of the list increases, the increased speeds of looking
something up makes using a set makes more sense. But what's the best
way to express this? Here are a few more comparisons (using Python
3.0)...
1 0 LOAD_FAST 0 (x)
3 LOAD_GLOBAL 0 (set)
6 LOAD_CONST 3 ((25401, 25402, 25408))
9 CALL_FUNCTION 1
12 COMPARE_OP 6 (in)
15 RETURN_VALUE 1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 0 (25401)
6 LOAD_CONST 1 (25402)
9 LOAD_CONST 2 (25408)
12 BUILD_SET 3
15 COMPARE_OP 6 (in)
18 RETURN_VALUE 1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 3 ((25401, 25402, 25408))
6 BUILD_SET 1
9 COMPARE_OP 6 (in)
12 RETURN_VALUE

I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name. I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode. I expect that this can be expensive for long
lists, so I don't think that it's a good idea to use set constants
inside loops. Instead it should be assigned to a global or class
variable.
 
C

Chris Rebert

(e-mail address removed) writes:
Is there any reason to prefer one or the other of these statements?
        if e.message.code in [25401,25402,25408]:
        if e.message.code in (25401,25402,25408):

If you want to go strictly by the book, I would say he ought to be
using a set since his collection of numbers has no meaningful order
nor does it make sense to list any item twice.

As the length of the list increases, the increased speeds of looking
something up makes using a set makes more sense.  But what's the best
way to express this?  Here are a few more comparisons (using Python
3.0)...
 1           0 LOAD_FAST                0 (x)
             3 LOAD_GLOBAL              0 (set)
             6 LOAD_CONST               3 ((25401, 25402, 25408))
             9 CALL_FUNCTION            1
            12 COMPARE_OP               6 (in)
            15 RETURN_VALUE 1           0 LOAD_FAST                0 (x)
             3 LOAD_CONST               0 (25401)
             6 LOAD_CONST               1 (25402)
             9 LOAD_CONST               2 (25408)
            12 BUILD_SET                3
            15 COMPARE_OP               6 (in)
            18 RETURN_VALUE 1           0 LOAD_FAST                0 (x)
             3 LOAD_CONST               3 ((25401, 25402, 25408))
             6 BUILD_SET                1
             9 COMPARE_OP               6 (in)
            12 RETURN_VALUE

I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name.  I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode.  I expect that this can be expensive for long

Erm, unless I misunderstand you somehow, the second example will and
should *never* match the last.
The set {25401,25402,25408}, containing 3 integer elements, is quite
distinct from the set {(25401,25402,25408)}, containing one element
and that element is a tuple.
set(X) != {X}; set([X]) = {X}

Cheers,
Chris
 
S

Steven D'Aprano

Yes, a set would be best for this specific situation.


It's important to the extent that it's important to express one's
*meaning*. Program code should be written primarily as a means of
communicating with other programmers, and only incidentally for the
computer to execute.


But practicality beats purity -- there are many scenarios where we make
compromises in our meaning in order to get correct, efficient code. E.g.
we use floats, despite them being a poor substitute for the abstract Real
numbers we mean.

In addition, using a tuple or a list in this context:

if e.message.code in (25401,25402,25408):

is so idiomatic, that using a set in it's place would be distracting.
Rather that efficiently communicating the programmer's intention, it
would raise in my mind the question "that's strange, why are they using a
set there instead of a tuple?".
 
E

Emile van Sebille

In addition, using a tuple or a list in this context:

if e.message.code in (25401,25402,25408):

is so idiomatic, that using a set in it's place would be distracting.

I think a list in that context is fine, and that's the idiom I see far
more often than a tuple.
Rather that efficiently communicating the programmer's intention, it
would raise in my mind the question "that's strange, why are they
using a set there instead of a tuple?".

The fact that literal set syntax is a relative newcomer is the primary
reason for that, I'd wager.
[/QUOTE]

Well, no. It really is more, "that's odd... why use set?"

Emile
 
S

Steven D'Aprano

Use a list when the semantic meaning of an item doesn't depend on all
the other items: it's “only†a collection of values.

Your list of message codes is a good example: if a value appears at
index 3, that doesn't make it mean something different from the same
value appearing at index 2.

That advice would seem to imply that lists shouldn't be ordered. If a
list of values has an order, it implies that "first place" (index 0) is
different from "second place", by virtue of the positions they appear in
the list. The lists:

presidential_candidates_sorted_by_votes = ['Obama', 'McCain']
presidential_candidates_sorted_by_votes = ['McCain', 'Obama']

have very different meanings. Prohibiting the use of lists in the context
of ordered data is surely is an unfortunate consequence of your advice.

James Tauber explains this at
<URL:http://jtauber.com/blog/2006/04/15/
python_tuples_are_not_just_constant_lists/>.


He doesn't really explain anything though, he merely states it as
revealed wisdom. The closest he comes to an explanation is to declare
that in tuples "the index in a tuple has an implied semantic. The point
of a tuple is that the i-th slot means something specific. In other
words, it's a index-based (rather than name based) datastructure." But he
gives no reason for why we should accept that as true for tuples but not
lists.

It may be that that's precisely the motivation Guido had when he
introduced tuples into Python, but why should we not overload tuples with
more meanings than Guido (hypothetically) imagined? In other words, why
*shouldn't* we treat tuples as immutable lists, if that helps us solve a
problem effectively?

To put it another way, I think the question of whether or not tuples are
immutable lists has the answer Mu. Sometimes they are, sometimes they're
not. I have no problem with the title of the quoted blog post -- that
tuples are not *just* constant lists -- but I do dispute that there is
any reason for declaring that tuples must not be used as constant lists.
As tuples are defined in Python, they quack like immutable lists, they
walk like immutable lists, and they swim like immutable lists. Why
shouldn't we treat them as immutable lists?

Phillip Eby states that "Lists are intended to be homogeneous sequences,
while tuples are heterogeneous data structures." (Notice the subtle shift
there: lists are "intended", while tuples "are". But in fact, there's
nothing to stop you from putting homogeneous data into a tuple, so Eby is
wrong to say that tuples *are* heterogeneous.)

Perhaps Eby intends lists to be homogeneous, perhaps Guido does too, but
this is Python, where we vigorously defend the right to shoot ourselves
in the foot. We strongly discourage class creators from trying to enforce
their intentions by using private attributes, and even when we allow such
a thing, the nature of Python is that nothing is truly private. Why
should homogeneity and heterogeneity of lists and tuples be sacrosanct?
Nothing stops me from putting hetereogeneous data into a list, or
homogeneous data into a tuple, and there doesn't appear to be any ill-
effects from doing so. Why give lose sleep over the alleged lack of
purity?
 
S

samwyse

(e-mail address removed) writes:
Is there any reason to prefer one or the other of these statements?
        if e.message.code in [25401,25402,25408]:
        if e.message.code in (25401,25402,25408):
If you want to go strictly by the book, I would say he ought to be
using a set since his collection of numbers has no meaningful order
nor does it make sense to list any item twice.
As the length of the list increases, the increased speeds of looking
something up makes using a set makes more sense.  But what's the best
way to express this?  Here are a few more comparisons (using Python
3.0)...
 1           0 LOAD_FAST                0 (x)
             3 LOAD_GLOBAL              0 (set)
             6 LOAD_CONST               3 ((25401, 25402, 25408))
             9 CALL_FUNCTION            1
            12 COMPARE_OP               6 (in)
            15 RETURN_VALUE
 1           0 LOAD_FAST                0 (x)
             3 LOAD_CONST               0 (25401)
             6 LOAD_CONST               1 (25402)
             9 LOAD_CONST               2 (25408)
            12 BUILD_SET                3
            15 COMPARE_OP               6 (in)
            18 RETURN_VALUE
 1           0 LOAD_FAST                0 (x)
             3 LOAD_CONST               3 ((25401, 25402, 25408))
             6 BUILD_SET                1
             9 COMPARE_OP               6 (in)
            12 RETURN_VALUE
I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name.  I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode.  I expect that this can be expensive for long

Erm, unless I misunderstand you somehow, the second example will and
should *never* match the last.
The set {25401,25402,25408}, containing 3 integer elements, is quite
distinct from the set {(25401,25402,25408)}, containing one element
and that element is a tuple.
set(X) != {X}; set([X]) = {X}

D'oh! I was thinking about how you can initialize a set from an
iterator and for some reason thought that you could do the same with a
set constant.
 
S

samwyse

I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name.  I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode.  I expect that this can be expensive for long
lists, so I don't think that it's a good idea to use set constants
inside loops.  Instead it should be assigned to a global or class
variable.

Time to test things! I'm going to compare three things using Python
3.0:
X={...}\nS=lambda x: x in X
S=lambda x: x in {...}
S=lambda x: x in (...)
where the ... is replaced by lists of integers of various lengths.
Here's the test bed:

from random import seed, sample
from timeit import Timer
maxint = 2**31-1
values = list(map(lambda n: 2**n-1, range(1,16)))
def append_numbers(k, setup):
seed(1968740928)
for i in sample(range(maxint), k):
setup.append(str(i))
setup.append(',')
print('==', 'separate set constant')
for n in values[::2]:
print('===', n, 'values')
setup = ['X={']
append_numbers(n, setup)
setup.append('}\nS=lambda x: x in X')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())
print('==', 'in-line set constant')
for n in values[:4]:
print('===', n, 'values')
setup = ['S=lambda x: x in {']
append_numbers(n, setup)
setup.append('}')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())
print('==', 'in-line list constant')
for n in values:
print('===', n, 'values')
setup = ['S=lambda x: x in (']
append_numbers(n, setup)
setup.append(')')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())

And here are the results. There's something interesting at the very
end.

== separate set constant
=== 1 values
[0.26937306277753176, 0.26113626173158877, 0.26000092190487889]
=== 7 values
[0.26583266867716426, 0.27223543774418268, 0.27681646689732919]
=== 31 values
[0.25089725090758752, 0.25562690230182894, 0.25844625504079444]
=== 127 values
[0.32404313956103392, 0.33048948958596691, 0.34487930728626104]
=== 511 values
[0.27574566041214732, 0.26991838348169983, 0.28309016928129083]
=== 2047 values
[0.27826162263639631, 0.27337357122204065, 0.26888752620793976]
=== 8191 values
[0.27479134917985437, 0.27955955295994261, 0.27740676538498654]
=== 32767 values
[0.26189725230441319, 0.25949247739587022, 0.2537356004743625]
== in-line set constant
=== 1 values
[0.43579086168772818, 0.4231755711968983, 0.42178740594125852]
=== 3 values
[0.54712875519095228, 0.55325048295244272, 0.54346991028189251]
=== 7 values
[1.1897654590178366, 1.1763383335032813, 1.2009900699669931]
=== 15 values
[1.7661906750718313, 1.7585005915556291, 1.7405896559478933]
== in-line list constant
=== 1 values
[0.23651385860493335, 0.24746972031361381, 0.23778469051234197]
=== 3 values
[0.23710750947396875, 0.23205630883254713, 0.23345592805789295]
=== 7 values
[0.24607764394636789, 0.23551903943099006, 0.24241377046524093]
=== 15 values
[0.2279376289444599, 0.22491908887861456, 0.24076747184349045]
=== 31 values
[0.22860084172708994, 0.233022074034551, 0.23138639128715965]
=== 63 values
[0.23671639831319169, 0.23404259479906031, 0.22269394573891077]
=== 127 values
[0.22754176857673158, 0.22818151468971593, 0.22711154629987718]
=== 255 values
[0.23503126794047802, 0.24493699618247788, 0.26690207833677349]
=== 511 values
[0.24518255811842238, 0.23878118587697728, 0.22844830837438934]
=== 1023 values
[0.23285585179122137, 0.24067220833932623, 0.23807439213642922]
=== 2047 values
[0.24206484343680756, 0.24352201187581102, 0.24366253252857462]
=== 4095 values
[0.24624526301527183, 0.23692145230748807, 0.23829956041899081]
=== 8191 values
[0.22246514570986164, 0.22435309515595137, 0.22220114567633331]
=== 16383 values
[194.29462683106374, 193.21789529116128, 193.25843228678508]
=== 32767 values

You will note that testing against a list constant is just as fast as
testing against a set. This was surprising for me; apparently the
__contains__ operator turns a tuple into a set. You will also note
that performance to fall off drastically for the last set of values.
I'm not sure what happens there; I guess I'll file a bug report.
 
S

samwyse

On 6/8/2009 8:43 PM Ben Finney said...

Well, no.  It really is more, "that's odd... why use set?"

Until I ran some timing tests this morning, I'd have said that sets
could determine membership faster than a list, but that's apparently
not true, assuming that the list has less than 8K members. Above 16K
members, sets are much faster than lists. I'm not sure where the
break is, or even why there's a break.
 
C

Carl Banks

I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name.  I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode.  I expect that this can be expensive for long
lists, so I don't think that it's a good idea to use set constants
inside loops.  Instead it should be assigned to a global or class
variable.

Time to test things!   I'm going to compare three things using Python
3.0:
  X={...}\nS=lambda x: x in X
  S=lambda x: x in {...}
  S=lambda x: x in (...)
where the ... is replaced by lists of integers of various lengths.
Here's the test bed:

from random import seed, sample
from timeit import Timer
maxint = 2**31-1
values = list(map(lambda n: 2**n-1, range(1,16)))
def append_numbers(k, setup):
    seed(1968740928)
    for i in sample(range(maxint), k):
        setup.append(str(i))
        setup.append(',')
print('==', 'separate set constant')
for n in values[::2]:
    print('===', n, 'values')
    setup = ['X={']
    append_numbers(n, setup)
    setup.append('}\nS=lambda x: x in X')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())
print('==', 'in-line set constant')
for n in values[:4]:
    print('===', n, 'values')
    setup = ['S=lambda x: x in {']
    append_numbers(n, setup)
    setup.append('}')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())
print('==', 'in-line list constant')
for n in values:
    print('===', n, 'values')
    setup = ['S=lambda x: x in (']
    append_numbers(n, setup)
    setup.append(')')
    t = Timer('S(88632719)', ''.join(setup))
    print(t.repeat())

It looks like you are evaluating the list/set/tuple every pass, and
then, for lists and tuples, always indexing the first item.

And here are the results.  There's something interesting at the very
end.
[snip results showing virtually identical performance for list, set,
and tuple]
You will note that testing against a list constant is just as fast as
testing against a set.  This was surprising for me; apparently the
__contains__ operator turns a tuple into a set.

Given the way you wrote the test it this is hardly surprising.

I would expect "item in list" to have comparable execution time to
"item in set" if item is always the first element in list.

Furthermore, the Python compiler appears to be optimizing this
specific case to always use a precompiled set. Well, almost
always....

 You will also note
that  performance to fall off drastically for the last set of values.
I'm not sure what happens there; I guess I'll file a bug report.

Please don't; it's not a bug. The slowdown is because at sizes above
a certain threshold the Python compiler doesn't try to precompile in-
line lists, sets, and tuples. The last case was above that limit.


Carl Banks
 
C

Carl Banks

Until I ran some timing tests this morning, I'd have said that sets
could determine membership faster than a list, but that's apparently
not true,

See my reply to that post. I believe your tests were flawed.
assuming that the list has less than 8K members.  Above 16K
members, sets are much faster than lists.  I'm not sure where the
break is, or even why there's a break.

The break comes from the compiler, not the objects themselves.


Carl Banks
 
M

mh

John Machin said:
T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)

I've learned a lot from this thread, but this is the
niftiest bit I've picked up... thanks!
 
T

Terry Reedy

Steven said:
He doesn't really explain anything though, he merely states it as
revealed wisdom. The closest he comes to an explanation is to declare
that in tuples "the index in a tuple has an implied semantic. The point
of a tuple is that the i-th slot means something specific. In other
words, it's a index-based (rather than name based) datastructure." But he
gives no reason for why we should accept that as true for tuples but not
lists.

It may be that that's precisely the motivation Guido had when he
introduced tuples into Python, but why should we not overload tuples with
more meanings than Guido (hypothetically) imagined? In other words, why
*shouldn't* we treat tuples as immutable lists, if that helps us solve a
problem effectively?

I believe that we should overload tuples with *less* specific meaning
than originally. In 3.0, tuples have *all* the general sequence
operations and methods, including .index() and .count(). This was not
true in 2.5 (don't know about 2.6), which is why tuples are yet not
documented as having those two methods (reported in
http://bugs.python.org/issue4966
). Operationally, they are now general immutable sequences. Period.

Terry Jan Reedy
 
T

Terry Reedy

I've learned a lot from this thread, but this is the
niftiest bit I've picked up... thanks!

If you are doing a lot of dissing, starting with
from dis import dis
saves subsequent typing.

tjr
 
S

Steven D'Aprano

Time to test things! I'm going to compare three things using Python
3.0:
X={...}\nS=lambda x: x in X
S=lambda x: x in {...}
S=lambda x: x in (...)
where the ... is replaced by lists of integers of various lengths.
Here's the test bed:

[snip]

Hmmm... I think your test-bed is unnecessarily complicated, making it
difficult to see what is going on. Here's my version, with lists included
for completeness. Each test prints the best of five trials of one million
repetitions of ten successful searches, then does the same thing again
for unsuccessful searches.


from timeit import Timer

def test(size):
global s, l, t, targets
print("Testing search with size %d" % size)
rng = range(size)
s, l, t = set(rng), list(rng), tuple(rng)
# Calculate a (more or less) evenly distributed set of ten
# targets to search for, including both end points.
targets = [i*size//9 for i in range(9)] + [size-1]
assert len(targets) == 10
setup = "from __main__ import targets, %s"
body = "for i in targets: i in %s"
# Run a series of successful searches.
for name in "s l t".split():
obj = globals()[name]
secs = min(Timer(body % name, setup % name).repeat(repeat=5))
print("Successful search in %s: %f s" % (type(obj), secs))
# Also run unsuccessful tests.
targets = [size+x for x in targets]
for name in "s l t".split():
obj = globals()[name]
secs = min(Timer(body % name, setup % name).repeat(repeat=5))
print("Unsuccessful search in %s: %f s" % (type(obj), secs))

Results are:
Testing search with size 1
Successful search in <class 'set'>: 1.949509 s
Successful search in <class 'list'>: 1.838387 s
Successful search in <class 'tuple'>: 1.876309 s
Unsuccessful search in <class 'set'>: 1.998207 s
Testing search with size 10
Successful search in <class 'set'>: 1.943664 s
Successful search in <class 'list'>: 3.659786 s
Successful search in <class 'tuple'>: 3.569164 s
Unsuccessful search in <class 'set'>: 1.935553 s
Testing search with size 100
Successful search in <class 'set'>: 1.907839 s
Successful search in <class 'list'>: 21.704032 s
Successful search in <class 'tuple'>: 21.391875 s
Unsuccessful search in <class 'set'>: 1.916241 s
Testing search with size 1000
Successful search in <class 'set'>: 2.256150 s
Successful search in <class 'list'>: 189.991579 s
Successful search in <class 'tuple'>: 187.349630 s
Unsuccessful search in <class 'set'>: 1.869202 s
Unsuccessful search in <class 'list'>: 398.451284 s
Unsuccessful search in <class 'tuple'>: 388.544178 s




As expected, lists and tuples are equally as fast (or slow if you
prefer). Successful searches are about twice as fast as unsuccessful
ones, and performance suffers as the size of the list/tuple increases.
However, sets are nearly just as fast no matter the size of the set, or
whether the search is successfully or unsuccessful.


You will note that testing against a list constant is just as fast as
testing against a set. This was surprising for me; apparently the
__contains__ operator turns a tuple into a set.

I doubt that very much.
 
G

Gabriel Genellina

En Tue, 09 Jun 2009 05:02:33 -0300, Steven D'Aprano
[...] As tuples are defined in Python, they quack like immutable lists,
they
walk like immutable lists, and they swim like immutable lists. Why
shouldn't we treat them as immutable lists?

Phillip Eby states that "Lists are intended to be homogeneous sequences,
while tuples are heterogeneous data structures." (Notice the subtle shift
there: lists are "intended", while tuples "are". But in fact, there's
nothing to stop you from putting homogeneous data into a tuple, so Eby is
wrong to say that tuples *are* heterogeneous.)

Perhaps Eby intends lists to be homogeneous, perhaps Guido does too, but
this is Python, where we vigorously defend the right to shoot ourselves
in the foot. We strongly discourage class creators from trying to enforce
their intentions by using private attributes, and even when we allow such
a thing, the nature of Python is that nothing is truly private. Why
should homogeneity and heterogeneity of lists and tuples be sacrosanct?
Nothing stops me from putting hetereogeneous data into a list, or
homogeneous data into a tuple, and there doesn't appear to be any ill-
effects from doing so. Why give lose sleep over the alleged lack of
purity?

Yes - but in the past the distinction was very much stronger. I think that
tuples didn't have *any* method until Python 2.0 -- so, even if someone
could consider a tuple a "read-only list", the illusion disappeared as
soon as she tried to write anything more complex that a. Maybe tuples
could quack like immutable lists, but they could not swim nor walk...

With time, tuples gained more and more methods and are now very similar to
lists - they even have an index() method (undocumented but obvious) which
is absurd in the original context. Think of tuples as used in relational
databases: there is no way in SQL to express the condition "search for
this along all values in this tuple", because it usually doesn't make any
sense at all (and probably, if it does make sense in a certain case, it's
because the database is badly designed.)

But *now*, you can express that operation in Python. So I'd say that
*now*, the distinction between an "homogeneous container" vs
"heterogeneous data structure" has vanished a lot, and it's hard to
convince people that tuples aren't just immutable lists. That is, *I*
would have used a list in this case:

for delay in (0.01, 0.1, 0.5, 1, 2, 5, 10, 30, 60):
do_something(delay)

but I cannot find a *concrete* reason to support the assertion "list is
better".

So, for practical purposes, tuples act now as if they were immutable lists
-- one should be aware of the different memory allocation strategies, but
I see no other relevant differences.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Staff online

Members online

Forum statistics

Threads
474,289
Messages
2,571,435
Members
48,120
Latest member
Natbelix

Latest Threads

Top