Bizarre behavior with mutable default arguments

bukzor

I've found some bizarre behavior when using mutable values (lists,
dicts, etc) as the default argument of a function. I want to get the
community's feedback on this. It's easiest to explain with code.

This example is trivial and has design issues, but it demonstrates a
problem I've seen in production systems:

def main(argv = ['TESTING']):
    'print out arguments with BEGIN and END'
    argv.insert(0, "BEGIN")
    argv.append("END")
    for arg in argv: print arg

if __name__ == '__main__':
    from sys import argv, exit
    exit(main(argv))


Now let's try this out at the terminal:
>>> import example
>>> example.main()
BEGIN
TESTING
END
>>> example.main(["TESTING"])
BEGIN
TESTING
END
>>> example.main()
BEGIN
BEGIN
TESTING
END
END
>>> example.main(["TESTING"])
BEGIN
TESTING
END

The function does different things if you call it with ["TESTING"] as
the argument, even though that is identical to the default value!! It
seems that default values are only allocated once. If the default
value is mutable and is changed during the function's execution, this
has the side effect of making the default value change on each
subsequent execution.
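
A minimal sketch of that observation, written in Python 3 syntax (the thread itself uses Python 2.5). The function name `append_end` is made up for illustration; `__defaults__` is the Python 3 function attribute that holds the stored default values.

```python
def append_end(items=[]):
    # The list literal above is evaluated once, when `def` runs,
    # and the same list object is reused on every call.
    items.append("END")
    return items

first = append_end()
second = append_end()

print(first is second)            # True: both calls returned the one default list
print(append_end.__defaults__)    # (['END', 'END'],)

# Passing an equal-looking list does not touch the stored default.
fresh = append_end([])
print(fresh)                      # ['END']
```

The identity check is the key point: the default is not "reset to `[]`" on each call, it is one object that every defaulted call shares.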

Is this functionality intended? It seems very unintuitive. This has
caused a bug in my programs twice so far, and both times I was
completely mystified until I realized that the default value was
changing.

I'd like to draw up a PEP to remove this from py3k, if I can get some
community backing.


--Buck
 
Martin v. Löwis

Is this functionality intended?

Google for "Python mutable default arguments" (you can actually
leave out Python).

It's part of the language semantics, yes.

Regards,
Martin
 
Istvan Albert

Is this functionality intended? It seems very unintuitive. This has
caused a bug in my programs twice so far, and both times I was
completely mystified until I realized that the default value was
changing.

it is only unintuitive when you do not know about it

once you realize how it works and what it does it can actually be very
useful

i.
 
bukzor

Here's the answer to the question:
http://www.python.org/doc/faq/general/#why-are-default-values-shared-between-objects

It looks like Guido disagrees with me, so the discussion is closed.




For the record, I still think the following would be an improvement to
py3k:

In python25:
def f(a=None):
    if a is None: a = []
    ...

In py3k becomes:
def f(a=[]):
    ...


In python25 (this function from the FAQ linked above):
def f(a, _cache={}):
    # Callers will never provide a third parameter for this function.
    # (then why is it an argument?)
    ...

In py3k becomes:
_cache = {}
def f(a):
    global _cache
    ...



This follows the "explicit is better" and "one best way" principles of
Python, and greatly improves the intuitiveness. Also since the first
example is much more common, it reduces the overall verbosity of the
language.

Just my parting two cents,
--Buck
 
Steven D'Aprano

In python25 (this function from the FAQ linked above):
def f(a, _cache={}):
    # Callers will never provide a third parameter for this function.
    # (then why is it an argument?)

The caller might want to provide its own pre-prepared cache. Say, for
testing.
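
A small sketch of what that could look like, in Python 3 syntax. The function `square` and the `test_cache` dict are hypothetical, chosen only to keep the example short; the FAQ's function is not reproduced here.

```python
def square(n, _cache={}):
    # `_cache` is the shared default dict unless the caller passes one.
    if n not in _cache:
        _cache[n] = n * n
    return _cache[n]

# A test can inject a pre-filled cache and inspect it afterwards.
test_cache = {3: "sentinel"}
hit = square(3, test_cache)    # served from the injected cache
miss = square(4, test_cache)   # computed and stored in the injected cache

print(hit, miss, test_cache)   # sentinel 16 {3: 'sentinel', 4: 16}
print(square(3))               # 9: the shared default cache was not touched
```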

I think that this behaviour is a little unintuitive, and by a little I
mean a lot. Nevertheless, I am used to it, and I don't see any reason to
change it. There's very little about programming that is intuitive --
there's no intuitive reason to think that dictionary lookups are O(1)
while list lookups are O(n).

In the absence of a better solution, I'm very comfortable with keeping
the behaviour as is. Unfortunately, there's no good solution in Python to
providing functions with local storage that persists across calls to the
function:


(1) Use a global variable.

cache = {}
def foo():
    global cache
    print cache


(2) Use a function attribute.

def foo():
    print foo.cache
foo.cache = {}


def foo():
    try:
        foo.cache
    except AttributeError:
        foo.cache = {}
    print foo.cache



(3) Use an argument that isn't actually an argument.

def foo(cache={}):
    print cache


#1, the global variable, is probably the worst solution of the lot.
Global variables are rightly Considered Harmful.


#2 has the disadvantages that you initialize the value *after* you write
the code that relies on it. Either that, or you waste time on every call
checking to see if it has been initialized. Also, like recursive
functions, it is difficult to rename the function.


#3 is, I think, the least-bad solution, but I would hardly call it
ideal.


_cache = {}
def f(a):
    global _cache
    ...

This follows the "explicit is better" and "one best way" principles of
Python,

Declaring an argument is equally explicit.

And you are confused -- the Zen doesn't say "one best way". People so
often get it wrong.

The Zen says:

"There should be one-- and PREFERABLY only one --OBVIOUS way to do it."
(Emphasis added.)

At least you're not saying "there should be only one way to do it". I
give you credit for that!

and greatly improves the intuitiveness. Also since the first
example is much more common, it reduces the overall verbosity of the
language.

I question that it is "much more common". How do you know? Where's your
data?
 
Steven D'Aprano

I've found some bizarre behavior when using mutable values (lists,
dicts, etc) as the default argument of a function.

This FAQ is so Frequently Asked that I sometimes wonder if Python should,
by default, print a warning when it compiles a function with a list or
dict as a default value.

There's precedent for such a thing: the sum() built-in (un)helpfully
raises an exception if you try to use it on strings.

I say unhelpfully because the one time I wanted to use sum() on strings
was specifically to demonstrate the difference between O(n**2) behaviour
and O(n). I was quite put out that Python, which normally allows you to
shoot yourself in the foot if you insist, was so unnecessarily protective
in this case. Give me a warning, if you wish, but don't stop me.
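
For reference, a short Python 3 sketch of the sum() behaviour being described. The word list is arbitrary illustration data.

```python
words = ["spam", "eggs", "ham"]

try:
    sum(words, "")
except TypeError as exc:
    # sum() rejects a string start value outright instead of warning.
    message = str(exc)
    print("TypeError:", message)

joined = "".join(words)   # the linear-time way to concatenate
print(joined)             # spameggsham
```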
 
bukzor

I think that this behaviour is a little unintuitive, and by a little I
mean a lot.

Thanks for acknowledging it.

I question that it is "much more common". How do you know? Where's your
data?

I did a dumb grep of my Python25/Lib folder and found 33 occurrences of
the first pattern above. (Use None as the default value, then check
for None and assign empty list/dict)

Although I spent at least double the amount of time looking for the
second pattern, I found no occurrences. (Use dict/list as default value
and modify it in place.) Every single function that used a list or
dict as a default value treated these variables as read-only. However,
I did find two new ways to accomplish the above (further violating the
Zen).

/c/Python25/Lib/site-packages/wx-2.8-msw-ansi/wx/lib/customtreectrl.py:
def FillArray(self, item, array=[]):
    if not array:
        array = []

/c/Python25/Lib/site-packages/wx-2.8-msw-ansi/wx/lib/floatcanvas/FloatCanvas.py:
def __init__(self, ObjectList=[], InForeground = False, IsVisible = True):
    self.ObjectList = list(ObjectList)
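
A search like the grep described above could also be approximated with the standard `ast` module. This is only a sketch in Python 3, not what was actually run for the counts in this post: `find_mutable_defaults` and the `sample` source are hypothetical, it flags only list, dict, and set literals (not calls such as `list()`), and `ast.parse` accepts only source that is valid for the running interpreter, so Python 2.5 files using print statements would not parse.

```python
import ast

def find_mutable_defaults(source):
    """Return (function name, line number) pairs, ordered by line, for every
    function whose defaults include a list, dict, or set literal."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # kw_defaults holds None for keyword-only args without a default.
            defaults = list(node.args.defaults)
            defaults += [d for d in node.args.kw_defaults if d is not None]
            if any(isinstance(d, (ast.List, ast.Dict, ast.Set)) for d in defaults):
                hits.append((node.name, node.lineno))
    return sorted(hits, key=lambda hit: hit[1])

sample = '''
def safe(a=None):
    pass

def risky(a=[]):
    pass

class C:
    def __init__(self, table={}):
        pass
'''

print(find_mutable_defaults(sample))   # [('risky', 5), ('__init__', 9)]
```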

--Buck
 
bukzor

Just for completeness, the mutable default value problem also affects
classes:

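class c:
    def __init__(self, list = []):
        self.list = list
        self.list.append("LIST END")
    def __repr__(self):
        return "<Class a: %s>" % self.list

>>> import example2
>>> print example2.c()
<Class a: ['LIST END']>
>>> print example2.c([])
<Class a: ['LIST END']>
>>> print example2.c()
<Class a: ['LIST END', 'LIST END']>
>>> print example2.c([])
<Class a: ['LIST END']>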

Again, we get different results if we supply an argument that is
identical to the default value. There are many instances in the
standard library where class values are assigned directly from the
initializer, which has list or dict default values, so there is chance
for errors cropping up here.

The error scenario is this:
1. Use a mutable value as default value in a class constructor.
2. Assign class property from constructor arguments.
3. Instantiate class using default value.
4. Modify class property in place.
5. Instantiate (again) class using default value.

The second instance will behave strangely because data from the first
instance has leaked over. The standard library is not affected because
it avoids one of these five steps. Most classes simply don't have
mutable default values (1). Those that do generally treat them as read-
only (4). Some classes are not useful using the default values (3).
Some classes are not useful to be instantiated twice (5). The classes
that don't avoid the problem at one of these four steps have to avoid
it at (2) by using one of the three above patterns.
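
The five steps can be traced in a few lines. This is an illustrative Python 3 sketch; the class names `Basket` and `SafeBasket` are invented for the example and do not come from the standard library.

```python
class Basket:
    def __init__(self, items=[]):    # step 1: mutable default
        self.items = items           # step 2: stored without copying

first = Basket()                     # step 3: default used
first.items.append("apple")          # step 4: modified in place
second = Basket()                    # step 5: default used again
print(second.items)                  # ['apple'], leaked from `first`

class SafeBasket:
    def __init__(self, items=None):
        # Break the chain at step 2: every instance gets its own list.
        self.items = [] if items is None else list(items)

a = SafeBasket()
a.items.append("apple")
b = SafeBasket()
print(b.items)                       # []
```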

--Buck
 
John Machin

Just for completeness, the mutable default value problem also affects
classes:

Simply, because methods are functions, and can have default arguments.
You don't need to nail *another* zillion theses to the cathedral
door :)
 
thebjorn

it is only unintuitive when you do not know about it

once you realize how it works and what it does it can actually be very
useful

i.

I agree it is a potentially useful feature, yet it can still bite you
even after a decade of Python... Scenario: long running server
process, Bug report: "people aren't getting older", Code:

def age(dob, today=datetime.date.today()):
    ...

None of my unit tests caught that one :)

-- bjorn
 
Istvan Albert

def age(dob, today=datetime.date.today()):
    ...

None of my unit tests caught that one :)

interesting example, I can see how it caused some trouble. A quick fix
would be to write it:

def age(dob, today=datetime.date.today ):

and inside the definition invoke it as today() rather than just today.
That way it still keeps the original spirit of the definition.

i.
 
Istvan Albert

The standard library is not affected because

the people who wrote code into it know how python works.

Programming abounds with cases that some people think should work
differently:

a = b = []
a.append(1)

is b empty or not at this point? Get informed, remember the rules, be
happy and move on to write some cool code.
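
For anyone unsure of the answer, a quick Python 3 check. The extra rebinding step at the end is an addition for contrast and is not part of the snippet above.

```python
a = b = []      # one list object, two names bound to it
a.append(1)

print(b)        # [1]
print(a is b)   # True

# Rebinding (as opposed to mutating) only affects the rebound name.
a = a + [2]     # builds a new list and binds `a` to it
print(a, b)     # [1, 2] [1]
```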

There is little new in what you say. Every so often someone is having
a confusing time with a feature and therefore proposes that the
language be changed to match his/her expectations.

i.
 
George Sakkis

Here's the answer to the question: http://www.python.org/doc/faq/general/#why-are-default-values-shared-...

It looks like Guido disagrees with me, so the discussion is closed.

Note that the FAQ mainly explains *what* happens, not *why* this
decision was taken. Although it shows an example where "this feature can
be useful", it's neither the only way to do it nor is memoization as
common as wanting fresh default arguments on every call.

For the record, I still think the following would be an improvement to
py3k:

In python25:
def f(a=None):
    if a is None: a = []
    ...

In py3k becomes:
def f(a=[]):
    ...

In python25 (this function from the FAQ linked above):
def f(a, _cache={}):
    # Callers will never provide a third parameter for this function.
    # (then why is it an argument?)
    ...

In py3k becomes:
_cache = {}
def f(a):
    global _cache
    ...

This follows the "explicit is better" and "one best way" principles of
Python, and greatly improves the intuitiveness. Also since the first
example is much more common, it reduces the overall verbosity of the
language.

I'm with you on this one; IMHO it's one of the relatively few language
design missteps of Python, favoring the rare case as the default
instead of the common one. Unfortunately, many Pythoneers become so
immersed with the language and whatever the current status quo is that
they rarely question the rationale of the few counter-intuitive design
choices.

George
 
thebjorn

interesting example, I can see how it caused some trouble. A quick fix
would be to write it:

def age(dob, today=datetime.date.today ):

and inside the definition invoke it as today() rather than just today.
That way it still keeps the original spirit of the definition.

i.

The purpose of the argument was to be able to calculate the age at a
given point in time -- i.e. was the person 18 y/o at the time of the
incident?

Our coding standard now dictates:

def foo(arg=None):
    if arg is None:
        arg = <default mutable value>

(unless there's a very good reason to do it otherwise :)
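
Applied to the age function mentioned earlier in the thread, that standard might look roughly like this in Python 3. The body of `age` was never posted, so the year arithmetic below is an assumption made only to give the sketch something to compute.

```python
import datetime

def age(dob, today=None):
    # Look up the current date at call time, not at definition time.
    if today is None:
        today = datetime.date.today()
    years = today.year - dob.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

dob = datetime.date(1990, 6, 15)
print(age(dob, datetime.date(2008, 6, 14)))   # 17
print(age(dob, datetime.date(2008, 6, 15)))   # 18
```

Callers can still pass an explicit date to ask about a point in the past, and a long-running process picks up the current date on each call.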

a close runner-up, that we specifically opted not to allow was

def foo(arg={}):
    arg = arg or {}

even though it looks sexy and it's perhaps a bit more self-documenting
in some IDEs, it was rejected because it prevents "false" overrides of
the default argument.
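
A sketch of the "false override" problem, in Python 3. The function names `fill_with_or` and `fill_with_none` and the `"seen"` key are made up for the illustration.

```python
def fill_with_or(arg={}):
    arg = arg or {}          # an empty dict from the caller is falsy, so it is replaced
    arg["seen"] = True
    return arg

def fill_with_none(arg=None):
    if arg is None:          # only a missing argument triggers the default
        arg = {}
    arg["seen"] = True
    return arg

ignored = {}
fill_with_or(ignored)
print(ignored)               # {}: the caller's dict was silently swapped out

filled = {}
fill_with_none(filled)
print(filled)                # {'seen': True}
```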

For purely practical reasons we couldn't consider

def foo(arg=None):
    arg = <default mutable value> if arg is None else arg

-- bjorn
 
bukzor

Scenario: long running server process,
Bug report: "people aren't getting older", Code:

def age(dob, today=datetime.date.today()):
    ...

A very interesting example, thanks.


Just because it's well known doesn't mean we shouldn't think about it.
For example, in the same list you linked, "3. Integer division" is
being fixed in py3k.


I'm with you on this one; IMHO it's one of the relatively few language
design missteps of Python, favoring the rare case as the default
instead of the common one.

Well put. Although I've seen 'potentially useful' often used about
this, I haven't found an instance of its use in production code.

Our coding standard now dictates:

def foo(arg=None):
    if arg is None:
        arg = <default mutable value>

(unless there's a very good reason to do it otherwise :)

I believe this is very similar to most people's coding practices. It's
the one I adopted, it's in the above FAQ, the above 'gotchas' list,
and it's all over the standard library. This is a case of a language
feature requiring coding practices to prevent bugs, with very little
value added elsewhere.

--Buck
 
Istvan Albert

I'm with you on this one; IMHO it's one of the relatively few language
design missteps of Python, favoring the rare case as the default
instead of the common one.

George, you pointed out this link in a different thread

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877

how would you rewrite the code below if you could not use mutable
default arguments (global variables not accepted)? Maybe there is a
way, but I can't think of it as of now.

---------------------------------------

from itertools import groupby

def blocks(s, start, end):
    def classify(c, ingroup=[0]):
        klass = c==start and 2 or c==end and 3 or ingroup[0]
        ingroup[0] = klass==1 or klass==2
        return klass
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

print blocks('the {quick} brown {fox} jumped', start='{', end='}')
 
bukzor

I'm with you on this one; IMHO it's one of the relatively few language
design missteps of Python, favoring the rare case as the default
instead of the common one.

George, you pointed out this link in a different thread

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/521877

how would you rewrite the code below if you could not use mutable
default arguments (global variables not accepted)? Maybe there is a
way, but I can't think of it as of now.

---------------------------------------

from itertools import groupby

def blocks(s, start, end):
    def classify(c, ingroup=[0]):
        klass = c==start and 2 or c==end and 3 or ingroup[0]
        ingroup[0] = klass==1 or klass==2
        return klass
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

print blocks('the {quick} brown {fox} jumped', start='{', end='}')

Extremely simple

from itertools import groupby

def blocks(s, start, end):
    ingroup=[0]
    def classify(c):
        klass = c==start and 2 or c==end and 3 or ingroup[0]
        ingroup[0] = klass==1 or klass==2
        return klass
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

print blocks('the {quick} brown {fox} jumped', start='{', end='}')


No globals, as you specified. BTW, it's silly not to 'allow' globals
when they're called for, otherwise we wouldn't need the 'global'
keyword.

--Buck
 
Istvan Albert

No globals, as you specified. BTW, it's silly not to 'allow' globals
when they're called for, otherwise we wouldn't need the 'global'
keyword.

okay, now note that you do not actually use the ingroup list for
anything else but getting and setting its first element. So why would
one really need it be a list? Let's replace it with a variable called
ingroup that is not a list anymore. See it below (run it to see what
happens):

----------------------

from itertools import groupby

def blocks(s, start, end):
    ingroup = 0
    def classify(c):
        klass = c==start and 2 or c==end and 3 or ingroup
        ingroup = klass==1 or klass==2
        return klass
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

print blocks('the {quick} brown {fox} jumped', start='{', end='}')
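
For readers who do not run it: the assignment to ingroup inside classify makes that name local to classify, so reading it on the line before fails with UnboundLocalError. Python 2 has no way to rebind a variable in an enclosing function scope, which is why the one-element list is used. Python 3 added the `nonlocal` statement for this case; a sketch in Python 3 syntax, otherwise keeping the code as posted:

```python
from itertools import groupby

def blocks(s, start, end):
    ingroup = 0
    def classify(c):
        nonlocal ingroup     # rebind the variable in blocks(), not a new local
        klass = c == start and 2 or c == end and 3 or ingroup
        ingroup = klass == 1 or klass == 2
        return klass
    return [tuple(g) for k, g in groupby(s, classify) if k == 1]

result = blocks('the {quick} brown {fox} jumped', start='{', end='}')
print(result)   # [('q', 'u', 'i', 'c', 'k'), ('f', 'o', 'x')]
```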
 
