Beginner question - How to effectively pass a large list

J

JCM

Nope.
This is one case where understanding something of the insides of
Python helps. Basically Python variables are dictionary entries.
The variable values are the the dictionary values associated with
the variable names which are the dictionary keys.
Thus when you pass an argument to a function you are passing a
dictionary key. When the function uses the argument it looks up
the dictionary and uses the value found there.
This applies to all sorts of things in Python including modules -
a local dictionary associated with the module, and classes -
another dictionary. Dictionaries are fundamental to how Python
works and memory addresses per se play no part in the procedings.

You're talking about the implementation of the interpreter. I
wouldn't have used the term "memory address" as J.R. did, as this also
implies something about the implementation, but it does make sense to
say object IDs/object references are passed into functions and bound
to names/variables.
 
B

Bengt Richter

Depends on what you mean by "python" ;-) Python the language doesn't pass memory
addresses, but an implementation of Python might very well. The distinction is
important, or implementation features will be misconstrued as language features.

I suspect Alan is trying to steer you away from discussing implementation. I think
it can be useful to talk about both, if the discussion can be plain about which
it is talking about.
IMO that is a little too dismissive ;-)
This is one case where understanding something of the insides of
Python helps. Basically Python variables are dictionary entries.
The variable values are the the dictionary values associated with
the variable names which are the dictionary keys.

Thus when you pass an argument to a function you are passing a
dictionary key. When the function uses the argument it looks up
the dictionary and uses the value found there.
That's either plain wrong or mighty misleading. IOW this glosses
over (not to say mangles) some important details, and not just of
the implementation. E.g., when you write

foo(bar)

you are not passing a "key" (bar) to foo. Yes, foo will get access to
the object indicated by bar by looking in _a_ "dictionary". But
when foo accesses bar, it will be using as "key" the parameter name specified
in the parameter list of the foo definition, which will be found in _another_
"dictionary", i.e., the one defining the local namespace of foo. I.e., if foo is

def foo(x): return 2*x

and we call foo thus
bar = 123
print foo(bar)

What happens is that 'bar' is a key in the global (or enclosing) scope and 'x' is a
key in foo's local scope. Thus foo never sees 'bar', it sees 'x'. It is the job of
the function-calling implementation to bind parameter name x to the same thing as the specified
arg name (bar here) is bound to, before the first line in foo is executed. You could write

globals()['bar'] = 123
print foo(globals['bar'])

and

def foo(x): return locals()['x']*2

to get the flavor of what's happening. Functions would be little more than global-access macros
if it were not for the dynamic of binding local function parameter names to the call-time args.
This applies to all sorts of things in Python including modules -
a local dictionary associated with the module, and classes -
another dictionary. Dictionaries are fundamental to how Python
works and memory addresses per se play no part in the procedings.
Well, ISTM that is contrasting implementation and semantics. IOW, memory addresses may
(and very likely do) or may not play a part in the implementation, but Python
the language is not concerned with that for its _definition_ (though of course implementers
and users are concerned about _implementation_ for performance reasons).

I think the concept of name space is more abstract and more helpful in encompassing the
various ways of finding objects by name that python implements. E.g., when you interactively
type dir(some_object), you will get a list of key names, but typically not from one single dictionary.
There is potentially a complex graph of "dictionaries" to search according to specific rules
defining order for the name in question. Thus one could speak of the whole collection of visible
names in that process as a (complex) name space, or one could speak of a particular dict as
implementing a (simple) name space.

HTH

Regards,
Bengt Richter
 
A

Asun Friere

1. There is no value passed to the default argument
The name "d" is bound to the first element of the f.func_defaults. Since the
function "f" is an
object, which will be kept alive as long as there is name (current is "f")
refered to it, the
list in the func_defaults shall be accumulated by each invoking.

....


I think we could eliminate such accumulation effect by changing the function
as follow:
d = d+[0]
print d

And the reason this eliminates the accumulation is that the assignment
('d = d + [0]') rebinds the name 'd' to the new list object ([] +
[0]), ie. it no longer points to the first value in f.func_defaults.

What surprised me was that the facially equivalent:
.... d += [0]
.... print d
did not do so. Apparently '+=' in regards to lists acts like
list.apppend, rather than as the assignment operator it looks like.
 
J

Jp Calderone

1. There is no value passed to the default argument
The name "d" is bound to the first element of the f.func_defaults. Since the
function "f" is an
object, which will be kept alive as long as there is name (current is "f")
refered to it, the
list in the func_defaults shall be accumulated by each invoking.

...


I think we could eliminate such accumulation effect by changing the function
as follow:
def f(d=[]):
d = d+[0]
print d

And the reason this eliminates the accumulation is that the assignment
('d = d + [0]') rebinds the name 'd' to the new list object ([] +
[0]), ie. it no longer points to the first value in f.func_defaults.

What surprised me was that the facially equivalent:
... d += [0]
... print d
did not do so. Apparently '+=' in regards to lists acts like
list.apppend, rather than as the assignment operator it looks like.

list.extend, to be more precise, or list.__iadd__ to be completely precise
:) The maybe-mutate-maybe-rebind semantics of += lead me to avoid its use
in most circumstances.

Jp
 
G

Greg Ewing (using news.cis.dfn.de)

Paul said:
In those cases the compiler should notice it and generate appropriate
code to evaluate the default arg just once.

How is the compiler supposed to know? In the general case
it requires reading the programmer's mind.
 
D

Dang Griffith

Changes are rarely if ever made to Python for the sole reason
of reducing newbie mistakes. There needs to be a payoff for
long-term use of the language as well.

I strongly concur with this principle. Too often, I've been on a team
that goes to such efforts to make a system easy for newbies to learn
that the normal/advanced users are then handicapped by a dumbed-down
interface. E.g., after you've performed some process enough times, do
you *really* need a step-by-step wizard?

--dang
 
C

Carl Banks

Paul said:
Come on, the compiler can easily recognize that that list is constant.

Yes, but that doesn't account for all expensive parameters. What
about this:

DEFAULT_LIST = ((1,2),(3,4),(5,6),(7,8))

def func(param=DEFAULT_LIST):
pass

Or this:

import external_module

def func(param=external_modules.create_constant_object()):
pass

Or how about this:

def func(param={'1': 'A', '2': 'B', '3': 'C', '4': 'D'}):
pass


The compiler couldn't optimize any of the above cases.

Python takes so many other performance hits for the sake of
convenience and/or clarity that this particular one would be miniscule
by comparison.


Well, I don't have any data, but my gut feeling is this would be
somewhat more than "miniscule" performance hit. Seeing how pervasive
default arguments are, I'm guessing it would be a very significant
slowdown if default arguments had to be evaluated every call.

But since I have no numbers, I won't say anything more about it.
 
B

Bengt Richter

Yes, but that doesn't account for all expensive parameters. What
about this:

DEFAULT_LIST = ((1,2),(3,4),(5,6),(7,8))

def func(param=DEFAULT_LIST):
pass

Or this:

import external_module

def func(param=external_modules.create_constant_object()):
pass

Or how about this:

def func(param={'1': 'A', '2': 'B', '3': 'C', '4': 'D'}):
pass


The compiler couldn't optimize any of the above cases.
For the DEFAULT_LIST (tuple?) and that particular dict literal, why not?
Well, I don't have any data, but my gut feeling is this would be
somewhat more than "miniscule" performance hit. Seeing how pervasive
default arguments are, I'm guessing it would be a very significant
slowdown if default arguments had to be evaluated every call.

But since I have no numbers, I won't say anything more about it.
Don't know if I got this right, but

[18:32] /d/Python23/Lib>egrep -c 'def .*=' *py |cut -d: -f 2|sum
Total = 816
[18:32] /d/Python23/Lib>egrep -c 'def ' *py |cut -d: -f 2|sum
Total = 4454

would seem to suggest pervasive ~ 816/4453
or a little less than 20%

Of course that says nothing about which are typically called in hot loops ;-)
But I think it's a bad idea as a default way of operating anyway. You can
always program call-time evaluations explicitly. Maybe som syntactic sugar
could be arranged, but I think I would rather have some sugar for the opposite
instead -- i.e., being able to code a block of preset locals evaluated and bound
locally like current parameter defaults, but not being part of the call signature.

Regards,
Bengt Richter
 
C

Carl Banks

Bengt said:
For the DEFAULT_LIST (tuple?) and that particular dict literal, why not?


Well, the value of DEFAULT_LIST is not known a compile time (unless, I
suppose, this happens to be in the main module or command prompt).
The literal is not a constant, so the compiler couldn't optimize this.

(Remember, the idea is that default parameters should be evaluated at
call time, which would require the compiler to put the evaluations
inside the function's pseudo-code. The compiler could optimize default
parameters by evaluating them at compile time: but you can only do
that with constants, for obvious reasons.)

Well, I don't have any data, but my gut feeling is this would be
somewhat more than "miniscule" performance hit. Seeing how pervasive
default arguments are, I'm guessing it would be a very significant
slowdown if default arguments had to be evaluated every call.

But since I have no numbers, I won't say anything more about it.
Don't know if I got this right, but

[18:32] /d/Python23/Lib>egrep -c 'def .*=' *py |cut -d: -f 2|sum
Total = 816
[18:32] /d/Python23/Lib>egrep -c 'def ' *py |cut -d: -f 2|sum
Total = 4454

would seem to suggest pervasive ~ 816/4453
or a little less than 20%

Well, if you don't like the particular adjective I used, feel free to
substitute another. This happens a lot to me in c.l.p (see Martelli).
All I'm saying is, default arguments are common in Python code, and
slowing them down is probably going to be a significant performance
hit.

(You probably underestimated a little bit anyways: some functions
don't get to the default arguments until the second line.)

Of course that says nothing about which are typically called in hot
loops ;-) But I think it's a bad idea as a default way of operating
anyway. You can always program call-time evaluations
explicitly. Maybe som syntactic sugar could be arranged, but I think
I would rather have some sugar for the opposite instead -- i.e.,
being able to code a block of preset locals evaluated and bound
locally like current parameter defaults, but not being part of the
call signature.

Well, personally, I don't see much use for non-constant default
arguments, as we have them now, wheras they would be useful if you
could get a fresh copy. And, frankly, the default arguments feel like
they should be evaluated at call time. Now that we have nested
scopes, there's no need for them to simulate closures. So, from a
purely language perspective, I think they ought to be evaluated at
call time.

The only thing is, I very much doubt I'd be willing to take the
performance hit for it.
 
P

Paul Rubin

Carl Banks said:
The only thing is, I very much doubt I'd be willing to take the
performance hit for it.

If you don't like performance hits, you're using the wrong language :).

Seriously, this hasn't been an issue in Common Lisp, which in general
pays far more attention to performance issues than Python ever has.
 
R

Rainer Deyke

Carl said:
Well, personally, I don't see much use for non-constant default
arguments, as we have them now, wheras they would be useful if you
could get a fresh copy.

I disagree completely. If a function modifies one of its arguments, this is
a fundamental property of that function. If using a default value for that
argument causes the function to swallow the changes it performs, the
function no longer has consistent behavior.

On the other hand, the following function has consistent, if silly,
behavior:


default_list = []

def append_5_to_a_list(which_list = default_list):
which_list.append(5)
 
D

Donn Cave

Quoth Jp Calderone <[email protected]>:
| On Thu, Dec 18, 2003 at 03:29:55PM -0800, Asun Friere wrote:
....
|> What surprised me was that the facially equivalent:
|> >>> def f (d=[]) :
|> ... d += [0]
|> ... print d
|> did not do so. Apparently '+=' in regards to lists acts like
|> list.apppend, rather than as the assignment operator it looks like.
|
| list.extend, to be more precise, or list.__iadd__ to be completely precise
| :) The maybe-mutate-maybe-rebind semantics of += lead me to avoid its use
| in most circumstances.

This tacky feature certainly ought to be considered for the
chopping block in version 3, for exactly that reason.

Donn Cave, (e-mail address removed)

PS. Good analysis, JR, I learned something from it.
 
C

Carl Banks

Paul said:
If you don't like performance hits, you're using the wrong language :).

Seriously, this hasn't been an issue in Common Lisp, which in general
pays far more attention to performance issues than Python ever has.

Well, if you're right, I'm all for it.

But I hope the reason you want this is because it's less surprising,
more intuitive, more useful, or whatnot, and not just because Common
Lisp did it that way. Because people who can't see the side of any
computer programming argument except in the context of Common Lisp are
just pathetic.
 
C

Carl Banks

Rainer said:
Carl said:
Well, personally, I don't see much use for non-constant default
arguments, as we have them now, wheras they would be useful if you
could get a fresh copy.

I disagree completely. If a function modifies one of its arguments, this is
a fundamental property of that function. If using a default value for that
argument causes the function to swallow the changes it performs, the
function no longer has consistent behavior.

On the other hand, the following function has consistent, if silly,
behavior:


default_list = []

def append_5_to_a_list(which_list = default_list):
which_list.append(5)


I said a non-constant default argument wasn't useful. As evidence
against, you suggest that the function is "consistent."

Now, do have any evidence that non-constant, default arguments (as
they are now) are USEFUL?

(I don't agree with your "consistent" theory, anyways. The function
would be treating all arguments consistently: it's just that it'd get
a fresh copy of the default arguments each call.)
 
R

Rainer Deyke

Carl said:
Now, do have any evidence that non-constant, default arguments (as
they are now) are USEFUL?

def draw_pixel(x, y, color, surface=screen):
screen[y][x] = color
(I don't agree with your "consistent" theory, anyways. The function
would be treating all arguments consistently: it's just that it'd get
a fresh copy of the default arguments each call.)

A function that mutates its arguments should not be called with "fresh"
arguments, implicitly or explicitly. If the purpose of the function is to
modify its arguments, then doing so would throw away the effect of the
function. If the purpose of the function is not to modify its arguments,
then it shouldn't do so.
 
R

Rainer Deyke

Rainer said:
def draw_pixel(x, y, color, surface=screen):
screen[y][x] = color

This should of course be:

def draw_pixel(x, y, color, surface=screen):
surface[y][x] = color
 
P

Paul Rubin

Carl Banks said:
But I hope the reason you want this is because it's less surprising,
more intuitive, more useful, or whatnot, and not just because Common
Lisp did it that way. Because people who can't see the side of any
computer programming argument except in the context of Common Lisp are
just pathetic.

It avoids the need for ridiculous kludges to check whether there is a
real arg there or not, etc. I'd prefer that Python had used the CL
method in the first place since I find the Python method bizarre and
counterintuitive. However, changing it now would introduce
incompatibility that's harder to justify. So we probably have to live
with it.
 
B

Bengt Richter

Well, the value of DEFAULT_LIST is not known a compile time (unless, I
suppose, this happens to be in the main module or command prompt).
The literal is not a constant, so the compiler couldn't optimize this.

Well, according to the argument, we would be dealing with an optimizing compiler,
so presumably the compiler would see a name DEFAULT_LIST and simply compile a
call-time binding of param to whatever DEFAULT_LIST was bound to, and not bother
further. It could notice that the DEFAULT_LIST binding was still undisturbed, and
that it was to an immutable tuple with no mutable elements, which ISTM is effectively
a constant, but that analysis would be irrelevant, since the semantics would be
copying pre-existing binding (which is pretty optimized anyway).

The dict literal looks to me to be made up entirely of immutable keys and values, so
the value of that literal expression seems to me to be a constant. If you had call time
evaluation, you would be evaluating that expression each time, and the result would be
a fresh mutable dict with that constant initial value each time. ISTM that could be
optimized as param=private_dict_compile_time_created_from_literal.copy().
OTOH, if you used a pre-computed binding like DEFAULT_LIST, and wrote

SHARED_DICT = {'1': 'A', '2': 'B', '3': 'C', '4': 'D'}
def func(param=SHARED_DICT):
pass

then at def-time the compiler would not see the literal, but rather a name bound to
a mutable dict instance. The call-time effect would be to bind param to whatever SHARED_DICT
happened to be bound to, just like for DEFAULT_LIST. But the semantics, given analysis that
showed no change to the SHARED_DICT _binding_ before the func call, would be to share a single
mutable dict instance. This is unlike the semantics of

def func(param={'1': 'A', '2': 'B', '3': 'C', '4': 'D'}):
pass

which implies a fresh mutable dict instance bound to param, with the same initial value
(thus "constant" in a shallow sense at least, which in this case is fully constant).
(Remember, the idea is that default parameters should be evaluated at
call time, which would require the compiler to put the evaluations
inside the function's pseudo-code. The compiler could optimize default
parameters by evaluating them at compile time: but you can only do
that with constants, for obvious reasons.)
Yes, but note the difference between evaluating a name and a fixed-value literal expression,
as noted above.
Well, I don't have any data, but my gut feeling is this would be
somewhat more than "miniscule" performance hit. Seeing how pervasive
default arguments are, I'm guessing it would be a very significant
slowdown if default arguments had to be evaluated every call.

But since I have no numbers, I won't say anything more about it.
Don't know if I got this right, but

[18:32] /d/Python23/Lib>egrep -c 'def .*=' *py |cut -d: -f 2|sum
Total = 816
[18:32] /d/Python23/Lib>egrep -c 'def ' *py |cut -d: -f 2|sum
Total = 4454

would seem to suggest pervasive ~ 816/4453
or a little less than 20%

Well, if you don't like the particular adjective I used, feel free to
substitute another. This happens a lot to me in c.l.p (see Martelli).
Sorry, I didn't mean make anything "happen to" you, especially if it was unpleasant ;-)
I just meant to pick up on "pervasive" and "numbers" and try to provide some anecdotal data.
All I'm saying is, default arguments are common in Python code, and
slowing them down is probably going to be a significant performance
hit.
Probably in specific cases, but other cases could have no hit at all, given optimization.
(You probably underestimated a little bit anyways: some functions
don't get to the default arguments until the second line.) Agreed.


Well, personally, I don't see much use for non-constant default
arguments, as we have them now, wheras they would be useful if you
could get a fresh copy. And, frankly, the default arguments feel like
they should be evaluated at call time. Now that we have nested
scopes, there's no need for them to simulate closures. So, from a
purely language perspective, I think they ought to be evaluated at
call time.
I'd worry a bit about the meaning of names used in initialization expressions
if their values are to be looked up at call time. E.g., do you really want

a = 2
def foo(x=a): print 'x =', x
...
...
a = 'eh?'
foo()

to print 'eh?' By the time you are past a lot of ...'s, ISTM the code intent is not
so clear. But you can make dynamic access to the current a as a default explicit by

class Defer(object):
def __init__(self, lam): self.lam = lam

def foo(x=Defer(lambda:a)):
if isinstance(x, Defer): x=x.lam()
print 'x =', x

The semantics are different. I'd prefer to have the best of both worlds and be able
to do both, as now, though I might not object to some nice syntactic sugar along the
lines suggested by OP Stian Søiland. E.g., short spelling for the above Defer effect:

def foo(x:a): print 'x =', x
The only thing is, I very much doubt I'd be willing to take the
performance hit for it.
Moore+PyPy => less worry about that in future, I think.

Regards,
Bengt Richter
 
B

Bengt Richter

I said a non-constant default argument wasn't useful. As evidence
against, you suggest that the function is "consistent."

Now, do have any evidence that non-constant, default arguments (as
they are now) are USEFUL?

From zenmeister Tim Peter's fixedpoint module (argue with that ;-):

def _tento(n, cache={}):
try:
return cache[n]
except KeyError:
answer = cache[n] = 10L ** n
return answer

That's useful, but IMO not the optimal syntax, because cache here is really not
being used as a parameter. That's why I would like a way to bind locals similarly
without being part of the calling signature. E.g.,

def _tento(n)(
# preset bindings evaluated here at def-time
cache={}
):
try:
return cache[n]
except KeyError:
answer = cache[n] = 10L ** n
return answer

Regards,
Bengt Richter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,175
Messages
2,570,942
Members
47,489
Latest member
BrigidaD91

Latest Threads

Top