Why do class methods always need 'self' as the first parameter?

Ian Kelly

I personally consider this to be a wart. Some time ago I did an
implementation analysis. The gist is that, if self and cls were made
special variables that returned the current instance and class
respectively, then the compiler could determine whether a function was
an instance or class method. If it then marked the code object
appropriately you could get rid of all of the wrappers and the
attendant run-time overhead.

I don't see how you could get rid of the wrappers. Methods would
still need to be bound, somehow, so that code like this will work:

methods = {}
for obj in objs:
    if obj.is_flagged:
        methods[obj.user_id] = obj.do_work
    else:
        methods[obj.user_id] = obj.do_other_work
# ...
methods[some_user_id]()

Without method wrappers, how does the interpreter figure out which
instance is bound to the method being called?

Cheers,
Ian
 
John Roth

I personally consider this to be a wart. Some time ago I did an
implementation analysis. The gist is that, if self and cls were made
special variables that returned the current instance and class
respectively, then the compiler could determine whether a function was
an instance or class method. If it then marked the code object
appropriately you could get rid of all of the wrappers and the
attendant run-time overhead.

I don't see how you could get rid of the wrappers.  Methods would
still need to be bound, somehow, so that code like this will work:

methods = {}
for obj in objs:
    if obj.is_flagged:
        methods[obj.user_id] = obj.do_work
    else:
        methods[obj.user_id] = obj.do_other_work
# ...
methods[some_user_id]()

Without method wrappers, how does the interpreter figure out which
instance is bound to the method being called?

Cheers,
Ian

Good question.

Currently the instance wrapper is created during method instantiation,
so the instance is obviously available at that point. There are two
rather obvious ways of remembering it. One is to use the invocation
stack, which has the instance. Another would be for the compiler to
create a local variable for the instance and possibly the class and
fill them in at instantiation time. Both of these require fixing the
names "self" and "cls" so the compiler knows what to do with them. The
first would require giving these two names their own bytecodes, the
second makes them simple local variables the same as the ones in the
method headers. The latter also allows them to be changed by the
method, which is probably not the world's best programming practice
although it's possible now.

John Roth
 
Ian Kelly

I don't see how you could get rid of the wrappers.  Methods would
still need to be bound, somehow, so that code like this will work:

methods = {}
for obj in objs:
    if obj.is_flagged:
        methods[obj.user_id] = obj.do_work
    else:
        methods[obj.user_id] = obj.do_other_work
# ...
methods[some_user_id]()

Without method wrappers, how does the interpreter figure out which
instance is bound to the method being called?

Cheers,
Ian

Good question.

Currently the instance wrapper is created during method instantiation,
so the instance is obviously available at that point. There are two
rather obvious ways of remembering it. One is to use the invocation
stack, which has the instance. Another would be for the compiler to
create a local variable for the instance and possibly the class and
fill them in at instantiation time. Both of these require fixing the
names "self" and "cls" so the compiler knows what to do with them. The
first would require giving these two names their own bytecodes, the
second makes them simple local variables the same as the ones in the
method headers. The latter also allows them to be changed by the
method, which is probably not the world's best programming practice
although it's possible now.

That's not what I asked. Both of those options are storing the
instance within a stack frame, once it's been called. I'm asking how
you would remember the instance during the interval from the time when
the method is accessed until when it has been called.

In the code above, the method is accessed just before it is stored in
the dictionary. That is when the method wrapper is currently created,
and the instance is available. It is not called until much later,
possibly not even within the same function. How would you remember
the instance over that period without wrapping the function?
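
To make the gap between access and call concrete, here is a minimal sketch in Python 3. The Worker class and its sample data are made up for illustration; only the loop mirrors the code above. It suggests that the object stored in the dictionary is a wrapper carrying its instance along with it:

```python
class Worker:
    """Hypothetical class standing in for the objects in the example."""

    def __init__(self, user_id, is_flagged):
        self.user_id = user_id
        self.is_flagged = is_flagged

    def do_work(self):
        return "work by %s" % self.user_id

    def do_other_work(self):
        return "other work by %s" % self.user_id


objs = [Worker("alice", True), Worker("bob", False)]

methods = {}
for obj in objs:
    # Attribute access creates the bound method; nothing is called yet.
    if obj.is_flagged:
        methods[obj.user_id] = obj.do_work
    else:
        methods[obj.user_id] = obj.do_other_work

bound = methods["bob"]
# The wrapper remembers both the instance and the plain function.
assert bound.__self__ is objs[1]
assert bound.__func__ is Worker.do_other_work

# Called much later, with no instance named at the call site.
result = bound()
```

The `__self__` and `__func__` attributes are how the wrapper remembers the instance between the lookup and the eventual call.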

Cheers,
Ian
 
John Roth

I don't see how you could get rid of the wrappers.  Methods would
still need to be bound, somehow, so that code like this will work:

methods = {}
for obj in objs:
    if obj.is_flagged:
        methods[obj.user_id] = obj.do_work
    else:
        methods[obj.user_id] = obj.do_other_work
# ...
methods[some_user_id]()

Without method wrappers, how does the interpreter figure out which
instance is bound to the method being called?

Cheers,
Ian

Good question.

Currently the instance wrapper is created during method instantiation,
so the instance is obviously available at that point. There are two
rather obvious ways of remembering it. One is to use the invocation
stack, which has the instance. Another would be for the compiler to
create a local variable for the instance and possibly the class and
fill them in at instantiation time. Both of these require fixing the
names "self" and "cls" so the compiler knows what to do with them. The
first would require giving these two names their own bytecodes, the
second makes them simple local variables the same as the ones in the
method headers. The latter also allows them to be changed by the
method, which is probably not the world's best programming practice
although it's possible now.

That's not what I asked.  Both of those options are storing the
instance within a stack frame, once it's been called.  I'm asking how
you would remember the instance during the interval from the time when
the method is accessed until when it has been called.

In the code above, the method is accessed just before it is stored in
the dictionary.  That is when the method wrapper is currently created,
and the instance is available.  It is not called until much later,
possibly not even within the same function.  How would you remember
the instance over that period without wrapping the function?

Cheers,
Ian

I see what you're saying now - I didn't get your example the first
time. So the optimization of eliminating the instance wrapper is only
possible if it's retrieved via the instance and then called
immediately. That would seem to be a useful optimization if it was
possible - I wonder if PyPy is doing it since they've got that fancy
JIT, and it would seem that an immediate call after retrieving the
method is overwhelmingly more frequent than saving it for later.

I think it's still true that calling the underlying function object
through the instance wrapper requires remaking the parameter list,
which seems to be another piece of unnecessary overhead, unless
there's a fast path through the call machinery that treats the
instance specially.

It does, however, decouple the two issues so I can't claim the
optimization as a benefit. Drat.

John Roth
 
Piet van Oostrum

[snip]
Instead, we have a syntax where you, the programmer, write out the
name of the local variable that binds to the first parameter. This
means the first parameter is visible. Except, it is only visible
at the function definition -- when you have the instance and call
the instance or class method:

black_knight = K()
black_knight.meth1('a', 1)
black_knight.meth2(2)

the first parameters (black_knight, and black_knight.__class__,
respectively) are magic, and invisible.

Thus, Python is using the "explicit is better than implicit" rule
in the definition, but not at the call site. I have no problem with
this. Sometimes I think implicit is better than explicit. In this
case, there is no need to distinguish, at the calls to meth1() and
meth2(), as to whether they are "class" or "instance" methods. At
the *calls* they would just be distractions.

It *is* explicit also at the call site. It only is written at the left of the dot rather than at the right of the parenthesis. And that is necessary to locate which definition of the method applies. It would be silly to repeat this information after the parenthesis. Not only silly, it would be stupid, as it would be a source of errors and a violation of DRY.
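
As a rough illustration of "left of the dot" (Python 3; the method bodies are assumptions, only the names K, meth1, meth2 and black_knight come from the quoted example), calling through the instance and calling through the class with the instance passed by hand should give the same result:

```python
class K:
    def meth1(self, letter, number):
        # Instance method: receives the instance as its first argument.
        return (self, letter, number)

    @classmethod
    def meth2(cls, number):
        # Class method: receives the class as its first argument.
        return (cls, number)


black_knight = K()

# The instance is written to the left of the dot...
via_instance = black_knight.meth1('a', 1)
# ...or spelled out as the first argument when going through the class.
via_class = K.meth1(black_knight, 'a', 1)

# For the class method, either spelling hands over the class.
cls_via_instance = black_knight.meth2(2)
cls_via_class = K.meth2(2)
```

Writing `black_knight.meth1(black_knight, 'a', 1)` would pass the instance twice, which is the kind of repetition the paragraph above warns against.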
 
Chris Torek

Chris Torek said:
[snip]
when you have [an] instance and call [an] instance or class method:

[note: I have changed the names very slightly here, and removed
additional arguments, on purpose]

It *is* explicit also at the call site. It only is written at the left
of the dot rather than at the right of the parenthesis.

It cannot possibly be explicit. The first parameter to one of the
method functions is black_knight, but the first parameter to the
other method is black_knight.__class__.

Which one is which? Is spam() the instance method and eggs() the
class method, or is spam() the class method and eggs the instance
method? (One does not, and should not, have to *care*, which is
kind of the point here. :) )

And that is necessary to locate which definition of the method
applies.

By "that" I assume you mean the name "black_knight" here. But the
name is not required to make the call; see the last line of the
following code fragment:

funclist = []
...
black_knight = K()
funclist.append(black_knight.spam)
funclist.append(black_knight.eggs)
...
# At this point, let's say len(funclist) > 2,
# and some number of funclist entries are ordinary
# functions that have no special first parameter.
random.choice(funclist)()

It would be silly to repeat this information after the parenthesis.
Not only silly, it would be stupid as it would be a source of errors,
and a violation of DRY.

Indeed. But I believe the above is a demonstration of how the
"self" or "cls" parameter is in fact implicit, not explicit.

(I am using python 2.x, and doing this in the interpreter:

random.choice(funclist)

-- without the parentheses to call the function -- produces:

<bound method K.[name omitted] of <__main__.K object at 0x249f50>>
<bound method type.[name omitted] of <class '__main__.K'>>
<function ordinary at 0x682b0>

The first is the instance method, whose name I am still keeping
secret; the second is the class method; and the third is the ordinary
function I added to the list. The actual functions print their
own name and their parameters if any, and one can see that the
class and instance methods get one parameter, and the ordinary
function gets none.)
 
rantingrick

I’m new to Python, and I love it.  The philosophy of the language (and
of the community as a whole) is beautiful to me.

Welcome aboard mate!

But one of the things that bugs me

Oh here we go! :)

is the requirement that all class
methods have 'self' as their first parameter.  On a gut level, to me
this seems to be at odds with Python’s dedication to simplicity.

It will seem odd at first. I too hated typing all those "selfs"
all the time. But believe me, my new friend, in no time those selfs will
roll off your fingers with great ease. You'll forget how much you hate
them and find much more to complain about.

Like for instance: I really lament the missing redundancy of Explicit
Lexical Scoping in python. For me global variables should have to be
qualified.

For example, consider Python’s indent-sensitive syntax.
[...]
and the result was a significantly improved
signal-to-noise ratio in the readability of Python code.

Yes, forced indentation is my favorite aspect of Python!

So why is 'self' necessary on class methods?

It could be that Guido has an exaggerated self importance and just
liked the sound of all those selfs whilst reading source code. However,
I believe the real reason is really readability! It takes a while to
understand this aspect because the natural human response is to be
lazy (for instance I could have used "used to" in the previous
sentence if I was slothful). We are all inherently lazy beings who
need structure to keep us from spiraling out of control into the abyss
of selfishness.

GvR: Computer Scientist and Behavioral psychologist.
 
Steven D'Aprano

It cannot possibly be explicit. The first parameter to one of the
method functions is black_knight, but the first parameter to the
other method is black_knight.__class__.


I think you are expecting more explicitness than actually required. There
are degrees of explicitness:

- The current President of the United States is a black man.

- On 6th September 2011, the duly constituted President of the United
States of America is a black man.

- On 6th September 2011, the duly constituted government official with
the title of President of the nation known as the United States of
America is an individual member of the species Homo sapiens with XY
chromosomes and of recent African ancestry.

As opposed to implicit:

- He is a black guy.


There is no requirement for every last gory detail to be overtly specified
in full. I quote from WordNet:

explicit
adj 1: precisely and clearly expressed or readily observable;
leaving nothing to implication; "explicit
instructions"; "she made her wishes explicit";
"explicit sexual scenes" [syn: expressed] [ant: implicit]
2: in accordance with fact or the primary meaning of a term
[syn: denotative]

Note the second definition in particular: in accordance with the primary
meaning of a term: the primary meaning of "class method" is that it
receives the class rather than the instance as first argument.

The "explicit is better than implicit" Zen should, in my opinion, be best
understood as a recommendation that code should, in general, avoid getting
input from context. In general, functions should avoid trying to decide
which behaviour is wanted according to context or the environment:

def func(x):
    if running_in_a_terminal():
        print "The answer is", (x+1)/2
    else:
        printer = find_a_printer()
        if printer is not None:
            printer.send((x+1)/2, header="func(%r)"%x, footer="Page 1")
        else:
            # Try sending email to the current user, the default user,
            # postmaster or root in that order.
            msg = make_email("The answer is", (x+1)/2)
            for user in [get_current_user(), DEFAULT_USER,
                         "(e-mail address removed)", ...]:
                result = send_mail(msg, to=user)
                if result == 0: break
            else:
                # Fall back on beeping the speakers in Morse code
                ...

(what if I want to beep the speakers from the terminal?), but not as a
prohibition against code like this:

def factorial(x):
    # Return the factorial of the integer part of x.
    n = int(x)
    if n <= 1: return 1
    return n*factorial(n-1)


There's no need to require the user to explicitly call int(x) before calling
factorial just to satisfy the Zen.

A function is free to process arguments as required, even to throw out
information (e.g. float -> int, instance -> class). What it shouldn't do is
*add* information implied by context (or at least, it should be very
cautious in doing so, and document it carefully, and preferably allow the
caller to easily override such implied data).

Which one is which? Is spam() the instance method and eggs() the
class method, or is spam() the class method and eggs the instance
method? (One does not, and should not, have to *care*, which is
kind of the point here. :) )

You can't tell just from the syntax used to call them:

function(arg)
bound_method(arg)
builtin_function_or_method(arg)
callable_instance(arg)
type(arg)

all use the same syntax. There is no requirement that you should be able to
tell *everything* about a line of code just from the syntax used. If you
want to know whether black_knight.spam is an instance method or a class
method, or something else, use introspection to find out.
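
One possible way to do that introspection, sketched in Python 3 with the standard inspect module. The names inst_meth, cls_meth, ordinary and describe are hypothetical, chosen so as not to guess which of spam() and eggs() is which:

```python
import inspect


class K:
    def inst_meth(self):      # hypothetical instance method
        return self

    @classmethod
    def cls_meth(cls):        # hypothetical class method
        return cls


def ordinary():
    return None


def describe(func):
    """Classify a callable by what, if anything, it is bound to."""
    if inspect.ismethod(func):
        # Bound methods expose the object they are bound to as __self__.
        if inspect.isclass(func.__self__):
            return "class method"
        return "instance method"
    if inspect.isfunction(func):
        return "plain function"
    return "other callable"


black_knight = K()
funclist = [black_knight.inst_meth, black_knight.cls_meth, ordinary]
kinds = [describe(f) for f in funclist]
```

This is only a heuristic: anything bound to a class object is reported as a class method, which also covers methods defined on a metaclass.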

By "that" I assume you mean the name "black_knight" here. But the
name is not required to make the call; see the last line of the
following code fragment:

funclist = []
...
black_knight = K()
funclist.append(black_knight.spam)
funclist.append(black_knight.eggs)
...
# At this point, let's say len(funclist) > 2,
# and some number of funclist entries are ordinary
# functions that have no special first parameter.
random.choice(funclist)()


Irrelevant. The instance used for the bound method is explicitly specified
when you create it. But there's no requirement that you need to explicitly
specify the instance every single time you *use* the bound method. The
instance is part of the bound method, it isn't implied by context or
history or guessed from the environment.

Contrast what Python actually does with a hypothetical language where bound
methods' instances are implicitly assigned according to the most recent
instance created:

black_knight = Knight()
funclist.append(spam) # refers to black_knight.spam
white_knight = Knight()
funclist.append(spam) # refers to white_knight.spam
baldrick = Knave()
funclist.append(eggs) # oops, Knaves don't have an eggs attribute
 
Piet van Oostrum

Prasad said:
It seems to me that if I add a function to the list of class attributes it will automatically wrap with "self" but adding it to the object directly will not wrap the function as a method. Can somebody explain why? I would have thought that any function added to an object would be a method (unless decorated as a class method).

The special magic to transform a function into a method is only applied
for functions found as attributes of the class, not for instance
attributes. It is a matter of design.

Hmm, or does the decoration just tell Python not to turn an object's function into a method? I.e. Is the decorator basically just the syntactic sugar for doing the above?

The classmethod decorator transforms the method (or actually the
function) into a different kind of object (a class method).
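
A small sketch of the difference, in Python 3. The Knight class and the greet and kind functions are invented for the example. The same function is stored first on an instance, then on the class, then bound by hand with types.MethodType, and finally wrapped with classmethod:

```python
import types


class Knight:
    pass


def greet(self):
    return "greeting from %s" % type(self).__name__


def kind(cls):
    return cls.__name__


k = Knight()

# Stored on the instance: returned as-is, no binding takes place.
k.greet = greet
try:
    k.greet()
    instance_attr_bound = True
except TypeError:
    # greet() was called with no arguments, so 'self' is missing.
    instance_attr_bound = False
del k.greet

# Stored on the class: lookup through the instance goes via the
# function's __get__ method, which produces a bound method.
Knight.greet = greet
class_attr_result = k.greet()

# Binding by hand gives an equivalent bound method.
manual_result = types.MethodType(greet, k)()

# classmethod wraps the function in a different kind of object,
# whose lookup binds the class instead of the instance.
Knight.kind = classmethod(kind)
stored_type = type(Knight.__dict__["kind"])
kind_result = k.kind()
```

If a per-instance method is really wanted, types.MethodType(func, instance) is one way to get it; assigning the bare function to the instance leaves it an ordinary function.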
 
