Variable arguments (*args, **kwargs): seeking elegance

John Ladasky

Hi folks,

I'm trying to make some of my Python class definitions behave like the ones I find in professional packages, such as Matplotlib. A Matplotlib class can often have a very large number of arguments -- some of which may be optional, some of which will assume default values if the user does not override them, etc.

I have working code which does this kind of thing. I define required arguments and their default values as a class attribute, in an OrderedDict, so that I can match up defaults, in order, with *args. I'm using set.issuperset() to see if an argument passed in **kwargs conflicts with one which was passed in *args. I use set.isdisjoint() to look for arguments in **kwargs which are not expected by the class definition, raising an error if such arguments are found.

Even though my code works, I'm finding it to be a bit clunky. And now, I'm writing a new class which has subclasses, and so actually keeps the "extra" kwargs instead of raising an error... This is causing me to re-evaluate my original code.

It also leads me to ask: is there a CLEAN and BROADLY-APPLICABLE way for handling the *args/**kwargs/default values shuffle that I can study? Or is this sort of thing too idiosyncratic for there to be a general method?

Thanks for any pointers!
 
Peter Otten

John said:
Hi folks,

I'm trying to make some of Python class definitions behave like the ones I
find in professional packages, such as Matplotlib. A Matplotlib class can
often have a very large number of arguments -- some of which may be
optional, some of which will assume default values if the user does not
override them, etc.

Personally, I'd rather not copy that kind of interface.

I have working code which does this kind of thing. I define required
arguments and their default values as a class attribute, in an
OrderedDict, so that I can match up defaults, in order, with *args. I'm
using set.issuperset() to see if an argument passed in **kwargs conflicts
with one which was passed in *args. I use set.isdisjoint() to look for
arguments in **kwargs which are not expected by the class definition,
raising an error if such arguments are found.

Why do you rely on a homebrew solution instead of actually calling the
function or initializer?

Even though my code works, I'm finding it to be a bit clunky. And now,
I'm writing a new class which has subclasses, and so actually keeps the
"extra" kwargs instead of raising an error... This is causing me to
re-evaluate my original code.

It also leads me to ask: is there a CLEAN and BROADLY-APPLICABLE way for
handling the *args/**kwargs/default values shuffle that I can study? Or
is this sort of thing too idiosyncratic for there to be a general method?

Thanks for any pointers!

inspect.getcallargs() may be worth a look.
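
To illustrate the suggestion: inspect.getcallargs() maps the positional and keyword arguments of a call onto a function's parameter names using Python's own rules. The sketch below uses inspect.signature(...).bind(), which the inspect documentation points to as the newer alternative. The function `plot`, its parameters, and the helper `binding_error` are made up for illustration.

```python
import inspect


def plot(x, y, color="blue", *, linewidth=1.0):
    """Hypothetical function with required, defaulted and keyword-only parameters."""
    return (x, y, color, linewidth)


sig = inspect.signature(plot)

# Bind a call's arguments to parameter names without calling the function.
bound = sig.bind(1, 2, linewidth=2.5)
bound.apply_defaults()            # fill in any defaults the caller left out
resolved = dict(bound.arguments)


def binding_error(*args, **kwargs):
    """Return the TypeError message a call would produce, or None if it binds."""
    try:
        sig.bind(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None
```

Here `binding_error(1, 2, x=3)` and `binding_error(1, 2, bogus=True)` both report a problem, so the duplicate-argument and unexpected-argument checks come for free.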
 
Peter Cacioppi

Elegance is a matter of taste, but here is one pattern.


I tend to think that very long argument lists are either the result of poor design or an indication that readability would benefit from some sort of dedicated, featherweight "parameter" or "builder" object. The builder object is mutable and copied by any functions that consume it.

To my mind, a nice pattern can be as follows.
--> Class A is a worker class
--> Class B is a worker-builder (or worker-parameter).
--> You build B first
-->--> usually by first calling a constructor with few to no arguments and then by setting specific properties of B.
--> You pass B to the constructor of A, which copies the data over to control the mutability of A.
--> A has a getter that returns a copy of its saved, private B data, so that you can "remember" later on how it was built.
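
The steps above could be sketched roughly as follows. The class names `WorkerParams` and `Worker` and the settings they carry are hypothetical, and a shallow copy is assumed to be enough because the settings are simple values.

```python
import copy


class WorkerParams:
    """Hypothetical builder (class B): a plain, mutable bag of settings."""

    def __init__(self):
        self.color = "blue"
        self.linewidth = 1.0
        self.label = ""


class Worker:
    """Hypothetical worker (class A), configured from a WorkerParams object."""

    def __init__(self, params):
        # Copy on the way in, so later changes to the caller's builder
        # cannot alter this worker.
        self._params = copy.copy(params)

    @property
    def params(self):
        # Copy on the way out, so callers cannot mutate the private settings.
        return copy.copy(self._params)


params = WorkerParams()          # build B with no arguments...
params.linewidth = 2.5           # ...then set only what differs from the defaults
worker = Worker(params)          # pass B to A's constructor
params.linewidth = 99.0          # does not affect the worker already built
```

If the settings held nested mutable objects, copy.deepcopy would be the safer choice.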

The other, perhaps more Pythonic, idea here is to exploit the Python 3 language feature that forces argument naming (keyword-only arguments). This would be nice if many arguments are possible but only a small number are passed in typical usage.

http://stackoverflow.com/questions/2965271/forced-naming-of-parameters-in-python
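
A minimal sketch of forced naming in Python 3: every parameter after the bare `*` can only be passed by keyword. The class `Plot` and its parameters are invented for the example.

```python
class Plot:
    """Hypothetical class; parameters after the bare * are keyword-only."""

    def __init__(self, data, *, color="blue", linewidth=1.0, label=None):
        self.data = data
        self.color = color
        self.linewidth = linewidth
        self.label = label


# Typical usage names only the few options that differ from the defaults.
p = Plot([1, 2, 3], linewidth=2.0)

# Passing an option positionally is rejected by Python itself.
try:
    Plot([1, 2, 3], "red")
except TypeError as exc:
    positional_error = exc
else:
    positional_error = None
```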
 
Steven D'Aprano

Hi folks,

I'm trying to make some of Python class definitions behave like the ones
I find in professional packages, such as Matplotlib. A Matplotlib class
can often have a very large number of arguments -- some of which may be
optional, some of which will assume default values if the user does not
override them, etc.

What makes Matplotlib so professional?

Assuming that "professional" packages necessarily do the right thing is
an unsafe assumption. Many packages have *lousy* interfaces. They often
get away with it because of the "sunk cost" fallacy -- if you've spent
$3000 on a licence for CrapLib, you're likely to stick through the pain
of learning its crap interface rather than admit you wasted $3000. Or the
package is twenty years old, and remains compatible with interfaces that
wouldn't be accepted now, but that's what the user community have learned
and they don't want to learn anything new. Or backwards compatibility
requires them to keep the old interface.

I have not used matplotlib enough to judge its interface, but see below.

I have working code which does this kind of thing. I define required
arguments and their default values as a class attribute, in an
OrderedDict, so that I can match up defaults, in order, with *args. I'm
using set.issuperset() to see if an argument passed in **kwargs
conflicts with one which was passed in *args. I use set.isdisjoint()
to look for arguments in **kwargs which are not expected by the class
definition, raising an error if such arguments are found.

The cleanest way is:

class Spam:
    def __init__(
            self, arg, required_arg,
            another_required_arg,
            arg_with_default=None,
            another_optional_arg=42,
            and_a_third="this is the default",
            ):
        ...  # body omitted


and so on, for however many arguments your class wants. Then, when you
call it:

s = Spam(23, "something", "something else", another_optional_arg="oops, missed one")


Python will automatically:

match up defaults, in order, ...
see if an argument conflicts with one ...
look for arguments ... which are not expected...
raising an error if such arguments are found
[end quote]


Why re-invent the wheel? Python already checks all these things for you,
and probably much more efficiently than you do. What benefit are you
getting from manually managing the arguments?
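
As a quick check of what Python already does, the sketch below uses a trimmed-down `Spam` (fewer parameters than the one above) and a hypothetical helper, `call_error`, that captures the TypeError message for a bad call.

```python
class Spam:
    def __init__(self, arg, required_arg, arg_with_default=None,
                 another_optional_arg=42):
        self.values = (arg, required_arg, arg_with_default,
                       another_optional_arg)


def call_error(*args, **kwargs):
    """Return the TypeError message from Spam(*args, **kwargs), or None."""
    try:
        Spam(*args, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


# A valid call: defaults are filled in for whatever is not supplied.
ok = Spam(23, "something", another_optional_arg="custom")

missing = call_error(23)                              # required_arg not supplied
conflict = call_error(23, "something", arg=1)         # arg given twice
unexpected = call_error(23, "something", bogus=True)  # unknown keyword
too_many = call_error(1, 2, 3, 4, 5)                  # more positionals than parameters
```

Each of the four bad calls yields a TypeError raised by the interpreter, with no checking code in the class.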

When you have a big, complex set of arguments, you should have a single
point of truth, one class or function or method that knows what args are
expected and uses Python's argument-handling to handle them. Other
classes and functions which are thin (or even not-so-thin) wrappers
around that class shouldn't concern themselves with the details of what's
in *args and **kwargs, they should just pass them on untouched.


There are two main uses for *args:

1) Thin wrappers, where you just collect all the args and pass them on,
without caring what name they eventually get assigned to:

class MySubclass(MyClass):
    def spam(self, *args):
        print("calling MySubclass")
        super(MySubclass, self).spam(*args)


2) Collecting arbitrary, homogeneous arguments for processing, where the
arguments don't get assigned to names, e.g.:

def mysort(*args):
    return sorted(args)

mysort(2, 5, 4, 7, 1)
# => [1, 2, 4, 5, 7]


Using *args and then manually matching up each argument with a name just
duplicates what Python already does.


Even though my code works, I'm finding it to be a bit clunky. And now,
I'm writing a new class which has subclasses, and so actually keeps the
"extra" kwargs instead of raising an error... This is causing me to
re-evaluate my original code.

It also leads me to ask: is there a CLEAN and BROADLY-APPLICABLE way for
handling the *args/**kwargs/default values shuffle that I can study?

Yes. Don't do it :)

It is sometimes useful to collect extra keyword arguments, handle them in
the subclass, then throw them away before passing them on:

class MySubclass(MyClass):
    def spam(self, *args, **kwargs):
        reverse = kwargs.pop('reverse', False)
        msg = "calling MySubclass"
        if reverse:
            msg = msg[::-1]
        print(msg)
        super(MySubclass, self).spam(*args, **kwargs)

kwargs is also handy for implementing keyword-only arguments in Python 2
(in Python 3 it isn't needed). But in that case, you don't have to worry
about matching up keyword args by position, since position is normally
irrelevant. Python's basic named argument handling should cover nearly
all the code you want to write, in my opinion.
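
A rough sketch of the keyword-only trick mentioned here, next to its Python 3 equivalent. The functions `connect` and `connect3` and their options are hypothetical.

```python
def connect(host, **kwargs):
    """Python 2 style: 'timeout' and 'retries' can only be passed by name."""
    timeout = kwargs.pop("timeout", 10)   # optional, with a default
    retries = kwargs.pop("retries", 3)    # optional, with a default
    if kwargs:
        # Anything left over was not expected; mimic Python's own error.
        raise TypeError("unexpected keyword arguments: %s"
                        % ", ".join(sorted(kwargs)))
    return host, timeout, retries


def connect3(host, *, timeout=10, retries=3):
    """Python 3 style: the bare * makes the same options keyword-only."""
    return host, timeout, retries
```

In both versions a call such as `connect("example.org", 5)` fails with a TypeError, because only `host` may be passed positionally.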
 
Skip Montanaro

What makes Matplotlib so professional?
Assuming that "professional" packages necessarily do the right thing is
an unsafe assumption. Many packages have *lousy* interfaces.

Not that it's a complete explanation for matplotlib's interfaces, but
it did start out as a Python-based replacement for MATLAB. I seem to
recall that John Hunter started the project because the lab he worked
in as a postdoc only had a single MATLAB license, so it wasn't always
available when he needed it.

Skip
 
John Ladasky

Thanks, everyone, for your replies. Perhaps I have complicated things unnecessarily? I was just trying to do some error-checking on the arguments supplied to the class constructor. Perhaps Python already implements automatically what I am trying to accomplish manually? I'll tinker around with some minimal code, try to provoke some errors, and see what I get.

Here is one more detail which may be relevant. The base class for the family of classes I am developing is a numpy.ndarray. The numpy.ndarray is a C extension type (and if I understand correctly, that means it is immutable by ordinary Python methods). Subclassing ndarray can get a bit complicated (see http://docs.scipy.org/doc/numpy/user/basics.subclassing.html). The ndarray.__init__ method is inaccessible; instead, one overrides ndarray.__new__.

Making further subclasses of a subclassed numpy.ndarray, each of which may have their own arguments, is what I am trying to accomplish while adhering to the "DRY" principle.
 
John Ladasky

Here is one more detail which may be relevant. The base class for the family of classes I am developing is a numpy.ndarray. The numpy.ndarray is a C extension type (and if I understand correctly, that means it is immutable by ordinary Python methods). Subclassing ndarray can get a bit complicated (see http://docs.scipy.org/doc/numpy/user/basics.subclassing.html).

I've just been reading the above page, which is pretty new. It supersedes a now-defunct page, http://www.scipy.org/Subclasses. The unusual subclassing needs of an ndarray apparently arise, not from the fact that an ndarray is a C extension type, but because of numpy's special view casting and slicing requirements. While I don't believe this is highly relevant to the args/kwargs issues I have here, I thought that I should correct my earlier remark.
 
Peter Cacioppi

"Subclassing ndarray can get a bit complicated"

Another software pattern idea is "encapsulate don't inherit". When a class is really messy to subclass, start fresh with a new class that wraps the messy class. Create redirect methods for whatever is needed, then subclass from the class you created.

In fact, I'd go so far as to say you should only subclass from classes that were designed with subclassing in mind. If you find yourself bending over backwards to make subclassing work, it means you should be wrapping and redirecting instead.

This is perhaps more true in C#/Java than Python, but still something to think about.
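
A minimal sketch of "wrap and redirect", with a plain list standing in for the hard-to-subclass class (an ndarray in this thread). The names `Series` and `Temperatures` are hypothetical.

```python
class Series:
    """Hypothetical wrapper: holds the awkward object instead of inheriting from it."""

    def __init__(self, values, units=""):
        self._values = list(values)   # the wrapped object
        self.units = units

    # Redirect only the operations that callers actually need.
    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]

    def mean(self):
        return sum(self._values) / len(self._values)


class Temperatures(Series):
    """Subclassing the wrapper is ordinary Python; no special hooks are needed."""

    def __init__(self, values, *, station):
        super().__init__(values, units="C")
        self.station = station


t = Temperatures([20.0, 22.0, 24.0], station="north")
```

The cost of this design is that every operation the wrapped class offers has to be redirected explicitly before callers can use it.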
 
Steven D'Aprano

Thanks, everyone, for your replies. Perhaps I have complicated things
unnecessarily? I was just trying to do some error-checking on the
arguments supplied to the class constructor. Perhaps Python already
implements automatically what I am trying to accomplish manually? I'll
tinker around with some minimal code, try to provoke some errors, and
see what I get.

It's really hard to make definitive judgements without actually seeing
your code and understanding your use-case. I can only suggest that you
*may* be complicating things unnecessarily. On the other hand, there's
always the chance that your requirements are sufficiently unusual that
you have done exactly what needs to be done.

But I suspect even in this case, there may be a more elegant way to solve
the problem of "I'm finding it to be a bit clunky", to quote your
original post. Clunky code can sometimes be smoothed out by refactoring
the complexity by use of decorators. Can you post an example of your code?

One thought -- often, people turn to subclassing as the only tool in
their toolbox. Have you considered that it may be easier/better to work
with delegation and composition instead?

Here is one more detail which may be relevant. The base class for the
family of classes I am developing is a numpy.ndarray. The numpy.ndarray
is a C extension type (and if I understand correctly, that means it is
immutable by ordinary Python methods). Subclassing ndarray can get a
bit complicated (see
http://docs.scipy.org/doc/numpy/user/basics.subclassing.html). The
ndarray.__init__ method is inaccessible, instead one overrides
ndarray.__new__.

Don't forget ndarray.__array_finalize__, __array_wrap__ and
__array_prepare__.

I am not an expert on numpy, but reading that page just makes me think
they're doing it all wrong, adding far too much complication. (I've
written code like that myself, but thank goodness I've had the sense to
throw it away and start again...). I'm trying to give them the benefit of
the doubt, but I've never liked the amount of DWIM cleverness in numpy,
and I think they would have been *much* better off having a clean
separation between the three ways of creating an array:

- the normal Python __new__ and __init__ mechanism

- creating a view into an array

- templating

instead of conflating the three into a single mechanism. I suspect that
the fundamental confusion comes about because numpy doesn't have a clean
distinction between views into an array, and actual arrays. Although I
must admit I've not done more than dip my toe into numpy, so you should
take my criticisms with a generous pinch of salt.

Making further subclasses of a subclassed numpy.ndarray, each of which
may have their own arguments, is what I am trying to accomplish while
adhering to the "DRY" principle.

The usual way of doing this is to accept only keyword arguments for any
additional args:


class Base:
    def __new__(cls, doc, grumpy, happy, sleepy, bashful, sneezy, dopey):
        ...

class Subclass(Base):
    def __new__(cls, *args, **kwargs):
        # grab the additional arguments
        sneaky = kwargs.pop('sneaky', True)  # optional
        grabby = kwargs.pop('grabby')  # mandatory
        touchy = kwargs.pop('touchy')
        feely = kwargs.pop('feely')
        instance = super(Subclass, cls).__new__(cls, *args, **kwargs)
        # process additional arguments
        instance.apply_extras(sneaky, grabby, touchy, feely)
        return instance


# In Python 3, I can do this to make it even cleaner:
class Subclass(Base):
    def __new__(cls, *args, sneaky=True, grabby, touchy, feely, **kwargs):
        instance = super(Subclass, cls).__new__(cls, *args, **kwargs)
        # process additional arguments
        instance.apply_extras(sneaky, grabby, touchy, feely)
        return instance



In general, you should aim to use either __new__ or __init__ but not
both, although that's not a hard law, just a guideline.

Can you adapt this pattern to ndarray?
 
John Ladasky

Wow, Steven, that was a great, detailed reply. I hope you will forgive me for shortcutting to the end, because I've been hacking away for a few hours and came to this very conclusion:

In general, you should aim to use either __new__ or __init__ but not
both, although that's not a hard law, just a guideline.

My problems were solved by using only __new__ in my ndarray subclasses, and avoiding __init__. (If I used both methods, my arguments were passed to the object twice, once through each method. That's weird! It messed me up! And I'm not sure what purpose it serves.) The __new__ methods of my subclasses now call super().__new__ to handle the attributes and error checking which are common to all the classes, then handle the subclass-specific variables.

One wrinkle that I had to comprehend was that super().__new__ would be returning me a half-baked object on which I had to do more work. I'm used to __init__, of course, which works on self.
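
Both observations can be reproduced with plain Python classes (not ndarray). In the sketch below, the classes `Both`, `Base` and `Labelled` are hypothetical. The first shows that calling a class hands the same arguments to __new__ and then to __init__; the second shows a __new__-only hierarchy in which the subclass finishes the object that super().__new__ returns.

```python
calls = []


class Both:
    def __new__(cls, *args, **kwargs):
        calls.append(("__new__", args, kwargs))
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        # Called automatically after __new__, with the very same arguments.
        calls.append(("__init__", args, kwargs))


Both(1, flag=True)


class Base:
    def __new__(cls, size):
        instance = super().__new__(cls)
        instance.size = size          # attribute common to all subclasses
        return instance


class Labelled(Base):
    def __new__(cls, *args, label, **kwargs):
        # The base class returns a partly built object...
        instance = super().__new__(cls, *args, **kwargs)
        # ...and the subclass adds its own attributes before returning it.
        instance.label = label
        return instance


obj = Labelled(3, label="x")
```

After `Both(1, flag=True)`, the `calls` list holds one entry for each method, with identical arguments.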

OK, as for some other points:
Don't forget ndarray.__array_finalize__, __array_wrap__ and
__array_prepare__.

I handle __array_finalize__ in my base class. Also __reduce_ex__ and __setstate__, so that I can pickle and unpickle my array objects (which is necessary for multiprocessing work). I haven't had time to deal with __array_wrap__ or __array_prepare__ yet, but so far my downstream code is working without these methods (crossing fingers).

I am not an expert on numpy, but reading that page just makes me think
they're doing it all wrong, adding far too much complication. (I've
written code like that myself, but thank goodness I've had the sense to
throw it away and start again...). I'm trying to give them the benefit of
the doubt, but I've never liked the amount of DWIM cleverness in numpy,
and I think they would have been *much* better off having a clean
separation between the three ways of creating an array:

- the normal Python __new__ and __init__ mechanism
- creating a view into an array
- templating

instead of conflating the three into a single mechanism.

I agree, I always find it complicated to wrap my head around these complexities. But I simply can't live without numpy!

And finally:
sneaky = kwargs.pop('sneaky', True) # optional

I don't know whether to be excited or embarrassed that I can still learn things about the basics of Python... I've never used the optional argument of dict.pop(). Cool! Thanks.
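
For reference, a short sketch of how dict.pop() behaves with and without its optional second argument. The dictionary contents are made up.

```python
options = {"grabby": 1, "touchy": 2}

sneaky = options.pop("sneaky", True)   # key absent: the default is returned
grabby = options.pop("grabby")         # key present: removed and returned

try:
    options.pop("feely")               # key absent and no default given
except KeyError:
    missing_raises = True
else:
    missing_raises = False
```

This is why the mandatory arguments in Steven's example are popped without a default: leaving one out raises a KeyError.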
 
Peter Cacioppi

"One thought -- often, people turn to subclassing as the only tool in
their toolbox. Have you considered that it may be easier/better to work
with delegation and composition instead? "

Double like.

Subclassing is awesome when it is used properly ... which usually means used cautiously.

Delegation/composition just doesn't result in the same sort of weird gotchas.
 
