class with __len__ member fools boolean usage "if x:" ; bad coding style?

G

george young

[Python 2.3.3, x86 linux]
I had developed the habit of using the neat python form:
if someinstance:
someinstance.memb()

because it seems cleaner than "if someinstance is not None".
{please no flames about "is not None" vs. "!= None" ...}

This seemed like a good idea at the time :(). Twice, recently,
however, as my
app grew, I thought, hmm... it would make things clearer if I gave
this container class a __len__ member and maybe a __getitem__. Great,
code looks
nicer now... crash,crash, many expletives deleted...

Its great to be able to say containerinstance[seq] instead of
containerinstance.steps[seq], but I've also changed the semantics of
(containerinstance) in a boolean context. My app breaks only in the
seldom
case that the container is empty.

Obviously I know how to fix the code, but I'm wondering if this isn't
a message
that "if containerinstance:" is not a good coding practice. Or maybe
that one
should never *add* on sequence emulation to a class that's already in
use.
It may look like adding a __len__ and __getitem__ is just *extending*
the
functionality of the class, but in fact, it strongly *changes*
semantics of
existing usage of the class.

I know the ref manual mentions this behaviour,
but I begin to wonder at the wisdom of a language design and common
usage pattern of (containerinstance) instead of
(len(containerinstance)) and (containerinstance is None) as a boolean
expression.
Comments? Suggestions?
 
?

=?iso-8859-1?Q?Fran=E7ois?= Pinard

[george young]
I had developed the habit of using the neat python form:
if someinstance:
someinstance.memb()
Its great to be able to say containerinstance[seq] instead of
containerinstance.steps[seq], but I've also changed the semantics of
(containerinstance) in a boolean context. My app breaks only in the
seldom case that the container is empty.

Sequences, when empty, are False in boolean contexts. So, the empty
string, the empty list, the empty tuple are all False.

If you do not specify anything "special" :)-), your own objects are
always True. If you specify `__len__', Python will consider your object
in the spirit of a sequence, where a zero-length means False. If you do
not want that Python do see your object in the spirit of a sequence in
boolean contexts, you might have to add a method:

def __nonzero__(self):
return True

to tell that your own objects are always True. (You might of course use
`__nonzero__' to implement more subtle concepts of Truth and Falsity.)
Comments? Suggestions?

There is nothing wrong in writing `if x:', as long as you decide
yourself what it should mean. But you have to let Python know what
your decision was. If you do not say anything, Python has its ways for
guessing, which are chosen so to be useful on the average case, and also
well documented.
 
J

John Roth

george young said:
[Python 2.3.3, x86 linux]
I had developed the habit of using the neat python form:
if someinstance:
someinstance.memb()

because it seems cleaner than "if someinstance is not None".
{please no flames about "is not None" vs. "!= None" ...}

This seemed like a good idea at the time :(). Twice, recently,
however, as my
app grew, I thought, hmm... it would make things clearer if I gave
this container class a __len__ member and maybe a __getitem__. Great,
code looks
nicer now... crash,crash, many expletives deleted...

Its great to be able to say containerinstance[seq] instead of
containerinstance.steps[seq], but I've also changed the semantics of
(containerinstance) in a boolean context. My app breaks only in the
seldom case that the container is empty.

Obviously I know how to fix the code, but I'm wondering if this isn't
a message
that "if containerinstance:" is not a good coding practice.

Almost. The message is that testing for None, however
you're doing it, is a Code Smell in the sense defined in
the Refactoring book. If some attribute is supposed to have
a Foo object, then it should have a Foo or a subclass of
Foo, not None.

Sometimes there's no way around it, but whenever you find
yourself testing for None, consider using a Null Object instead.
A Null Object is a subclass of the normal object you would
be expecting, but one that has methods and attributes that
handle the exceptional case cleanly.

Of course, there are a couple of very pretty idioms for
handling optional parameters that depend on tests for None,
but they're definitely special cases, and they also break if the
real parameter can be False.

John Roth
 
G

george young

John Roth said:
george young said:
[Python 2.3.3, x86 linux]
I had developed the habit of using the neat python form:
if someinstance:
someinstance.memb()

because it seems cleaner than "if someinstance is not None".
{please no flames about "is not None" vs. "!= None" ...}

This seemed like a good idea at the time :(). Twice, recently,
however, as my
app grew, I thought, hmm... it would make things clearer if I gave
this container class a __len__ member and maybe a __getitem__. Great,
code looks
nicer now... crash,crash, many expletives deleted...

Its great to be able to say containerinstance[seq] instead of
containerinstance.steps[seq], but I've also changed the semantics of
(containerinstance) in a boolean context. My app breaks only in the
seldom case that the container is empty.

Obviously I know how to fix the code, but I'm wondering if this isn't
a message
that "if containerinstance:" is not a good coding practice.

Almost. The message is that testing for None, however
you're doing it, is a Code Smell in the sense defined in
the Refactoring book. If some attribute is supposed to have
a Foo object, then it should have a Foo or a subclass of
Foo, not None.

Sometimes there's no way around it, but whenever you find
yourself testing for None, consider using a Null Object instead.
A Null Object is a subclass of the normal object you would
be expecting, but one that has methods and attributes that
handle the exceptional case cleanly.

Of course, there are a couple of very pretty idioms for
handling optional parameters that depend on tests for None,
but they're definitely special cases, and they also break if the
real parameter can be False.

Null Object seems like a perfect fit for this. I was unaware of it.
I read the original GOF book, but not much since then on patterns.
Thnks very much!

-- George
 
H

Heiko Wundram

Am Sonntag, 27. Juni 2004 00:44 schrieb george young:
Null Object seems like a perfect fit for this. I was unaware of it.
I read the original GOF book, but not much since then on patterns.
Thnks very much!

What I actually think you want is to have a look at __nonzero__... This
special member is called first time round, before trying to call __len__. Say
you have the following class:

class xyz:

def __nonzero__(self):
return True

def __len__(self):
return 0

then

if xyz():
print "Yes!"

will print yes, although len(xyz()) == 0.

HTH!

Heiko.
 
P

Peter Otten

Heiko said:
class xyz:

def __nonzero__(self):
return True

def __len__(self):
return 0

then

if xyz():
print "Yes!"

will print yes, although len(xyz()) == 0.

Now you have an object that neither behaves consistently as a boolean nor as
a sequence, I fear you in for even subtler bugs...

Peter
 
H

Heiko Wundram

Am Dienstag, 29. Juni 2004 07:59 schrieb Peter Otten:
Now you have an object that neither behaves consistently as a boolean nor
as a sequence, I fear you in for even subtler bugs...

That isn't necessarily true... Given the following example, I'd say what
__nonzero__ and __len__ implement is quite understandable, and if documented,
the programmer isn't in for any bug:

<code>

import time

class ID(object):

def __init__(self):
self.__id = "test"

def __len__(self):
return len(self.__id)

class Host(ID):

def __init__(self):
self.__timeout = time.time() + 30

def __nonzero__(self):
return self.__timeout >= time.time()

</code>

nonzero and len implement something completely different, where __len__ is an
operator on the underlying ID of a Host, and __nonzero__ is an operator on
the Host itself, to check whether the Host has timed out.

It doesn't make sense to have __nonzero__ refer to the ID (which is always
nonzero, a string), and it neither makes sense to have __len__ refer to the
Host (which doesn't have a length), so the situation here is pretty clear
(IMHO).

But, as always, documentation is better than guessing. ;)

Heiko.
 
P

Peter Otten

Heiko said:
Am Dienstag, 29. Juni 2004 07:59 schrieb Peter Otten:

That isn't necessarily true... Given the following example, I'd say what
__nonzero__ and __len__ implement is quite understandable, and if
documented, the programmer isn't in for any bug:

<code>

import time

class ID(object):

def __init__(self):
self.__id = "test"

def __len__(self):
return len(self.__id)

class Host(ID):

def __init__(self):
self.__timeout = time.time() + 30

def __nonzero__(self):
return self.__timeout >= time.time()

</code>

nonzero and len implement something completely different, where __len__ is
an operator on the underlying ID of a Host, and __nonzero__ is an operator
on the Host itself, to check whether the Host has timed out.

In Python, you don't normally check for a timeout (google for LBYL), you'd
rather throw an exception. This avoids problems like

h = Host()
if h:
sleep(justLongEnoughForHostToTimeOut)
h.someOperation() # fails due to timeout

It doesn't make sense to have __nonzero__ refer to the ID (which is always
nonzero, a string), and it neither makes sense to have __len__ refer to
the Host (which doesn't have a length), so the situation here is pretty
clear (IMHO).

A __len__() without a __getitem__() method doesn't make sense to me. But
maybe your example is just too terse...
But, as always, documentation is better than guessing. ;)

No amount of documentation can heal an unintuitive API.
The convention of using bool(o) as an abbreviation of o.isValid() for
non-sequences and of len(o) != 0 for sequences seems natural to me. Mixing
these two meanings or even adding "was this parameter provided" as a third
one will result in highly ambiguous code that is bound to break.

Peter
 
D

Donn Cave

Heiko Wundram wrote: ....

No amount of documentation can heal an unintuitive API.
The convention of using bool(o) as an abbreviation of o.isValid() for
non-sequences and of len(o) != 0 for sequences seems natural to me. Mixing
these two meanings or even adding "was this parameter provided" as a third
one will result in highly ambiguous code that is bound to break.

I agree, but I think part of the problem is trying to milk too
much from all of these features.

The time to implement __nonzero__, __getitem__ etc. is when
there's a clear need for them, like an application context
where this polymorphism is needed to make things work. If you
do it because you think it looks nicer, don't complain when
it breaks things because you brought it on yourself with this
fuzzy thinking. (Using "you" in the generic sense.)

Donn Cave, (e-mail address removed)
 
H

Heiko Wundram

Am Dienstag, 29. Juni 2004 20:34 schrieb Peter Otten:
In Python, you don't normally check for a timeout (google for LBYL), you'd
rather throw an exception. This avoids problems like

h = Host()
if h:
sleep(justLongEnoughForHostToTimeOut)
h.someOperation() # fails due to timeout

The operations on the host don't fail if the timeout has expired, in my use
case. It's just that a host has a timeout, which signals the surrounding
code, that this host needs to be contacted in the next run of ping signals.

What I can do to get these hosts now looks like the following:

ping_hosts = [h for h in hosts if not h]

That's what I call nice and concise, and at least for me the meaning is clear
by just looking at the code. If the host has expired (not host), add it to
the list of hosts to ping now.
A __len__() without a __getitem__() method doesn't make sense to me. But
maybe your example is just too terse...

Why should you need a __getitem__() if you have __len__() defined? In my use
case, the ID (of a host/data-item, etc.) is not retrievable by single
character (that would make no sense), but the length of the ID is
significant, as it can signal important information as on the protocol to use
to contact the host, etc.

So, I can how write:

someid = host
myid_old = myhost.old_id
myid_new = myhost.new_id

if len(someid) == 26:
dist = myid_old ^ someid
elif len(someid) == 30:
dist = myid_new ^ someid
else:
raise ValueError, "Invalid host."

(where __xor__ is again a method defined on two IDs, returning the numerical
distance, the binary xor between the two numbers)

I would never call this unintuitive, as in effect hosts are just IDs which
have additional data (namely the IP/port pair), and can be used in any
context in the program where IDs are wanted. And IDs can be asked for their
length (to decide what to do with it). This doesn't just mean Hosts, also
Data Items are IDs with additional data, which can be used just as well here.
No amount of documentation can heal an unintuitive API.
The convention of using bool(o) as an abbreviation of o.isValid() for
non-sequences and of len(o) != 0 for sequences seems natural to me. Mixing
these two meanings or even adding "was this parameter provided" as a third
one will result in highly ambiguous code that is bound to break.

I can understand that in the normal case you would want to code something
either as a sequence, or as a non-sequence. But, in this case, you have two
classes, which have different use cases, but are derived from another (and
deriving a Host from an ID isn't strange, at least for me). And for code
which does something with the ID (and is specifically marked as such), it's
pretty fair to use the ID part of the class which is passed in (which in this
case are __len__ and __xor__) to make it concise, while where you use hosts,
you take the host protocol to get at the host data (__nonzero__).

I don't see anything unintuitive in this... Actually, it makes the code look
cleaner, IMHO.

Heiko.
 
D

Donn Cave

Heiko Wundram said:
The operations on the host don't fail if the timeout has expired, in my use
case. It's just that a host has a timeout, which signals the surrounding
code, that this host needs to be contacted in the next run of ping signals.

What I can do to get these hosts now looks like the following:

ping_hosts = [h for h in hosts if not h]

That's what I call nice and concise, and at least for me the meaning is clear
by just looking at the code. If the host has expired (not host), add it to
the list of hosts to ping now. ....
I can understand that in the normal case you would want to code something
either as a sequence, or as a non-sequence. But, in this case, you have two
classes, which have different use cases, but are derived from another (and
deriving a Host from an ID isn't strange, at least for me). And for code
which does something with the ID (and is specifically marked as such), it's
pretty fair to use the ID part of the class which is passed in (which in this
case are __len__ and __xor__) to make it concise, while where you use hosts,
you take the host protocol to get at the host data (__nonzero__).

I don't see anything unintuitive in this... Actually, it makes the code look
cleaner, IMHO.

It's the __nonzero__ part that hurts. Insofar as you have
explained it, an expired() method would be much clearer
than __nonzero__(), and it would make more sense. e.g.,

ping_hosts = [h for h in hosts if h.expired()]

This approach lets you have more than one boolean attribute,
because they can each have their own name.

Donn Cave, (e-mail address removed)
 
J

Jeff Epler

Am Dienstag, 29. Juni 2004 20:34 schrieb Peter Otten:
Why should you need a __getitem__() if you have __len__() defined? In my use
case, the ID (of a host/data-item, etc.) is not retrievable by single
character (that would make no sense), but the length of the ID is
significant, as it can signal important information as on the protocol to use
to contact the host, etc.

I agree with Herr Otten. __len__ is intended to be implemented by
container objects. http://docs.python.org/ref/sequence-types.html
If your object doesn't permit item access (o or o[k]) then I think
people will be surprised to find that len(o) does not cause a TypeError.

Of course, you're permitted to write any program you like, and Python
will even execute some of them and give results that please you. You're
not required to write only programs that don't surprise anybody.

Jeff

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA4ioUJd01MZaTXX0RAs8vAJ9euelDDplx2ESwJAKfq7PMVJYDNgCbBg/W
iKF9ncw6jUULzRqt4luiyJc=
=5WY2
-----END PGP SIGNATURE-----
 
G

Greg Ewing

Heiko said:
ping_hosts = [h for h in hosts if not h]

That's what I call nice and concise, and at least for me the meaning is clear
by just looking at the code.

I suspect it's only clear to you because you wrote it.
I would find that piece of code *extremely* confusing --
it looks like it's going to return a list full of Nones!

If, on the other hand, it were written something like

ping_hosts = [h for h in hosts if h.expired()]

the meaning would be crystal clear to everyone, I think.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,202
Messages
2,571,057
Members
47,662
Latest member
sxarexu

Latest Threads

Top