tp_base, ob_type, and tp_bases

J

Jeff McNeil

Hi all,

In an effort to get (much) better at writing Python code, I've been
trying to follow and document what the interpreter does from main in
Modules/python.c up through the execution of byte code. Mostly for my
own consumption and benefit, but I may blog it if it turns out half
way decent.

Anyways, I've been going through PyType_Ready as it sets up copies of
PyObjectType, and I'm a bit confused by tp_base, ob_type, and
tp_bases. I've been using http://docs.python.org/c-api/index.html as
a guideline.

So, the documentation states that ob_type is a pointer to the type's
type, or metatype. Rather, this is a pointer to the new type's
metaclass?

Next, we have tp_base. That's defined as "an optional pointer to a
base type from which type properties are inherited." The value of
tp_base is then added to the tp_bases tuple. This is confusing me. On
the surface, it sound as though they're one in the same?

I *think* (and dir() of a subclass of type seems to back it up), that
tp_base is only at play when the object in question inherits from
type?

I've tried Googling around a bit. Perhaps it's better documented
somewhere? Or perhaps someone can give me a quick explanation?

Thanks!

Jeff
 
M

Martin v. Löwis

So, the documentation states that ob_type is a pointer to the type's
type, or metatype. Rather, this is a pointer to the new type's
metaclass?

That's actually the same. *Every* ob_type field points to the object's
type, e.g. for strings, integers, tuples, etc. That includes type
objects, where ob_type points to the type's type, i.e. it's meta-type,
also called metaclass (as "class" and "type" are really synonyms).
Next, we have tp_base. That's defined as "an optional pointer to a
base type from which type properties are inherited." The value of
tp_base is then added to the tp_bases tuple. This is confusing me. On
the surface, it sound as though they're one in the same?

(I don't understand the English "one in the same" - interpreting it
as "as though they should be the same")

No: tp_bases is a tuple of all base types (remember, there is multiple
inheritance); tp_base (if set) provides the first base type.
I *think* (and dir() of a subclass of type seems to back it up), that
tp_base is only at play when the object in question inherits from
type?

No - it is the normal case for single inheritance. You can leave it
NULL, which means you inherit from object.

Regards,
Martin
 
J

Jeff McNeil

That's actually the same. *Every* ob_type field points to the object's
type, e.g. for strings, integers, tuples, etc. That includes type
objects, where ob_type points to the type's type, i.e. it's meta-type,
also called metaclass (as "class" and "type" are really synonyms).


(I don't understand the English "one in the same" - interpreting it
as "as though they should be the same")

No: tp_bases is a tuple of all base types (remember, there is multiple
inheritance); tp_base (if set) provides the first base type.


No - it is the normal case for single inheritance. You can leave it
NULL, which means you inherit from object.

Regards,
Martin

Thank you! It was tp_base that was confusing me. The tp_bases member
makes sense as Python supports multiple inheritance. It wasn't
immediately clear that tp_base is there for single inheritance
reasons. It's all quite clear now.

Is that an optimization of sorts?

Jeff
 
J

Jeff McNeil

Thank you! It was tp_base that was confusing me.  The tp_bases member
makes sense as Python supports multiple inheritance.  It wasn't
immediately clear that tp_base is there for single inheritance
reasons. It's all quite clear now.

Is that an optimization of sorts?

Jeff

Well, maybe not specifically for single inheritance reasons, I just
didn't see an immediate reason to keep a separate pointer to the first
base type.
 
T

Terry Reedy

Martin said:
(I don't understand the English "one in the same" - interpreting it
as "as though they should be the same")

Martin, you are not alone! I do not really understand that either.
 
N

Ned Deily

M

Mark Wooding

Jeff McNeil said:
Thank you! It was tp_base that was confusing me. The tp_bases member
makes sense as Python supports multiple inheritance. It wasn't
immediately clear that tp_base is there for single inheritance
reasons. It's all quite clear now.

Is that an optimization of sorts?

I think it's a compatibility thing because types didn't acquire multiple
inheritance until the new-style class stuff started happening in,
ummm..., Python 2.2, I think.

This is rampant speculation. Please correct me.

-- [mdw]
 
C

Carl Banks

Well, maybe not specifically for single inheritance reasons, I just
didn't see an immediate reason to keep a separate pointer to the first
base type.


The reason you need a separate tp_base is because it doesn't
necessarily point to the first base type; rather, it points to the
first base type that has added any fields or slots to its internal
layout (in other words, the first type with a tp_basicsize > 8, on 32-
bit versions). I believe this is mainly for the benefit of Python
subclasses that define their own slots. The type constructor will
begin adding slots at an offset of tp_base->tp_basicsize.

To see an example, int objects have a tp_basicsize of 12 (there are 4
extra bytes for the interger). So if you multiply-inherit from int
and a Python class, int will always be tp_base.

class A(object): pass

class B(int,A): pass
print B.__base__ # will print <type 'int'>

class C(A,int): pass
print C.__base__ # will print <type 'int'>


A related issue is that you can't multiply inherit from two types that
have tp_basicsize > 8 unless one of them inherits from the other.
There can be only one tp_base. For instance:

class D(int,tuple): pass # will raise TypeError

class E(object):
__slots__ = ['a','b']

class F(object):
__slots__ = ['c','d']

class G(E,G): pass # will raise TypeError

class H(E,int): pass # will raise TypeError

.....

Here's a bit more background (and by "a bit" I mean "a lot"):

In 32-bit Python, objects of types defined in Python are usually only
16 bytes long. The layout looks like this.

instance dict
weak reference list
reference count
ob_type

The reference count, which is always the thing that the actual
PyObject* points at, isn't actually the first item in the object's
layout. The dict and weakref list are stored at a lower address.
(There's a reason for it.)

If a Python class defines any __slots__, the type constructor will add
the slots to the object's layout.

instance dict (if there is one)
weak reference list (if there is one)
reference count
ob_type
slot
slot
slot

Note that, because you defined __slots__, the object might not have an
instance dict or weak reference list associated with it. It might,
though, if one of the base classes had defined an instance dict or
weakref list. Or, it might not, but then a subclass might have its
onw instance dict or weakref list. That's why these guys are placed
at a lower address: so that they don't interfere with the layout of
subclasses. A subclass can add either more slots or a dict.

Object of types defined in C can have arbitrary layout. For instance,
it could have a layout that looks like this:

reference count
ob_type
PyObject* a
PyObject* b
long c
float d
instance dict

A problem with Python slots and C fields is that, if you want to
inherit from a type with a non-trivial object layout, aside from the
dict and weakrefs, all of the subtypes have to maintain the same
layout (see Liskov substitutability for rationale).

Subtypes can add their own fields or slots if they want, though. So,
if a Python subtype wants to define its own slots on top of a type
with a non-trivial object layout, it has to know which base has the
largest layout so that it doesn't use the same memory for its own
slots. Hence tp_base.


Carl Banks
 
J

Jeff McNeil

Well, maybe not specifically for single inheritance reasons, I just
didn't see an immediate reason to keep a separate pointer to the first
base type.

The reason you need a separate tp_base is because it doesn't
necessarily point to the first base type; rather, it points to the
first base type that has added any fields or slots to its internal
layout (in other words, the first type with a tp_basicsize > 8, on 32-
bit versions).  I believe this is mainly for the benefit of Python
subclasses that define their own slots.  The type constructor will
begin adding slots at an offset of tp_base->tp_basicsize.

To see an example, int objects have a tp_basicsize of 12 (there are 4
extra bytes for the interger).  So if you multiply-inherit from int
and a Python class, int will always be tp_base.

class A(object): pass

class B(int,A): pass
print B.__base__ # will print <type 'int'>

class C(A,int): pass
print C.__base__ # will print <type 'int'>

A related issue is that you can't multiply inherit from two types that
have tp_basicsize > 8 unless one of them inherits from the other.
There can be only one tp_base.  For instance:

class D(int,tuple): pass # will raise TypeError

class E(object):
    __slots__ = ['a','b']

class F(object):
    __slots__ = ['c','d']

class G(E,G): pass # will raise TypeError

class H(E,int): pass  # will raise TypeError

....

Here's a bit more background (and by "a bit" I mean "a lot"):

In 32-bit Python, objects of types defined in Python are usually only
16 bytes long.  The layout looks like this.

instance dict
weak reference list
reference count
ob_type

The reference count, which is always the thing that the actual
PyObject* points at, isn't actually the first item in the object's
layout.  The dict and weakref list are stored at a lower address.
(There's a reason for it.)

If a Python class defines any __slots__, the type constructor will add
the slots to the object's layout.

instance dict (if there is one)
weak reference list (if there is one)
reference count
ob_type
slot
slot
slot

Note that, because you defined __slots__, the object might not have an
instance dict or weak reference list associated with it.  It might,
though, if one of the base classes had defined an instance dict or
weakref list.  Or, it might not, but then a subclass might have its
onw instance dict or weakref list.  That's why these guys are placed
at a lower address: so that they don't interfere with the layout of
subclasses.  A subclass can add either more slots or a dict.

Object of types defined in C can have arbitrary layout.  For instance,
it could have a layout that looks like this:

reference count
ob_type
PyObject* a
PyObject* b
long c
float d
instance dict

A problem with Python slots and C fields is that, if you want to
inherit from a type with a non-trivial object layout, aside from the
dict and weakrefs, all of the subtypes have to maintain the same
layout (see Liskov substitutability for rationale).

Subtypes can add their own fields or slots if they want, though.  So,
if a Python subtype wants to define its own slots on top of a type
with a non-trivial object layout, it has to know which base has the
largest layout so that it doesn't use the same memory for its own
slots.  Hence tp_base.

Carl Banks


Thank you, that was very well put.

I spent some more time tracking everything down today as well. There
are some useful comments in object.h, and coupled with what you've
said, it all make sense now. CPython really does have a very clean
code base. I've written a lot of Python and a few C extensions, but
I've never dived deep enough to understand how everything really
clicks. I'm one of those folks that's never happy with what the API
documentation says; I've got to understand all of the nooks and
crannies before I'm happy!

Jeff
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
474,299
Messages
2,571,545
Members
48,299
Latest member
Ruby87897

Latest Threads

Top