sometype.__new__ and C subclasses

James Porter

I've been trying to write a Python C extension module that uses NumPy
and has a subtype of numpy.ndarray written in C. However, I've run into
a snag: calling numpy.ndarray.__new__(mysubtype, ...), which a handful
of NumPy features need to do, triggers an exception in the bowels of
Python. I'm posting to this list to try to figure out why this
exception exists in the first place, and what (if anything) I can do to
work around it.

The exception in question happens in Objects/typeobject.c in
tp_new_wrapper. Here's the comment for the block:

/* Check that the use doesn't do something silly and unsafe like
object.__new__(dict). To do this, we check that the
most derived base that's not a heap type is this type. */

The code has the end effect that basetype.__new__(subtype, ...) fails
whenever subtype is a statically-defined type (i.e. a normal C extension
type object). Why is this necessary in general? I can see why it might
be bad for a limited number of core Python types, but it seems
unnecessarily limiting for Python C extensions.

On a more practical note, is there a way (short of rewriting the subtype
in Python) to work around this? It seems that I could call the type
metaclass to create a heap type in C, but I'm not sure of all the
implications of that.
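
For illustration, here is what "calling the type metaclass" amounts to, sketched at the Python level (Python 3 syntax). This is only a sketch under assumptions: make_heap_subtype and is_heap_type are made-up helper names, dict stands in for a statically defined C base type, and the equivalent C-level call is not shown. The flag constant is Py_TPFLAGS_HEAPTYPE from CPython's Include/object.h.

```python
# Py_TPFLAGS_HEAPTYPE as defined in CPython's Include/object.h (1 << 9).
HEAPTYPE_FLAG = 1 << 9


def make_heap_subtype(name, base):
    """Hypothetical helper: create a subclass by calling the metaclass,
    which is what a `class` statement does under the hood."""
    return type(name, (base,), {})


def is_heap_type(tp):
    """Hypothetical helper: True if `tp` was allocated on the heap."""
    return bool(tp.__flags__ & HEAPTYPE_FLAG)


# dict stands in for a statically defined extension type.
HeapDict = make_heap_subtype("HeapDict", dict)

# The base's __new__ accepts the heap subtype, because the check skips
# heap types while looking for the most derived static base.
instance = dict.__new__(HeapDict)
```

A type created this way behaves like a class written in Python, so base.__new__(subtype, ...) is accepted for it.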

Thanks in advance,
Jim
 
Carl Banks

> I've been trying to write a Python C extension module that uses NumPy
> and has a subtype of numpy.ndarray written in C. However, I've run into
> a snag: calling numpy.ndarray.__new__(mysubtype, ...) triggers an
> exception in the bowels of Python (this is necessary for a handful of
> NumPy features). I'm posting to this list to try to figure out why this
> exception exists in the first place, and what (if anything) I can do to
> work around it.
>
> The exception in question happens in Objects/typeobject.c in
> tp_new_wrapper. Here's the comment for the block:
>
>         /* Check that the use doesn't do something silly and unsafe like
>            object.__new__(dict).  To do this, we check that the
>            most derived base that's not a heap type is this type. */
>
> The code has the end effect that basetype.__new__(subtype, ...) fails
> whenever subtype is a statically-defined type (i.e. a normal C extension
> type object). Why is this necessary in general? I can see why it might
> be bad for a limited number of core Python types, but it seems
> unnecessarily limiting for Python C extensions.

Why don't you use mysubtype.__new__(mysubtype,...)?

If you wrote mysubtype in C, and defined a different tp_new than
ndarray, then this exception will trigger. And it ought to; you don't
want to use ndarray's tp_new to create an object of your subclass, if
you've defined a different tp_new.
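
The same check can be seen with built-in types alone, without any extension module. The snippet below is a small sketch (Python 3 syntax); try_new is a made-up helper name, and bool is used only because it is a static C type that defines its own tp_new, much like the C subtype being discussed.

```python
def try_new(base, subtype, *args):
    """Hypothetical helper: call base.__new__(subtype, *args) and return
    either the new instance or the TypeError that was raised."""
    try:
        return base.__new__(subtype, *args)
    except TypeError as exc:
        return exc


# bool is a static C type with its own tp_new, so int.__new__ refuses it.
rejected = try_new(int, bool)

# Going through the subtype's own __new__ is fine.
accepted = try_new(bool, bool)

# The example named in the tp_new_wrapper comment.
silly = try_new(object, dict)
```

Here rejected and silly hold TypeError instances, while accepted is an ordinary bool.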

> On a more practical note, is there a way (short of rewriting the subtype
> in Python) to work around this? It seems that I could call the type
> metaclass to create a heap type in C, but I'm not sure of all the
> implications of that.

It should work if you use mysubtype.__new__(mysubtype,...).

If it doesn't do what you want, then there's probably something wrong
with the way you subclassed ndarray.


Carl Banks
 
James Porter

> Why don't you use mysubtype.__new__(mysubtype,...)?
>
> If you wrote mysubtype in C, and defined a different tp_new than
> ndarray, then this exception will trigger. And it ought to; you don't
> want to use ndarray's tp_new to create an object of your subclass, if
> you've defined a different tp_new.

Unfortunately, I can't do that, since that call is in NumPy itself and
it's part of their "standard" way of making instances of subclasses of
ndarray. Functions like numpy.zeros_like use ndarray.__new__(subtype,
...) to create new arrays based on the shape of other arrays.

The Python version of the subclass is shown here:
<http://docs.scipy.org/doc/numpy/use...tic-example-attribute-added-to-existing-array>,
and I'm trying to write something pretty similar in C. I'm trying to
stay in C since everything else is in C, so it's easier to stay in C
than to jump back and forth all the time.

Maybe the real answer to this question is "NumPy is doing it wrong" and
I should be on their list; still, it seems strange that the behavior is
different between Python and C.

- Jim
 
Robert Kern

> Unfortunately, I can't do that, since that call is in NumPy itself and
> it's part of their "standard" way of making instances of subclasses of
> ndarray. Functions like numpy.zeros_like use ndarray.__new__(subtype,
> ...) to create new arrays based on the shape of other arrays.
>
> The Python version of the subclass is shown here:
> <http://docs.scipy.org/doc/numpy/use...tic-example-attribute-added-to-existing-array>,
> and I'm trying to write something pretty similar in C. I'm trying to
> stay in C since everything else is in C, so it's easier to stay in C
> than to jump back and forth all the time.
>
> Maybe the real answer to this question is "NumPy is doing it wrong" and
> I should be on their list; still, it seems strange that the behavior is
> different between Python and C.

Perhaps things would be clearer if you could post the C code that you've written
that fails. So far, you've only alluded to what you are doing using
Python-syntax examples.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 
James Porter

> Perhaps things would be clearer if you could post the C code that you've
> written that fails. So far, you've only alluded to what you are doing
> using Python-syntax examples.

I'm not sure how much this will help, but here you go. The actual C code
probably doesn't matter except for where I set tp_flags, tp_new, and
register the type, but I included it for completeness. The full C source
is available here if you need it, but be warned that other strangeness
abounds in the code:
<http://trac.mcs.anl.gov/projects/ITAPS/browser/python/trunk/iMesh_array.inl?rev=3831>.

Obviously, this is kind of a bizarre case, so I'm not entirely sure what
the best route is here.

Thanks,
Jim

static PyObject*
iMeshArrObj_new(PyTypeObject *cls,PyObject *args,PyObject *kw)
{
    static char *kwlist[] = {"object","instance",0};

    PyObject *obj;
    iMesh_Object *instance = NULL;
    PyObject *arr = NULL;
    iMeshArr_Object *self;

    if(!PyArg_ParseTupleAndKeywords(args,kw,"O|O!",kwlist,&obj,
                                    &iMesh_Type,&instance))
        return NULL;

    arr = PyArray_FROM_O(obj);
    if(arr == NULL)
        return NULL;

    /* view the input array as an instance of the requested subtype */
    self = (iMeshArr_Object*)PyObject_CallMethod(arr,"view","O",cls);
    Py_DECREF(arr);
    if(self == NULL)
        return NULL;

    /* some boring stuff to set |instance| */

    return (PyObject*)self;
}

static void
iMeshArrObj_dealloc(iMeshArr_Object *self)
{
    Py_XDECREF(self->instance);
    self->array.ob_type->tp_free((PyObject*)self);
}

static PyObject*
iMeshArrObj_finalize(iMeshArr_Object *self,PyObject *args)
{
    iMeshArr_Object *context;
    if(PyArg_ParseTuple(args,"O!",&iMeshArr_Type,&context))
    {
        /* copy |instance| from the array this one was derived from */
        self->instance = context->instance;
        Py_XINCREF(self->instance);
    }
    /* a context of another type is not an error here */
    PyErr_Clear();
    Py_RETURN_NONE;
}

static PyMethodDef iMeshArrObj_methods[] = {
    { "__array_finalize__", (PyCFunction)iMeshArrObj_finalize,
      METH_VARARGS, ""
    },
    {0}
};

static PyMemberDef iMeshArrObj_members[] = {
    {"instance", T_OBJECT_EX, offsetof(iMeshArr_Object, instance),
     READONLY, "base iMesh instance"},
    {0}
};

static PyTypeObject iMeshArr_Type = {
    PyObject_HEAD_INIT(NULL)
    /* ... */
    (destructor)iMeshArrObj_dealloc,          /* tp_dealloc */
    /* ... */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, /* tp_flags */
    "iMesh array objects",                    /* tp_doc */
    /* ... */
    iMeshArrObj_methods,                      /* tp_methods */
    iMeshArrObj_members,                      /* tp_members */
    /* ... */
    iMeshArrObj_new,                          /* tp_new */
};

PyMODINIT_FUNC initiMesh(void)
{
    PyObject *m;
    m = Py_InitModule("iMesh",module_methods);
    import_array();

    /* make the type a static subtype of numpy.ndarray */
    iMeshArr_Type.tp_base = &PyArray_Type;
    if(PyType_Ready(&iMeshArr_Type) < 0)
        return;
    Py_INCREF(&iMeshArr_Type);
    PyModule_AddObject(m,"Array",(PyObject *)&iMeshArr_Type);
}

/***** End C code *****/

And then in Python:

A = iMesh.Array(numpy.array([1,2,3,4,5]), instance=mesh)
numpy.zeros_like(A) # fails here

Inside NumPy, zeros_like looks like this (there's a bit more than this,
but it's irrelevant to this problem):

def zeros_like(a):
    if isinstance(a, ndarray):
        res = ndarray.__new__(type(a), a.shape, a.dtype,
                              order=a.flags.fnc)
        res.fill(0)
        return res
 
Carl Banks

> Unfortunately, I can't do that, since that call is in NumPy itself and
> it's part of their "standard" way of making instances of subclasses of
> ndarray. Functions like numpy.zeros_like use ndarray.__new__(subtype,
> ...) to create new arrays based on the shape of other arrays.
>
> The Python version of the subclass is shown here:
> <http://docs.scipy.org/doc/numpy/user/basics.subclassing.html#slightly...>,
> and I'm trying to write something pretty similar in C. I'm trying to
> stay in C since everything else is in C, so it's easier to stay in C
> than to jump back and forth all the time.
>
> Maybe the real answer to this question is "NumPy is doing it wrong" and
> I should be on their list; still, it seems strange that the behavior is
> different between Python and C.

I would say numpy is wrong here, so I suggest filing a bug report.

In fact I can't think of any benefit to EVER calling X.__new__(Y)
where X is not Y. Maybe old-style classes? Someone who wants to
ensure they're getting an instance of a certain type can check
issubclass(Y,X) then call Y.__new__(Y).
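
A minimal sketch of that suggestion follows (Python 3 syntax). The names new_instance_of and Tagged are invented for the example, and list stands in for the base type X.

```python
def new_instance_of(required_base, subtype, *args, **kwargs):
    """Hypothetical helper: check ancestry first, then let the subtype's
    own __new__ build the object."""
    if not issubclass(subtype, required_base):
        raise TypeError("%s is not a subclass of %s"
                        % (subtype.__name__, required_base.__name__))
    return subtype.__new__(subtype, *args, **kwargs)


class Tagged(list):
    """Example subtype whose __new__ does extra setup."""
    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        self.tag = "set by Tagged.__new__"
        return self


obj = new_instance_of(list, Tagged)
```

Because the object is built by Tagged.__new__, the extra setup runs; calling list.__new__(Tagged) directly would leave obj without a tag attribute.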

Unfortunately, you just can't get rid of the test in tp_new_wrapper.


Carl Banks
 
Robert Kern

> I would say numpy is wrong here, so I suggest filing a bug report.
>
> In fact I can't think of any benefit to EVER calling X.__new__(Y)
> where X is not Y. Maybe old-style classes? Someone who wants to
> ensure they're getting an instance of a certain type can check
> issubclass(Y,X) then call Y.__new__(Y).

Well, the Y.__new__(Y) may call X.__new__(Y) (and we certainly do this
successfully in other Python subclasses of ndarray; this also appears in the
Python regression tests). I'm not sure why this would be permitted there and not
in a regular function (numpy.zeros_like() seems to be the function that does
this and fails for the OP). The reason we do it there instead of calling the
subclass's constructor is because the subclass's constructor may have different
arguments.
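
To make the "different arguments" point concrete without needing NumPy, here is a small stand-in sketch (Python 3 syntax). Measurement and zero_like are invented names, and float plays the role of ndarray; nothing here is NumPy's actual code.

```python
class Measurement(float):
    """A float subclass whose constructor takes an extra required argument."""
    def __new__(cls, value, unit):
        # X.__new__(Y) called from inside Y.__new__, the usual pattern.
        self = float.__new__(cls, value)
        self.unit = unit
        return self


def zero_like(x):
    """Generic helper that knows nothing about the subclass's signature."""
    # type(x)(0.0) would fail for Measurement (missing `unit`), so the
    # helper goes through the base type's __new__ instead.
    return float.__new__(type(x), 0.0)


m = Measurement(3.5, "kg")
z = zero_like(m)
```

The result z is a Measurement equal to 0.0, but it has no unit attribute, since Measurement.__new__ never ran. In NumPy, that gap is what the __array_finalize__ hook is meant to fill.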

I'm happy to concede that this might be a bug in numpy, but I don't understand
why this is allowed for Python subclasses but not C subtypes.

--
Robert Kern
 
Robert Kern

> And then in Python:
>
> A = iMesh.Array(numpy.array([1,2,3,4,5]), instance=mesh)
> numpy.zeros_like(A) # fails here
>
> Inside NumPy, zeros_like looks like this (there's a bit more than this,
> but it's irrelevant to this problem):
>
> def zeros_like(a):
>     if isinstance(a, ndarray):
>         res = ndarray.__new__(type(a), a.shape, a.dtype,
>                               order=a.flags.fnc)
>         res.fill(0)
>         return res

Well, I think we can change zeros_like() and the rest to work around this issue.
Can you bring it up on the numpy mailing list?

def zeros_like(a):
    if isinstance(a, ndarray):
        res = numpy.empty(a.shape, a.dtype, order=a.flags.fnc)
        res.fill(0)
        res = res.view(type(a))
        return res
    ...

--
Robert Kern
 
Carl Banks

> Well, the Y.__new__(Y) may call X.__new__(Y) (and we certainly do this
> successfully in other Python subclasses of ndarray; this also appears in the
> Python regression tests). I'm not sure why this would be permitted there and not
> in a regular function (numpy.zeros_like() seems to be the function that does
> this and fails for the OP). The reason we do it there instead of calling the
> subclass's constructor is because the subclass's constructor may have different
> arguments.
>
> I'm happy to concede that this might be a bug in numpy, but I don't understand
> why this is allowed for Python subclasses but not C subtypes.

Because Python subclasses (i.e., "heap types") all invoke
tp_new_wrapper, which is guaranteed to call the tp_new of the most
derived base.

C subtypes can, and often have to, replace tp_new with their own
version. Calling a base type's tp_new when you've defined your own
tp_new at the C level is dangerous.
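
The difference can be observed from pure Python (Python 3 syntax). The sketch below is a rough model, not the C code: most_derived_static_base is a made-up helper that walks past heap types using the Py_TPFLAGS_HEAPTYPE flag, whereas the real check in tp_new_wrapper compares tp_new slots. Counted and MoreCounted are invented example classes.

```python
HEAPTYPE_FLAG = 1 << 9  # Py_TPFLAGS_HEAPTYPE in CPython's Include/object.h


def most_derived_static_base(tp):
    """Rough model: walk up the base chain until a non-heap type is found."""
    while tp.__flags__ & HEAPTYPE_FLAG:
        tp = tp.__base__
    return tp


class Counted(dict):
    """Python-level subclass that overrides __new__ and counts its calls."""
    calls = 0

    def __new__(cls, *args, **kwargs):
        Counted.calls += 1
        return super().__new__(cls)


class MoreCounted(Counted):
    pass


static_base = most_derived_static_base(MoreCounted)

# Allowed: dict is the most derived static base. Counted.__new__ is skipped.
via_base = dict.__new__(MoreCounted)

# Refused: object is not the most derived static base of MoreCounted.
try:
    object.__new__(MoreCounted)
    object_new_rejected = False
except TypeError:
    object_new_rejected = True
```

After this runs, Counted.calls is still 0: the call through dict.__new__ is accepted for the heap type, yet it bypasses the override written in Python.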


As for the issue with a subclass's arguments being different, I'm
shocked that anyone at numpy could possibly think bypassing the
subtype's constructor is a good idea.


Carl Banks
 
James Porter

> Well, I think we can change zeros_like() and the rest to work around
> this issue. Can you bring it up on the numpy mailing list?
>
> def zeros_like(a):
>     if isinstance(a, ndarray):
>         res = numpy.empty(a.shape, a.dtype, order=a.flags.fnc)
>         res.fill(0)
>         res = res.view(type(a))
>         return res
>     ...

I'm having difficulty posting to the NumPy list (both via gmane and
email) so I'm just going to put this here so it doesn't get lost.
zeros_like probably needs to call __array_finalize__ for this to work
properly (it'll cause a segfault for me otherwise):

def zeros_like(a):
    if isinstance(a, ndarray):
        res = numpy.zeros(a.shape, a.dtype, order=a.flags.fnc)
        res = res.view(type(a))
        res.__array_finalize__(a)
        return res
    ...

- Jim
 
Gregory Ewing

James said:
> Functions like numpy.zeros_like use ndarray.__new__(subtype,
> ...) to create new arrays based on the shape of other arrays.
>
> Maybe the real answer to this question is "NumPy is doing it wrong"

Yes, I think NumPy is doing it wrong, even for subclasses
written in Python. If the subtype has overridden ndarray's
__new__ method, the way NumPy is doing it will skip the
subclass's version of the method.
 
