NumPy object arrays

Question

I've recently run into issues when creating NumPy object arrays using, e.g.,

a = np.array([c], dtype=object)

where c is an instance of some complicated class, and in some cases NumPy tries to access some methods of that class. However, doing:

a = np.empty((1,), dtype=object)
a[0] = c

solves the issue. I'm curious what the difference between these two is internally. Why, in the first case, might NumPy try to access attributes or methods of c?

EDIT: For the record, here is example code that demonstrates the issue:

import numpy as np

class Thing(object):

    def __getitem__(self, item):
        print("in getitem")
        if item >= len(self):
            raise IndexError(item)  # signal the end of the sequence

    def __len__(self):
        return 1

a = np.array([Thing()], dtype='object')

This prints "in getitem" twice. Basically, the unexpected behavior appears when the class defines __len__.
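The observation that __len__ is the trigger can be checked with two minimal classes (the class names are hypothetical, chosen for illustration). This is a sketch assuming a reasonably recent NumPy: the classes differ only in __len__, and only the second one is unpacked into an extra dimension. Both raise IndexError past the end, which keeps them well-behaved if NumPy iterates over them.

```python
import numpy as np


class IndexableOnly:
    """Hypothetical class: supports indexing but defines no __len__."""

    def __getitem__(self, i):
        if i >= 3:
            raise IndexError(i)  # end of the "sequence"
        return i


class IndexableWithLen(IndexableOnly):
    """Identical, except that it also defines __len__."""

    def __len__(self):
        return 3


a = np.array([IndexableOnly()], dtype=object)
b = np.array([IndexableWithLen()], dtype=object)
print(a.shape)  # (1,): stored as one opaque object
print(b.shape)  # (1, 3): unpacked through len() and indexing
```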

The two are equivalent (object == np.object returns True), so this is not related to the issues I'm seeing. – astrofrog, Oct 5, 2011 at 21:21
BTW, I don't think one can solve your problem without seeing the class and some error messages. – JBernardo, Oct 5, 2011 at 21:24

donkopotamus · Accepted Answer · 2011-10-05 21:37:53Z


In the first case, a = np.array([c], dtype=object), NumPy knows nothing about the shape of the intended array, so it has to work it out from the data you pass in.

For example, when you define

d = range(10)
a = np.array([d])

Then you expect NumPy to infer the shape, (1, 10), from the length of d.

Similarly, in your case NumPy checks whether len(c) is defined and, if it is, accesses the elements of c via c[i], treating c as a sequence that contributes another dimension to the array.
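To make this concrete, here is a deliberately simplified model of that shape discovery. guess_shape is a hypothetical name, not a NumPy function, and it assumes only what is described above: an object with a working len() is probed via indexing, recursing into its first element. Real NumPy does considerably more (for example it checks every element, not just the first, and looks for __array__ and the buffer protocol), so treat this as a sketch.

```python
import numpy as np


def guess_shape(obj):
    """Hypothetical, simplified model of how a shape could be inferred.

    Only the first element is inspected at each level; real NumPy checks
    all of them and handles many more cases.
    """
    if isinstance(obj, (str, bytes)):
        return ()  # strings are treated as scalars, not as sequences
    try:
        n = len(obj)
    except TypeError:
        return ()  # no __len__: a single opaque object
    if n == 0:
        return (0,)
    try:
        first = obj[0]  # probe an element, like c[i] above
    except (TypeError, KeyError, IndexError):
        return ()
    return (n,) + guess_shape(first)


class LooksLikeSequence:
    """Hypothetical class in the spirit of the question's Thing."""

    def __len__(self):
        return 4

    def __getitem__(self, i):
        if i >= 4:
            raise IndexError(i)  # end of the sequence
        return i * 10


model = guess_shape([LooksLikeSequence()])
real = np.array([LooksLikeSequence()], dtype=object).shape
print(model, real)  # (1, 4) (1, 4)
```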

You can see the effect by defining a class such as

class X(object):
    def __len__(self): return 10
    def __getitem__(self, i):
        if i >= 10: raise IndexError(i)  # end of sequence; stops iteration
        return "x" * i

Then

print(numpy.array([X()], dtype=object))

produces

[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]

In contrast, in your second case

a = np.empty((1,), dtype=object)
a[0] = c

Here the shape of a has already been determined, so NumPy can assign the object directly to that single element.
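The difference is easy to observe with a class that counts its __getitem__ calls (Counted is a hypothetical name). In this sketch, assigning to a single element of a pre-shaped 1-D array stores the object untouched, so the counter stays at zero, while np.array has to probe the object to discover the shape.

```python
import numpy as np


class Counted:
    """Hypothetical sequence-like class that counts its __getitem__ calls."""

    calls = 0

    def __len__(self):
        return 2

    def __getitem__(self, i):
        Counted.calls += 1
        if i >= 2:
            raise IndexError(i)  # end of the sequence
        return i


c = Counted()

# Second approach: the shape is fixed first, then one element is set.
a = np.empty((1,), dtype=object)
a[0] = c
calls_after_assign = Counted.calls
print(a.shape, a[0] is c, calls_after_assign)  # (1,) True 0

# First approach: NumPy has to discover the shape, so it probes c.
b = np.array([c], dtype=object)
print(b.shape, Counted.calls > 0)  # (1, 2) True
```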

However, this is only true because a is a vector (one-dimensional), so a[0] is a single element. If a had been created with a different shape, the method calls would still occur. For example, the following still calls __getitem__ on the class:

a = numpy.empty((1, 10), dtype=object)
a[0] = X()
print(a)

prints

[[ x xx xxx xxxx xxxxx xxxxxx xxxxxxx xxxxxxxx xxxxxxxxx]]
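If you do need a multi-dimensional object array, one way around this (a sketch, not the only option) is to index all the way down to a single element, so there is no row shape for the object to match. X below is the answer's class with a bounds check added, and to_object_array is a hypothetical helper name that builds a 1-D object array one element at a time.

```python
import numpy as np


class X:
    def __len__(self):
        return 10

    def __getitem__(self, i):
        if i >= 10:
            raise IndexError(i)  # end of the sequence
        return "x" * i


a = np.empty((1, 10), dtype=object)  # np.empty fills object arrays with None
a[0, 0] = X()  # full index: one element, stored without unpacking
print(type(a[0, 0]).__name__, a[0, 1])  # X None


def to_object_array(items):
    """Hypothetical helper: 1-D object array holding each item unchanged."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # single-element assignment stores the object as-is
    return out


b = to_object_array([X(), [1, 2], "abc"])
print(b.shape)  # (3,)
```

The loop is deliberately explicit: as far as I know, assigning the whole list at once (out[:] = items) would send the right-hand side through the same shape discovery again.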

edited Oct 5, 2011 at 21:37

answered Oct 5, 2011 at 21:31

donkopotamus


This is exactly what I needed: basically, if __len__ is defined, that is when I run into issues! – astrofrog
