How to build a numpy object array (object includes another array)

Question

In another language I like to use object arrays containing every class object, and each object is very efficiently accessible via the object array. I am trying to do the same with Python and numpy. Each object has a number of members of different type, including a numpy array itself. So in the end result I need an object array of all objects which can efficiently be accessed and return any member, most importantly the member array.

I tried something like this:

import numpy as np

class TestClass():
    objectarray=np.empty([10, 1], dtype=object)  ## static array holding all class objects
    def __init__(self,name,position):
        self.name=name
        self.position=position
        self.intmember= 5
        self.floatmember=3.4
        self.arraymember= np.zeros([5, 5])  ## another array which is a member of the class
        TestClass.objectarray[position]=self

then:

testobj1 = TestClass('test1',5)  ## create a new object and add it at position 5 into the object array

Something seems to have happened:

TestClass.objectarray

array([[None],
       [None],
       [None],
       [None],
       [None],
       [<__main__.TestClass object at 0x000000EF214DC308>],
       [None],
       [None],
       [None],
       [None]], dtype=object)

However, this doesn't work:

a= TestClass.objectarray[5]
a.intmember
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-40-dac52811af13> in <module>
      1 a= TestClass.objectarray[5]
----> 2 a.intmember

AttributeError: 'numpy.ndarray' object has no attribute 'intmember'

What am I doing wrong? Remember that this needs to be an efficient mechanism inside a large loop.

(PS: I know I could use a list of objects, but iterating over lists is prohibitively slow in my testing. Hence I want to use numpy arrays, ideally augmented by numba.)

The fast numpy code works for numeric dtypes, not object dtype. For numeric dtypes it can iterate in compiled code, but an object dtype array holds references to objects, just like a list does. Do some basic timings; I think you'll see that iteration on an object dtype array is slower than list iteration.
– hpaulj, Commented Apr 3, 2020 at 21:46
I would select the numpy array member and only iterate over that. The rest is just referential data, not part of the loop.
– DISC-O, Commented Apr 3, 2020 at 23:39

hpaulj · Accepted Answer · 2020-04-03 22:55:12Z

In [1]: class TestClass():
   ...:     objectarray=np.empty([10, 1], dtype=object)  ## static array holding all class objects
   ...:     def __init__(self,name,position):
   ...:         self.name=name
   ...:         self.position=position
   ...:         self.intmember= 5
   ...:         self.floatmember=3.4
   ...:         self.arraymember= np.zeros([5, 5])  ## another array which is a member of the class
   ...:         TestClass.objectarray[position]=self
   ...:
In [2]: testobj1 = TestClass('test1',5)

As defined, testobj1 has an intmember attribute:

In [3]: testobj1                                                                               
Out[3]: <__main__.TestClass at 0x7fceba8acef0>
In [4]: testobj1.intmember                                                                     
Out[4]: 5

That object has also placed itself in the class array:

In [5]: TestClass.objectarray                                                                  
Out[5]: 
array([[None],
       [None],
       [None],
       [None],
       [None],
       [<__main__.TestClass object at 0x7fceba8acef0>],
       [None],
       [None],
       [None],
       [None]], dtype=object)

Since that's a 2d array, we have to use 2d indexing to reference an element:

In [8]: TestClass.objectarray[5,0]                                                             
Out[8]: <__main__.TestClass at 0x7fceba8acef0>
In [9]: TestClass.objectarray[5,0].intmember                                                   
Out[9]: 5

Access with [5] just indexes on the first dimension; the object is still embedded inside a 1-element array:

In [10]: TestClass.objectarray[5]                                                              
Out[10]: array([<__main__.TestClass object at 0x7fceba8acef0>], dtype=object)
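The difference between the two indexing forms can be checked in a standalone script. This is only a minimal sketch: the `Item` class is a hypothetical stand-in for TestClass and is not part of the original code.

```python
import numpy as np

class Item:
    """Hypothetical stand-in for TestClass."""
    def __init__(self, intmember):
        self.intmember = intmember

arr = np.empty((10, 1), dtype=object)   # object arrays start out filled with None
arr[5, 0] = Item(5)

row = arr[5]        # indexes only the first axis: a (1,)-shaped object array
elem = arr[5, 0]    # indexes both axes: the stored Item itself

print(type(row).__name__, row.shape)   # ndarray (1,)
print(type(elem).__name__)             # Item
print(elem.intmember)                  # 5
print(row[0] is elem)                  # True
```

So `arr[5].intmember` fails because the attribute lookup happens on the small wrapper array, not on the object stored inside it.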

I don't think creating a (10,1) array helped; a simple 1d array would be just as good:

 objectarray=np.empty([10], dtype=object)

or just a list:

In [12]: class TestClass():
    ...:     objectarray=[None]*10
    ...:     def __init__(self,name,position):
    ...:         self.name=name
    ...:         self.position=position
    ...:         self.intmember= 5
    ...:         self.floatmember=3.4
    ...:         self.arraymember= np.zeros([5, 5])  ## another array which is a member of the class
    ...:         TestClass.objectarray[position]=self
    ...:
In [13]: testobj1 = TestClass('test1',5)                                                       
In [14]: testobj1                                                                              
Out[14]: <__main__.TestClass at 0x7fceac25f5c0>
In [15]: testobj1.objectarray                                                                  
Out[15]: 
[None,
 None,
 None,
 None,
 None,
 <__main__.TestClass at 0x7fceac25f5c0>,
 None,
 None,
 None,
 None]
In [16]: testobj1.objectarray[5]                                                               
Out[16]: <__main__.TestClass at 0x7fceac25f5c0>
In [17]: testobj1.objectarray[5].intmember                                                     
Out[17]: 5

Accessing an element of the list is faster than doing the same for the object array:

In [18]: timeit Out[5][5,0].intmember                                                          
149 ns ± 0.00964 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
In [19]: timeit Out[15][5].intmember                                                           
90.5 ns ± 0.0478 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
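Outside IPython, a similar comparison can be run with the standard `timeit` module. This is a rough sketch, not a rigorous benchmark: `Item` is a hypothetical stand-in for TestClass, the lambda wrappers add some call overhead to both measurements, and the absolute numbers will vary by machine and by Python/numpy version.

```python
import timeit
import numpy as np

class Item:
    """Hypothetical stand-in for TestClass."""
    def __init__(self):
        self.intmember = 5

obj = Item()

as_array = np.empty((10, 1), dtype=object)
as_array[5, 0] = obj

as_list = [None] * 10
as_list[5] = obj

n = 100_000
# Time a single element access plus attribute lookup for each container.
t_array = timeit.timeit(lambda: as_array[5, 0].intmember, number=n)
t_list = timeit.timeit(lambda: as_list[5].intmember, number=n)

print(f"object array: {t_array / n * 1e9:.0f} ns per access")
print(f"list:         {t_list / n * 1e9:.0f} ns per access")
```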

frompyfunc

I've recommended np.frompyfunc as a convenient, if not fast, way of accessing or otherwise working with object dtype arrays. For example, here is a function to fetch the intmember value if present:

In [28]: def getval(item): 
    ...:     try: 
    ...:         return item.intmember 
    ...:     except AttributeError: 
    ...:         return None

Applied to the object array:

In [29]: np.frompyfunc(getval,1,1)(Out[5])                                                     
Out[29]: 
array([[None],
       [None],
       [None],
       [None],
       [None],
       [5],
       [None],
       [None],
       [None],
       [None]], dtype=object)

Applied to the list:

In [30]: np.frompyfunc(getval,1,1)(Out[15])                                                    
Out[30]: 
array([None, None, None, None, None, 5, None, None, None, None],
      dtype=object)

Timings:

In [31]: timeit np.frompyfunc(getval,1,1)(Out[15])                                             
14.6 µs ± 187 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [32]: timeit np.frompyfunc(getval,1,1)(Out[5])                                              
9.53 µs ± 54 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [33]: [getval(i) for i in Out[15]]                                                          
Out[33]: [None, None, None, None, None, 5, None, None, None, None]
In [34]: timeit [getval(i) for i in Out[15]]                                                   
6.53 µs ± 93.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

The list comprehension on the list is fastest.
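As a quick check that the two approaches return the same values, here is a small self-contained sketch. The `Item` class is a hypothetical stand-in for TestClass, and `getval` is written with `getattr` and a default, a variation on the try/except version rather than the code that was timed.

```python
import numpy as np

class Item:
    """Hypothetical stand-in for TestClass."""
    def __init__(self, intmember):
        self.intmember = intmember

def getval(item):
    # getattr with a default plays the role of the try/except AttributeError
    return getattr(item, "intmember", None)

items = [None] * 10
items[5] = Item(5)

arr = np.empty(10, dtype=object)   # 1d object array, initially all None
arr[5] = items[5]

from_ufunc = np.frompyfunc(getval, 1, 1)(arr)   # object-dtype array, same shape as arr
from_listcomp = [getval(i) for i in items]      # plain Python list

print(from_ufunc.tolist() == from_listcomp)     # True
```

Either way, `getval` is called once per element in Python, which is why neither route gets the compiled-loop speed of numeric arrays.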

OK, so I guess it's back to lists, but I need to understand what you are doing first.
May I ask a follow-up question (I'd be happy to open a new question if that's better)? What if we don't yet know which object we need to pull, but need to iterate over all objects to find the one that matches our condition, e.g. "give me the object whose intmember is 5"? Is there some efficient magic possible?
Each object has to be queried, the same as if it were a list. The 'efficient numpy magic' is for numbers, not general Python objects.
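To illustrate the point about querying each object, here is a minimal sketch of two lookups. The first is a plain Python scan. The second is an assumption, not something proposed in the answer: it mirrors the numeric attribute into a separate integer array that numpy can search in compiled code, at the cost of keeping that array in sync by hand. The `Item` class and the variable names are hypothetical.

```python
import numpy as np

class Item:
    """Hypothetical stand-in for TestClass."""
    def __init__(self, name, intmember):
        self.name = name
        self.intmember = intmember

items = [Item(f"obj{i}", i) for i in range(10)]

# Plain scan: each object is queried in turn, stopping at the first match.
match = next((it for it in items if it.intmember == 5), None)

# Mirror the numeric attribute into an integer array (must be kept in sync manually).
keys = np.array([it.intmember for it in items], dtype=np.int64)
hits = np.flatnonzero(keys == 5)                  # indices of matching objects
match_np = items[hits[0]] if hits.size else None

print(match.name, match_np.name)                  # obj5 obj5
```

The second form only pays off if the key array is built once and searched many times; building it still requires one Python-level pass over the objects.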
