Construct a class that allows calling methods through numpy array

Question

I define a class Tomato and create an array with several objects of that class:

import numpy as np
class Tomato:
    color = None
    radius = None
    def __init__(self):
        # Assign to self so each instance actually stores its own values
        self.color = np.random.choice(['red', 'green'])
        self.radius = np.random.rand()

arr1 = np.array([Tomato() for i in range(6)]).reshape(3, 2)
arr1

yields

array([[<__main__.Tomato object at 0x000002479E7D97F0>,
        <__main__.Tomato object at 0x000002479E7D9710>],
       [<__main__.Tomato object at 0x000002479E7D94E0>,
        <__main__.Tomato object at 0x000002479E7D9DA0>],
       [<__main__.Tomato object at 0x000002479E710630>,
        <__main__.Tomato object at 0x000002479E7D9C18>]], dtype=object)

I would like to be able to call

arr1.radius

and get a 3x2 array containing just the radius of each tomato. I know I can use np.vectorize() or a lambda expression, as recommended in questions like this one where the asker was working with objects from an externally imported cftime class.
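For reference, a minimal sketch of the np.vectorize approach mentioned above. It assumes a simplified Tomato whose constructor takes color and radius explicitly (a change made here only so the values are predictable); get_radius is a hypothetical helper name.

```python
import numpy as np

class Tomato:
    # Simplified stand-in: values are passed in rather than drawn at random
    def __init__(self, color, radius):
        self.color = color
        self.radius = radius

# Object array of shape (3, 2); each cell holds a reference to a Tomato
tomatoes = np.array(
    [Tomato("red", 0.5 * (i + 1)) for i in range(6)], dtype=object
).reshape(3, 2)

# np.vectorize applies the Python function to every element
get_radius = np.vectorize(lambda t: t.radius, otypes=[float])
radii = get_radius(tomatoes)
# radii is [[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]]
```

Note that np.vectorize is a convenience wrapper around a Python-level loop, so this gives the desired shape but not compiled-code speed.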

But I believe I should have more options since I defined the Tomato class myself.

For example, the complex128 data type has the methods .real and .imag, and so does an array of complex floats.

arr2 = np.random.normal(size=(3, 2)) + 1j * np.random.normal(size=(3, 2))
arr2.imag

gives you the imaginary part of each entry:

array([[-0.23054982,  0.04599812],
       [-0.07459619, -0.11282513],
       [-0.32441139,  0.8920348 ]])

Is there a way to modify Tomato's class definition to allow users to access its attributes through a numpy array?

If not, how does the arr2 example above work? Are the .real and .imag methods specified manually in the code for the numpy array class?

numpy is not designed to work with Python objects, at least not efficiently; at that point, you might as well just use regular lists. complex128 is just a wrapper type: the primitive numeric values that numpy arrays actually contain aren't Python objects. In this case you may actually want to use a structured array, which lets you work with numpy arrays of primitive structs and write numpy code in terms of the structs' fields.
– juanpa.arrivillaga, Commented Jul 15, 2020 at 3:31
Note that all numpy.ndarray objects have imag and real attributes. That isn't due to dtype=complex128.
– juanpa.arrivillaga, Commented Jul 15, 2020 at 3:36
The actual thing I'm developing is a sort of particle simulator. I have a particle class with position and velocity attributes. I put the particles in an array, and then I would like to use some of the convenient vectorized methods built into numpy (like np.linalg.norm) to perform calculations that depend on these attributes, and then update the attributes themselves based on the results of the calculations. But I am also new to the idea of object-oriented programming in general, and just trying to get a feel for what's possible and what's reasonable.
– Max, Commented Jul 15, 2020 at 3:42
Similar question earlier today: stackoverflow.com/questions/62903596/…. The fast numpy code is compiled and uses C numeric types. Your arr1 contains references to objects stored elsewhere in memory. numpy does not have a mechanism for reaching into your Python code and treating it in a compiled manner.
– hpaulj, Commented Jul 15, 2020 at 3:42
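
The structured-array suggestion in the comments could look roughly like the sketch below. The field names color and radius and the dtype name tomato_dtype are assumptions chosen to mirror the Tomato class; the sample values are made up for illustration.

```python
import numpy as np

# One record per tomato: a 5-character string and a 64-bit float
tomato_dtype = np.dtype([("color", "U5"), ("radius", "f8")])

tomatoes = np.zeros((3, 2), dtype=tomato_dtype)
tomatoes["color"] = [["red", "green"], ["green", "red"], ["red", "red"]]
tomatoes["radius"] = np.arange(6).reshape(3, 2) / 10

# Field access returns a view with the same 3x2 shape
radii = tomatoes["radius"]

# A recarray view exposes fields as attributes, close to arr1.radius
rec = tomatoes.view(np.recarray)
same_radii = rec.radius

# In-place update of one field, selected by a condition on another field
tomatoes["radius"][tomatoes["color"] == "red"] *= 2
```

Because the fields are stored as primitive values rather than as references to Python objects, ordinary vectorized numpy functions can operate on tomatoes["radius"] directly.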

Han-Kwang Nienhuys · Accepted Answer · 2020-07-15 13:15:43Z

You can define your own class for the use cases that you provide (i.e., slicing and initialization):

import numpy as np
class TomatoArr:
    def __init__(self, col, r):
        self.col = col if isinstance(col, np.ndarray) else np.array(col, dtype='<U5')
        self.r = r if isinstance(r, np.ndarray) else np.array(r, dtype=float)
    
    def __getitem__(self, idx):
        return TomatoArr(self.col[idx], self.r[idx])
   
    @classmethod
    def from_list(cls, tlist):
        # Build one TomatoArr from a list of objects that have .col and .r
        col = np.array([a.col for a in tlist], dtype='<U5')
        r = np.array([a.r for a in tlist], dtype=float)
        return cls(col, r)

Use:

In [14]: tom = TomatoArr([['red', 'green'], ['green', 'red']], [[1.0, 1.2], [0.9, 1.1]])

In [15]: tom.col
Out[15]: 
array([['red', 'green'],
       ['green', 'red']], dtype='<U5')

In [16]: tom[:, 1].r
Out[16]: array([1.2, 1.1])

In [17]: tom[:, 0].r += 100

In [18]: tom.r
Out[18]: 
array([[101. ,   1.2],
       [100.9,   1.1]])

In [19]: tom2 = TomatoArr.from_list([TomatoArr('red', 1.3), TomatoArr('red', 1.4)])

In [20]: tom2.r
Out[20]: array([1.3, 1.4])

In [21]: tom2.col
Out[21]: array(['red', 'red'], dtype='<U5')

Of course, other numpy operations don't work, and they wouldn't make sense anyway: what is the sum of 'red' and 'green'?

Note: what won't work is assignment with boolean indexing:

# no effect
tom[tom.col=='red'].r = 10
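
One way to see why, sketched here with plain arrays rather than TomatoArr: basic slicing returns a view onto the original data, whereas boolean indexing returns a copy, so changes made to the result of `tom[mask]` never reach `tom`. (In the line above, `.r = 10` additionally only rebinds an attribute on a temporary object.) The names r, col, sliced, and masked are illustrative.

```python
import numpy as np

r = np.array([[1.0, 1.2], [0.9, 1.1]])
col = np.array([["red", "green"], ["green", "red"]])

sliced = r[:, 0]          # basic slicing: a view onto r
masked = r[col == "red"]  # boolean indexing: an independent copy

views_share = np.shares_memory(r, sliced)   # True
copies_share = np.shares_memory(r, masked)  # False

masked += 100             # modifies the copy only; r is unchanged
r_before = r.copy()

# Assigning through the index on the array itself does write into r
r[col == "red"] = 10
```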

You could add a method __setitem__, but that would only work for something like

tom[tom.col=='red'] = TomatoArr('red', 10)
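
A minimal sketch of such a `__setitem__`, assuming the assigned value is always another TomatoArr whose fields broadcast against the selected elements. The class is restated in shortened form (using np.asarray instead of the isinstance checks) so the example runs on its own.

```python
import numpy as np

class TomatoArr:
    def __init__(self, col, r):
        # asarray avoids copying arrays that already have the right dtype,
        # so views produced by slicing stay views
        self.col = np.asarray(col, dtype="<U5")
        self.r = np.asarray(r, dtype=float)

    def __getitem__(self, idx):
        return TomatoArr(self.col[idx], self.r[idx])

    def __setitem__(self, idx, value):
        # Assumption: value is a TomatoArr; delegate to each field array
        self.col[idx] = value.col
        self.r[idx] = value.r

tom = TomatoArr([["red", "green"], ["green", "red"]],
                [[1.0, 1.2], [0.9, 1.1]])

# Every red tomato gets radius 10; green ones are untouched
tom[tom.col == "red"] = TomatoArr("red", 10)
```

This writes through the boolean mask because the assignment goes to the underlying arrays directly instead of to a temporary copy.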
