
Suppose I have defined a datatype, as below:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y 
        self.z = z 

And I have a numpy array of mytype objects, defined as:

my_array = np.array([mytype()]*1000)

My question is: how can I efficiently extract the values from the numpy array defined above into a numpy array of type np.float64? I have found that using a list comprehension is very slow when the array is large, and I guess there must be a better way to do this. Can anyone help me out?

  • If you want numpy speed, make a (1000,3) shaped float array, or maybe a structured array with 3 fields. Arrays of objects are little better, maybe worse, than a list. Commented Apr 12, 2021 at 3:11
  • [mytype()]*1000 makes a list with 1000 references to the same object. Try modifying one to see what I mean. Commented Apr 12, 2021 at 3:44
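
The alternatives the first comment suggests can be sketched as follows (a hedged illustration, not from the original answers; the default values 1, 2, 3 are taken from mytype's __init__):

```python
import numpy as np

# Plain (1000, 3) float64 array: one row per "object", one column per field.
# Each row gets the defaults x=1, y=2, z=3 via broadcasting.
arr = np.ones((1000, 3)) * np.array([1.0, 2.0, 3.0])

# Structured array with three named float64 fields, as the comment suggests.
structured = np.zeros(1000, dtype=[('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
structured['x'] = 1.0
structured['y'] = 2.0
structured['z'] = 3.0

print(arr.shape)           # (1000, 3)
print(structured['y'][0])  # 2.0
```

Either layout keeps all values in one contiguous float64 buffer, so no per-object conversion is ever needed.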

2 Answers


Numpy is fast because it is nearly pure C code running computations on C arrays. For C arrays, things need to be neat and clean: how much space do we use? What is the size of each object in that space? How many objects do we have? When you create a collection of arbitrary Python objects (which can have dynamic size) and then want to place that collection into a numpy array, each object has to be found and converted individually, and there isn't really any way around that.

my_array = np.array([mytype() for _ in range(1000)])

This is basically 1000 pointers to arbitrary objects. Numpy knows nothing about those objects except where to ask Python for more information about them. As such, the above array gets no C-level speedup. It is nearly equivalent to a list:

my_array = [mytype() for _ in range(1000)]

If you want to make your code faster, you shouldn't make numpy arrays of arbitrary objects. Likewise, you shouldn't use Python integers (which can be any size and carry a lot of overhead) when you really want float64. For example, your class could be updated:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.data = np.array([x, y, z], dtype='float64')

At least now each self.data can be accessed and hstacked, and since numpy knows the exact size and shape of each one, it can gather up all 1000 locations in memory and copy them into a new array quite quickly.
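
With that layout, gathering the per-object arrays might look like this (a minimal sketch using np.stack rather than hstack, so the result keeps a (1000, 3) shape):

```python
import numpy as np

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        # Store the three fields in one small fixed-size float64 array.
        self.data = np.array([x, y, z], dtype='float64')

objs = [mytype() for _ in range(1000)]

# Stack the 1000 length-3 arrays into one (1000, 3) float64 array.
values = np.stack([o.data for o in objs])
print(values.shape, values.dtype)  # (1000, 3) float64
```

There is still a Python-level loop over the objects, but each element copied is a fixed-size float64 buffer rather than an arbitrary object needing conversion.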


Based on the Numpy documentation, numpy.array calls the __array__ method of an object. So you can define any arbitrary conversion to a numpy.array like:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y 
        self.z = z 

    def __array__(self):
        return np.array([self.x, self.y, self.z])

Then you can convert a single mytype() object to an np.array with:

tmp = mytype()
np.array(tmp)
# array([1, 2, 3])

Now, when you have a list of 1000 objects, you can map np.array over all of them:

new_list = list(map(np.array, [mytype()]*1000))
#[array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), ...

