
Suppose I have defined a datatype, as below:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y 
        self.z = z 

And I have a numpy array of mytype objects, defined as:

my_array = np.array([mytype()]*1000)

My question is: how can I efficiently extract the values from the numpy array defined above into a numpy array of type np.float64? I have found that using a list comprehension is very slow when the array is large, and I guess there must be a better way to do this. Can anyone help me out?

  • If you want numpy speed, make a (1000,3) shaped float array, or maybe a structured array with 3 fields. Arrays of objects are little better, maybe worse, than a list. Commented Apr 12, 2021 at 3:11
  • [mytype()]*1000 makes a list with 1000 references to the same object. Try modifying one to see what I mean. Commented Apr 12, 2021 at 3:44
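
The alternatives the first comment suggests can be sketched as follows (a hedged illustration, not from the original answers; the default values 1, 2, 3 are taken from mytype's __init__):

```python
import numpy as np

# Plain (1000, 3) float64 array: one row per "object", one column per field.
# Each row gets the defaults x=1, y=2, z=3 via broadcasting.
arr = np.ones((1000, 3)) * np.array([1.0, 2.0, 3.0])

# Structured array with three named float64 fields, as the comment suggests.
structured = np.zeros(1000, dtype=[('x', 'f8'), ('y', 'f8'), ('z', 'f8')])
structured['x'] = 1.0
structured['y'] = 2.0
structured['z'] = 3.0

print(arr.shape)           # (1000, 3)
print(structured['y'][0])  # 2.0
```

Either layout keeps all values in one contiguous float64 buffer, so no per-object conversion is ever needed.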

2 Answers


Numpy is fast because it is nearly pure C code running computations on C arrays. For C arrays, things need to be neat and clean: how much space do we use? What is the size of each object in that space? How many objects do we have? When you create a collection of arbitrary Python objects (which can have dynamic size) and then want to place that collection into a numpy array, each object has to be found and converted individually, and there isn't really any way around that.

my_array = np.array([mytype() for _ in range(1000)])

This is basically 1000 pointers to arbitrary objects. Numpy knows nothing about those objects except where to ask Python for more information about them. As such, the above array gets no C-level speedup. It is nearly equivalent to a list:

my_array = [mytype() for _ in range(1000)]

If you want to make your code faster, you shouldn't make numpy arrays of arbitrary objects. Likewise, you shouldn't use Python integers (which can be any size and carry a lot of overhead) when you really want float64. For example, your class could be updated:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.data = np.array([x, y, z], dtype='float64')

At least now each self.data can be accessed and hstacked, and since numpy knows the exact size and shape of each one, it can gather up all 1000 locations in memory and copy them into a new array quite quickly.
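
With that layout, gathering the per-object arrays might look like this (a minimal sketch using np.stack rather than hstack, so the result keeps a (1000, 3) shape):

```python
import numpy as np

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        # Store the three fields in one small fixed-size float64 array.
        self.data = np.array([x, y, z], dtype='float64')

objs = [mytype() for _ in range(1000)]

# Stack the 1000 length-3 arrays into one (1000, 3) float64 array.
values = np.stack([o.data for o in objs])
print(values.shape, values.dtype)  # (1000, 3) float64
```

There is still a Python-level loop over the objects, but each element copied is a fixed-size float64 buffer rather than an arbitrary object needing conversion.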


Based on the Numpy documentation, numpy.array calls the __array__ method of an object. So you can define any arbitrary conversion to a numpy.array like:

class mytype(object):
    def __init__(self, x=1, y=2, z=3):
        self.x = x
        self.y = y 
        self.z = z 

    def __array__(self):
        return np.array([self.x, self.y, self.z])

Then you can convert a single mytype() object to an np.array with:

tmp = mytype()
np.array(tmp)
# array([1, 2, 3])

Now, when you have a list of 1000 objects, you can map np.array over all of them:

new_list = list(map(np.array, [mytype()]*1000))
#[array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), array([1, 2, 3]), ...

