Is numpy optimized to work on array of arrays?

Question

Some answers on Stack Overflow suggest using an ndarray of ndarrays when working with data in which the number of elements per row is not constant (see "How to make a multidimension numpy array with a varying row size?").

Is NumPy optimized to work on a structure like that (an array of arrays, also called nested arrays)?

Here's a simplified example of such a structure:

import numpy as np
x = np.array([1,2,3])
y = np.array([4,5])
data = np.array([x,y],dtype=object)

It's possible to do operations like:

print(data+1)
print(data+data)

But some operations fail, such as:

print(np.sum(data))

What's happening behind the scenes with this type of structure?

No. Such an array is basically the same as a list, containing references to the component arrays.
– hpaulj, Commented Jan 30, 2022 at 18:01
Check this ;) numpy.org/devdocs/dev/internals.html if you want to know more about how the NumPy array is organized in memory.
– Khamyl, Commented Jan 30, 2022 at 18:06
My comment is basically a repeat of the accepted answer in your link. There's a difference between explaining what can be done, and suggesting such a use.
– hpaulj, Commented Jan 30, 2022 at 18:40
Thanks for your answers. I updated the question to make it more precise.
– user18048269, Commented Jan 30, 2022 at 20:05

hpaulj · Accepted Answer · 2022-01-30 21:56:43Z

Like a list, an object dtype array can contain objects of any kind. For example:

In [6]: arr = np.array([1,"two",[1,2,3],np.array([4,5,6])], object)
In [7]: arr
Out[7]: array([1, 'two', list([1, 2, 3]), array([4, 5, 6])], dtype=object)
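
To make the "like a list" comparison concrete, here is a minimal sketch using the `x` and `y` arrays from the question. Building the container with `np.empty` and assigning the elements one by one is my choice for the illustration, not something the answer prescribes. It suggests that each slot holds a reference to the component array rather than a copy of its numbers.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])

# Build the container explicitly: two slots, each holding a Python object.
data = np.empty(2, dtype=object)
data[0] = x
data[1] = y

# Each slot refers to the original array object; nothing was copied.
print(data[0] is x)   # True

# A change made through the original name is visible through the container.
x[0] = 10
print(data[0])

# The container itself only knows it has two objects of unspecified kind.
print(data.shape, data.dtype)
```

The numeric data of `x` and `y` still live in their own separate buffers, so the outer array cannot treat them as one contiguous block.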

Look what happens when we do addition and multiplication:

In [8]: arr+arr
Out[8]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)
In [10]: arr*2
Out[10]: 
array([2, 'twotwo', list([1, 2, 3, 1, 2, 3]), array([ 8, 10, 12])],
      dtype=object)

For lists and strings, these operations are defined as 'join/replication'. In effect, it is doing [x.__add__(x) for x in arr], where __add__ is the class-specific operation.

np.exp doesn't work because it tries to do [x.exp() for x in arr], and almost no one defines an exp method.
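
As a rough check of that equivalence, the sketch below compares `arr + arr` with an explicit Python loop. It uses `operator.add`, which is what the `+` operator does, as a stand-in for calling `__add__` directly. This is an illustration of the behavior, not a description of NumPy's internal code path.

```python
import operator
import numpy as np

arr = np.array([1, "two", [1, 2, 3], np.array([4, 5, 6])], dtype=object)

# Element-wise addition as NumPy performs it on an object array.
via_numpy = arr + arr

# The same thing written as a plain Python loop over the elements.
via_loop = [operator.add(item, item) for item in arr]

# Print the two results side by side, one element per line.
for a, b in zip(via_numpy, via_loop):
    print(repr(a), "|", repr(b))
```

Both columns should agree element by element, which suggests the work is delegated to each object's own Python-level addition rather than to a compiled numeric loop.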

In [11]: np.exp(arr)
AttributeError: 'int' object has no attribute 'exp'

The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "<ipython-input-11-16c1c90aa297>", line 1, in <module>
    np.exp(arr)
TypeError: loop of ufunc does not support argument 0 of type int which has no callable exp method
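
For the nested arrays in the question, one possible workaround is to apply the function to each component array yourself. This is only a sketch of that idea: the variable names are illustrative, and flattening with `np.concatenate` is an assumption about what the data allows, not something the answer recommends.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5])
data = np.empty(2, dtype=object)
data[0] = x
data[1] = y

# Call the ufunc once per component array; each call works on ordinary
# numeric data, so it does not need an `exp` method on the elements.
exps = [np.exp(row) for row in data]

# np.sum(data) fails in the question, presumably because it tries to add
# x and y together and their lengths differ. Summing each row avoids that.
row_sums = [int(row.sum()) for row in data]
total = sum(row_sums)

# If the row boundaries are not needed, a flat numeric array is simpler.
flat = np.concatenate(list(data))

print(row_sums, total)   # [6, 9] 15
print(flat)              # [1 2 3 4 5]
```

The explicit Python loop over rows makes the cost visible: the outer level runs at Python speed, and only the work inside each component array is vectorized.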

The explanation is super clear. Thanks a lot!
– user18048269
