There are a couple of different issues here. First, there's little to be gained by broadcasting over Python objects in NumPy; you'll probably do better using pure Python in this case.
>>> a = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)
>>> b = np.arange(1, 7).reshape(2, 3)
>>> c = [[1, 2, 3], [4, 5, 6]]
>>> %timeit a * 5
100000 loops, best of 3: 4.28 µs per loop
>>> %timeit b * 5
100000 loops, best of 3: 2.08 µs per loop
>>> %timeit [[x * 5 for x in l] for l in c]
1000000 loops, best of 3: 998 ns per loop
Those timings won't scale evenly as the data grows, but you get the idea.
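If you're not in IPython, the same comparison can be sketched with the standard timeit module. This is only a rough harness: the helper name scale_nested and the loop count n are my own choices, and the absolute numbers will differ by machine and NumPy version.

```python
import timeit
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]], dtype=object)  # object array
b = np.arange(1, 7).reshape(2, 3)                   # native integer array
c = [[1, 2, 3], [4, 5, 6]]                          # plain nested list

def scale_nested(rows, factor):
    """Multiply every element of a list of lists in pure Python (illustrative helper)."""
    return [[x * factor for x in row] for row in rows]

candidates = {
    "object array": lambda: a * 5,
    "integer array": lambda: b * 5,
    "nested lists": lambda: scale_nested(c, 5),
}

n = 20_000  # arbitrary loop count, kept small so the script runs quickly
for label, fn in candidates.items():
    seconds = timeit.timeit(fn, number=n)
    print(f"{label:>13}: {seconds / n * 1e9:8.0f} ns per call")
```

All three produce the same values; only the cost of getting there differs.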
Second, the problem isn't directly related to broadcasting. NumPy will happily broadcast over Python lists. The result just isn't what you expect:
>>> a = np.array([[1, 2, 3], [4, 5]], dtype=object)
>>> a * 5
array([[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3],
[4, 5, 4, 5, 4, 5, 4, 5, 4, 5]], dtype=object)
NumPy allows the objects in the array to define their own versions of whichever operator or function it's broadcasting. In this case, Python lists define * as repetition! This holds even for heterogeneous arrays; try this: np.array([5, [1, 2]], dtype=object) * 5. The reason sin doesn't broadcast in this case is that Python lists don't define sin at all.
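Here is a small script that checks each of those claims. The variable names are mine, and the exact exception type raised by np.sin has varied between NumPy versions, so the sketch catches both likely candidates rather than assuming one.

```python
import numpy as np

ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)  # shape (2,), holds two lists

# Multiplication is delegated to list.__mul__, i.e. repetition.
repeated = ragged * 5
print(repeated[1])  # the list [4, 5] repeated five times

# A mixed array dispatches per element: int * 5 multiplies, list * 5 repeats.
mixed = np.array([5, [1, 2]], dtype=object) * 5
print(mixed[0], mixed[1])

# For object arrays np.sin looks for a .sin() method on each element;
# lists have none, so the call fails.
try:
    np.sin(ragged)
    sin_failed = False
except (TypeError, AttributeError) as exc:
    sin_failed = True
    print(type(exc).__name__, exc)
```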
You'd probably be better off using a fixed-width array with a mask.
>>> np.ma.array([[1, 2, 3], [4, 5, 6]], mask=[[0, 0, 0], [0, 0, 1]])
masked_array(data =
[[1 2 3]
[4 5 --]],
mask =
[[False False False]
[False False True]],
fill_value = 999999)
As you can see, you can "simulate" a ragged array this way, and it will behave just as expected.
>>> a = np.ma.array([[1, 2, 3], [4, 5, 6]], mask=[[0, 0, 0], [0, 0, 1]])
>>> np.sin(a)
masked_array(data =
[[0.841470984808 0.909297426826 0.14112000806]
[-0.756802495308 -0.958924274663 --]],
mask =
[[False False False]
[False False True]],
fill_value = 1e+20)
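If your data starts out as a ragged list of lists, you need to pad it before you can mask it. A minimal sketch of that step follows; ragged_to_masked is a hypothetical helper of my own, not a NumPy function, and it assumes numeric rows and pads with zeros that are then masked out.

```python
import numpy as np

def ragged_to_masked(rows, dtype=float):
    """Pad unequal-length rows into a 2-D masked array (hypothetical helper)."""
    width = max(len(row) for row in rows)
    data = np.zeros((len(rows), width), dtype=dtype)
    mask = np.ones((len(rows), width), dtype=bool)  # start fully masked
    for i, row in enumerate(rows):
        data[i, :len(row)] = row
        mask[i, :len(row)] = False                  # unmask the real values
    return np.ma.array(data, mask=mask)

m = ragged_to_masked([[1, 2, 3], [4, 5]])
print(np.sin(m))      # the padded slot stays masked
print(m.sum(axis=1))  # row sums skip the masked entry
```

The padding value itself doesn't matter much, since masked slots are ignored by ufuncs and reductions.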
It's worth mentioning a few ways to create masked arrays. In your case, masked_invalid might be useful.
>>> np.ma.masked_invalid([[1, 2, 3], [4, 5, np.nan]])
masked_array(data =
[[1.0 2.0 3.0]
[4.0 5.0 --]],
mask =
[[False False False]
[False False True]],
fill_value = 1e+20)
You can also create masked arrays using conditions:
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.ma.masked_where(x > 5, x)
masked_array(data =
[[1 2 3]
[4 5 --]],
mask =
[[False False False]
[False False True]],
fill_value = 999999)
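Once the mask is in place, reductions skip the masked entries too. A short sketch using the same x as above (the name masked is just my label for the result):

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])
masked = np.ma.masked_where(x > 5, x)

print(masked.sum())          # total over the unmasked values only
print(masked.mean(axis=1))   # row means ignore the masked 6
print(masked.count(axis=1))  # number of unmasked entries per row
print(masked.filled(0))      # plain ndarray with masked slots set to 0
```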
For a full list of variations on these techniques, see the numpy.ma section of the NumPy documentation.
dtype=object! The point of using a numpy array over, say, a Python list is that it allows you to accelerate operations that require looping over the elements of the array by vectorization - effectively by doing the loops in a lower level language to avoid the overhead inherent in Python. However, by using the object type you lose these performance benefits. The elements in your array are just treated as 'dumb' Python objects (in this case, your array just contains two normal lists, i.e. with a shape=(2,)). [...] lists, which will be much cheaper to construct.
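You can confirm what the object array actually holds with a quick check. This is only an illustration; the names ragged, plain and doubled are mine.

```python
import numpy as np

ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(ragged.shape, ragged.dtype)  # a 1-D array of two objects
print(type(ragged[0]))             # each element is an ordinary Python list

# The same data as plain nested lists, processed without NumPy.
plain = [[1, 2, 3], [4, 5]]
doubled = [[x * 2 for x in row] for row in plain]
print(doubled)
```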