[[ 0.0 0.2 0.4 0.6 ]
[ 0.0 0.0 2.0 4.0 6.0 ]
[ 0.0 0.0 0.0 20.0 40.0 60.0 ]]
is a ragged list. We can't build that with vectorized array magic, at least not with the ordinary stuff.
To get around this we need to either pad or flatten the structure:
[[ 0.0 0.2 0.4 0.6 0.0 0.0]
[ 0.0 0.0 2.0 4.0 6.0 0.0 ]
[ 0.0 0.0 0.0 20.0 40.0 60.0 ]]
or
[ 0.0 0.2 0.4 0.6 0.0 0.0 2.0 4.0 6.0 0.0 0.0 0.0 20.0 40.0 60.0 ]
np.add.reduceat lets us sum contiguous blocks of a flat array, but you want a skip-sum (summing elements spaced a fixed stride apart). I suppose I could explore flattening the transpose.
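For reference, here is what np.add.reduceat does with the flattened padded array: it sums contiguous blocks starting at the given indices, which just recovers the row sums, not the skip-sum we want.

```python
import numpy as np

# the flattened padded array from above
flat = np.array([0., .2, .4, .6, 0, 0,
                 0, 0, 2, 4, 6, 0,
                 0, 0, 0, 20, 40, 60])
# blocks start at indices 0, 6, 12: this gives the row sums,
# not the column-wise skip-sum
np.add.reduceat(flat, [0, 6, 12])   # -> [1.2, 12.0, 120.0]
```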
My first thought was that the padded array looks like a diagonalized one: [.2, 2, 20] laid out on one diagonal, [.4, 4, 40] on the next offset, and so on. I know scipy.sparse can build a matrix from an array of diagonals and a set of offsets, but I don't think there's such a function in numpy; the numpy diagonal functions all work with one offset at a time.
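For the record, scipy.sparse.diags does accept a list of diagonals with a matching list of offsets, so the padded array can be built that way (a sketch, assuming z = np.outer(x, y) as in the later transcripts):

```python
import numpy as np
from scipy import sparse

x = np.array([1, 10, 100])
y = np.array([.2, .4, .6])
z = np.outer(x, y)
# column k of z becomes the diagonal at offset k+1 of a 3x6 matrix
M = sparse.diags([z[:, k] for k in range(3)], offsets=[1, 2, 3], shape=(3, 6))
M.toarray().sum(axis=0)   # the skip-sum: [0, 0.2, 2.4, 24.6, 46.0, 60.0]
```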
But it also looks like the kind of offset that stride_tricks can produce.
Let's explore that:
In [458]: as_strided = np.lib.stride_tricks.as_strided
In [459]: Z=np.pad(z,[[0,0],[3,3]],mode='constant')
In [460]: Z
Out[460]:
array([[ 0. , 0. , 0. , 0.2, 0.4, 0.6, 0. , 0. , 0. ],
[ 0. , 0. , 0. , 2. , 4. , 6. , 0. , 0. , 0. ],
[ 0. , 0. , 0. , 20. , 40. , 60. , 0. , 0. , 0. ]])
In [461]: Z.strides
Out[461]: (72, 8)   # produce a one-element row offset with strides (72+8, 8)
In [462]: as_strided(Z,shape=(3,6),strides=(80,8))
Out[462]:
array([[ 0. , 0. , 0. , 0.2, 0.4, 0.6],
[ 0. , 0. , 2. , 4. , 6. , 0. ],
[ 0. , 20. , 40. , 60. , 0. , 0. ]])
That's the kind of shift we want, but the direction is wrong; so let's flip Z:
In [463]: Z1=Z[::-1,:].copy()
In [464]: as_strided(Z1,shape=(3,6),strides=(80,8))
Out[464]:
array([[ 0. , 0. , 0. , 20. , 40. , 60. ],
[ 0. , 0. , 2. , 4. , 6. , 0. ],
[ 0. , 0.2, 0.4, 0.6, 0. , 0. ]])
In [465]: as_strided(Z1,shape=(3,6),strides=(80,8)).sum(0)
Out[465]: array([ 0. , 0.2, 2.4, 24.6, 46. , 60. ])
Generalization can be left to the reader.
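Or sketched here; skew_sum is a hypothetical name, and it assumes a 2-d z whose row i should be shifted right by i+1 columns, matching the layout above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def skew_sum(z):
    # pad each row to width m + n + 1: one leading zero, n trailing
    n, m = z.shape
    Z = np.pad(z, [[0, 0], [1, n]], mode='constant')
    s0, s1 = Z.strides
    # each restrided row starts s1 bytes earlier than a true row of Z,
    # so row i lands shifted right by i+1 columns in a width m+n window
    return as_strided(Z, shape=(n, m + n), strides=(s0 - s1, s1)).sum(0)

z = np.outer([1, 10, 100], [.2, .4, .6])
skew_sum(z)   # -> [0.0, 0.2, 2.4, 24.6, 46.0, 60.0]
```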
Whether there's any speed advantage is unknown. Probably not for this small case; maybe for a very large one.
This cleans up the padding and striding a bit:
In [497]: Z=np.pad(z,[[0,0],[1,4]],mode='constant')
In [498]: Z.strides
Out[498]: (64, 8)
In [499]: as_strided(Z,shape=(3,6),strides=(64-8,8))
Out[499]:
array([[ 0. , 0.2, 0.4, 0.6, 0. , 0. ],
[ 0. , 0. , 2. , 4. , 6. , 0. ],
[ 0. , 0. , 0. , 20. , 40. , 60. ]])
This ignores how z was constructed. If the outer product is central to the problem, I might try the striding on the 1d y, and use x to perform a weighted sum.
In [553]: x=np.array([1,10,100]); y=np.array([.2,.4,.6])
In [554]: z=np.concatenate(([0,0],y[::-1],[0,0,0]))
In [555]: z
Out[555]: array([ 0. , 0. , 0.6, 0.4, 0.2, 0. , 0. , 0. ])
In [556]: Z=as_strided(z,shape=(3,6), strides=(8,8))
In [557]: Z
Out[557]:
array([[ 0. , 0. , 0.6, 0.4, 0.2, 0. ],
[ 0. , 0.6, 0.4, 0.2, 0. , 0. ],
[ 0.6, 0.4, 0.2, 0. , 0. , 0. ]])
In [558]: np.dot(x,Z)
Out[558]: array([ 60. , 46. , 24.6, 2.4, 0.2, 0. ])
In this construction Z is a view on z, so it is smaller than the Z in the previous approach. But I'm pretty sure dot makes a copy when it sends Z to the compiled code. np.einsum('i,ij', x, Z) might avoid that, doing its compiled iteration over the view without expanding it. This may make a difference when dealing with very large arrays.
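A sketch of that einsum variant on the same strided view (using z.itemsize instead of the hardcoded 8, but otherwise the construction from In [554]):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([1, 10, 100])
y = np.array([.2, .4, .6])
z = np.concatenate(([0, 0], y[::-1], [0, 0, 0]))
Z = as_strided(z, shape=(3, 6), strides=(z.itemsize, z.itemsize))
# einsum iterates over the overlapping view directly
np.einsum('i,ij', x, Z)   # -> [60.0, 46.0, 24.6, 2.4, 0.2, 0.0]
```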
The result is reversed, but that's easily fixed. I may even be able to fix it during construction.
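One way to fix it during construction (a sketch: it keeps y in natural order and steps the row stride backwards through z's buffer; the reads stay within the allocated padding, but this carries the usual as_strided caveats):

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

x = np.array([1, 10, 100])
y = np.array([.2, .4, .6])
# y in natural order, zero-padded on both sides
z = np.concatenate(([0., 0, 0], y, [0, 0, 0]))
b = z.itemsize
# start at element 2 and step each successive row one element *earlier*,
# so row i is shifted right by i columns relative to row 0
Z = as_strided(z[2:], shape=(3, 6), strides=(-b, b))
np.dot(x, Z)   # -> [0.0, 0.2, 2.4, 24.6, 46.0, 60.0]
```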
Is outer an essential part of the problem, or just a convenient way of creating z? Is this really some sort of inner product or weighted moving sum?