Steps :
Initialize a 3D array, such that m and n are separate ones. This lets us broadcast values.
Index into the first three elements along the first axis of output with appropriate elements off a and make sure that those shapes are broadcastable.
Reshape the output back to 2D.
That's all the play is about here! Here's the vectorized implementation -
b_out = np.empty((3, m,n),dtype=a.dtype) # 1. Initialize
b_out[0] = a[:,0,None] # 2. Assign
b_out[1] = a[0]
b_out[2] = a
b_out.shape = (3,m*n) # 3. Reshape back to 2D
Runtime test
Approaches -
def loopy_app(a):
m,n = a.shape
b = np.empty((3, m*n),dtype=a.dtype)
k=0
for i in range (0,m):
for j in range (0,n):
b[0,k] = a[i,0]
b[1,k] = a[0,j]
b[2,k] = a[i,j]
k=k+1
return b
def vectorized_app(a):
b_out = np.empty((3, m,n),dtype=a.dtype)
b_out[0] = a[:,0,None]
b_out[1] = a[0]
b_out[2] = a
b_out.shape = (3,m*n)
return b_out
Timings -
In [194]: m=5
...: n=10
...: a = np.random.random((m, n))
...:
In [195]: %timeit loopy_app(a)
...: %timeit vectorized_app(a)
...:
10000 loops, best of 3: 28.2 µs per loop
100000 loops, best of 3: 2.48 µs per loop
In [196]: m=50
...: n=100
...: a = np.random.random((m, n))
...:
In [197]: %timeit loopy_app(a)
...: %timeit vectorized_app(a)
...:
100 loops, best of 3: 2.56 ms per loop
100000 loops, best of 3: 6.31 µs per loop
In [198]: 2560/6.31
Out[198]: 405.7052297939778
400x+ speedup on large datasets and more on larger ones!
range (0,m)andrange (0,n)to cover all elements.