Great problem for practicing array-slicing into multi-dimensional tensors/arrays!
We will initialize the output array as a multi-dimensional 6D array and simply slice it and assign the four arrays being reshaped as 4D arrays. The intention is avoid any stacking/concatenating as those would be expensive specially when working with large arrays by instead working with reshaping of input arrays, which would be merely views.
Here's the implementation -
out = np.zeros((N,2,3,N,2,3),dtype=A.dtype)
out[:,0,:,:,0,:] = A.reshape(N,3,N,3)
out[:,0,:,:,1,:] = D.reshape(N,3,N,3)
out[:,1,:,:,0,:] = C.reshape(N,3,N,3)
out[:,1,:,:,1,:] = B.reshape(N,3,N,3)
out.shape = (6*N,6*N)
Just to explain a bit more, we had :
|------------------------ Axes for selecting A, B, C, D
np.zeros((N,2,3,N,2,3),dtype=A.dtype)
|------------------------- Axes for selecting A, B, C, D
Thus, those two axes (second and fifth) of lengths (2x2) = 4 were used to select between the four inputs.
Runtime test
Approaches -
def original_app(A, B, C, D):
final = np.zeros((6*N,6*N),dtype=A.dtype)
for i in range(N):
for j in range(N):
for k in range(3):
for l in range(3):
final[6*i + k][6*j + l] = A[3*i+k][3*j+l]
final[6*i + k + 3][6*j + l + 3] = B[3*i+k][3*j+l]
final[6*i + k + 3][6*j + l] = C[3*i+k][3*j+l]
final[6*i + k][6*j + l + 3] = D[3*i+k][3*j+l]
return final
def slicing_app(A, B, C, D):
out = np.zeros((N,2,3,N,2,3),dtype=A.dtype)
out[:,0,:,:,0,:] = A.reshape(N,3,N,3)
out[:,0,:,:,1,:] = D.reshape(N,3,N,3)
out[:,1,:,:,0,:] = C.reshape(N,3,N,3)
out[:,1,:,:,1,:] = B.reshape(N,3,N,3)
return out.reshape(6*N,6*N)
Timings and verification -
In [147]: # Setup input arrays
...: N = 200
...: A = np.random.randint(11,99,(3*N,3*N))
...: B = np.random.randint(11,99,(3*N,3*N))
...: C = np.random.randint(11,99,(3*N,3*N))
...: D = np.random.randint(11,99,(3*N,3*N))
...:
In [148]: np.allclose(slicing_app(A, B, C, D), original_app(A, B, C, D))
Out[148]: True
In [149]: %timeit original_app(A, B, C, D)
1 loops, best of 3: 1.63 s per loop
In [150]: %timeit slicing_app(A, B, C, D)
100 loops, best of 3: 9.26 ms per loop