Below are two solutions: one uses only NumPy, the other uses NumPy plus Numba. Both packages can be installed with python -m pip install numpy numba.
import numpy as np
# -------- Version 1, vectorized ----------
height, width = 2, 4
start1 = np.zeros((height, width, 3), dtype = np.float64)
start1[:, :, 0] = -0.5 + np.arange(width)[None, :] / (width - 1)
start1[:, :, 1] = (-0.5 + np.arange(height)[:, None] / (height - 1)) * height / width
print(start1)
# -------- Version 2, Numba (JIT-compiled loops) ----------
import numba
@numba.njit(cache = True)
def compute_start1(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float64)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
            start1[i, j, 2] = 0
    return start1
print(compute_start1(2, 4))
Output:
[[[-0.5        -0.25        0.        ]
  [-0.16666667 -0.25        0.        ]
  [ 0.16666667 -0.25        0.        ]
  [ 0.5        -0.25        0.        ]]

 [[-0.5         0.25        0.        ]
  [-0.16666667  0.25        0.        ]
  [ 0.16666667  0.25        0.        ]
  [ 0.5         0.25        0.        ]]]
[[[-0.5        -0.25        0.        ]
  [-0.16666667 -0.25        0.        ]
  [ 0.16666667 -0.25        0.        ]
  [ 0.5        -0.25        0.        ]]

 [[-0.5         0.25        0.        ]
  [-0.16666667  0.25        0.        ]
  [ 0.16666667  0.25        0.        ]
  [ 0.5         0.25        0.        ]]]
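As a cross-check on the vectorized version, here is a small sketch that builds the same grid with np.linspace plus broadcasting and compares it against explicit loops. The helper names grid_loops and grid_linspace are made up for this illustration, and both assume height and width are at least 2 (otherwise the loop version divides by zero).

```python
import numpy as np

def grid_loops(height, width):
    # Reference: the same double loop as the Numba version, in plain Python.
    out = np.zeros((height, width, 3), dtype=np.float64)
    for i in range(height):
        for j in range(width):
            out[i, j, 0] = -0.5 + j / (width - 1)
            out[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return out

def grid_linspace(height, width):
    # -0.5 + arange(n) / (n - 1) is n evenly spaced points from -0.5 to 0.5.
    xs = np.linspace(-0.5, 0.5, width)                    # shape (width,)
    ys = np.linspace(-0.5, 0.5, height) * height / width  # shape (height,)
    out = np.zeros((height, width, 3), dtype=np.float64)
    out[:, :, 0] = xs[None, :]  # (1, width) broadcast down all rows
    out[:, :, 1] = ys[:, None]  # (height, 1) broadcast across all columns
    return out

a = grid_loops(2, 4)
b = grid_linspace(2, 4)
print(np.allclose(a, b))
```

The [None, :] and [:, None] indexing only adds a length-1 axis; NumPy then stretches that axis to fill the full (height, width) slice on assignment.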
I included the Numba solution because it is a nice package: it lets you write fast code in regular Python, often without having to work out how to express the algorithm in NumPy functions. Many simple loop-heavy numerical functions speed up considerably with Numba, in favorable cases by 50-200x.
Numba is a just-in-time compiler that translates Python functions to optimized machine code using LLVM (it does not go through C++). For loop-heavy numerical code the speedup over pure Python can be around 100x; the actual gain depends on the code. It is often as fast as NumPy and sometimes faster. It is also closely tied to NumPy: it supports a large subset of NumPy functions inside compiled code and can parallelize some of them if you pass parallel = True to the njit decorator; see the Numba jit documentation for reference.
In most cases, to get a speedup of that order you just need to add the @numba.njit line before your function and you're done. Of course, Numba cannot compile arbitrary Python code to fast machine code, but most fairly simple algorithms built from loops, conditions and numerical and/or NumPy operations can be compiled and accelerated by Numba.
Next is the timing code for all three solutions (pure-Python loops, vectorized NumPy, and Numba). It needs a one-time installation of the modules with python -m pip install numpy numba timerit.
import numpy as np
# -------- Version 0, non-vectorized (pure-Python loops) ----------
def f0(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return start1
# -------- Version 1, vectorized ----------
def f1(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    start1[:, :, 0] = -0.5 + np.arange(width)[None, :] / (width - 1)
    start1[:, :, 1] = (-0.5 + np.arange(height)[:, None] / (height - 1)) * height / width
    return start1
# -------- Version 2, Numba (JIT-compiled loops) ----------
import numba
@numba.njit(cache = True, fastmath = True)
def f2(height, width):
    start1 = np.zeros((height, width, 3), dtype = np.float32)
    for i in range(height):
        for j in range(width):
            start1[i, j, 0] = -0.5 + j / (width - 1)
            start1[i, j, 1] = (-0.5 + i / (height - 1)) * height / width
    return start1
# -------- Time measuring ----------
from timerit import Timerit
Timerit._default_asciimode = True
h, w = 256, 512
ra, rt = None, None  # reference result and reference mean time (from f0)
for f in [f0, f1, f2]:
    print(f'{f.__name__}: ', end = '', flush = True)
    tim = Timerit(num = 15, verbose = 1)
    for t in tim:
        a = f(h, w)
    if ra is None:
        ra, rt = a, tim.mean()
    else:
        t = tim.mean()
        # Each faster version must reproduce the f0 result.
        assert np.allclose(a, ra)
        print(f'speedup {round(rt / t, 3)}x')
Output:
f0: Timed best=159.324 ms, mean=159.855 +- 0.4 ms
f1: Timed best=1.212 ms, mean=1.257 +- 0.0 ms
speedup 127.178x
f2: Timed best=1.294 ms, mean=1.310 +- 0.0 ms
speedup 122.065x