I would keep the loop but cut the work done inside it: pull the width and height values out into NumPy arrays once, then index into those arrays in the loop, which should be faster than going through the dataframe on every iteration. Also, instead of reshaping inside the loop and assigning the result back, we can set each array's shape attribute in place.
Thus, the implementation would be -
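def arr1d_2D(df):
    r = df.width.values    # widths as a NumPy array, extracted once
    c = df.height.values   # heights as a NumPy array, extracted once
    n = df.shape[0]
    for i in range(n):
        # Column 2 is 'bitmap'; set the array's shape in place
        df.iloc[i,2].shape = (r[i],c[i])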
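To make the "modify the shape instead of reshaping" point concrete, here is a minimal sketch on a standalone array (the names `flat` and `reshaped` are just for illustration). Note that NumPy's documentation discourages assigning to `.shape` in general, so treat this as a demonstration of the trick used here rather than a recommendation -

```python
import numpy as np

flat = np.arange(6)                  # 1D array with 6 elements

# np.reshape returns a new array object; `flat` itself stays 1D, so the
# result would have to be stored back into the dataframe cell.
reshaped = np.reshape(flat, (3, 2))
print(flat.shape, reshaped.shape)    # (6,) (3, 2)
print(reshaped is flat)              # False

# Assigning to .shape changes the existing object in place, so nothing
# needs to be written back.
flat.shape = (3, 2)
print(flat.shape)                    # (3, 2)
```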
We can go all NumPy here and work directly with the underlying data of the bitmap column, which should be much faster -
def arr1d_2D_allNumPy(df):
    r = df.width.values
    c = df.height.values
    n = df.shape[0]
    b = df['bitmap'].values   # object array holding the bitmap arrays
    for i in range(n):
        b[i].shape = (r[i],c[i])   # in place, no pandas indexing in the loop
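Why this works without assigning anything back: as far as I can tell, the object array from `.values` holds references to the same ndarray objects that the dataframe cells return, so changing an element's shape is visible through the dataframe. A small sketch to check that, using a hypothetical two-row frame `df_demo` in the same layout -

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame mirroring the layout used in this answer.
df_demo = pd.DataFrame({"width": [3, 2], "height": [2, 2]})
df_demo["bitmap"] = [np.arange(6), np.arange(4)]

b = df_demo["bitmap"].values                 # object array of references
print(b.dtype)                               # object
print(b[0] is df_demo["bitmap"].iloc[0])     # True: same ndarray object

b[0].shape = (3, 2)                          # mutate the ndarray itself
print(df_demo["bitmap"].iloc[0].shape)       # (3, 2)
```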
Sample run -
In [9]: df
Out[9]:
   width  height                                bitmap
0      3       2                    [0, 1, 7, 4, 8, 1]
1      2       2                          [7, 3, 8, 6]
2      2       4              [6, 8, 6, 4, 7, 0, 6, 2]
3      4       3  [8, 6, 5, 2, 2, 2, 4, 3, 3, 3, 1, 8]
4      4       3  [3, 8, 4, 8, 6, 4, 2, 3, 8, 7, 7, 4]
In [10]: arr1d_2D_allNumPy(df)
In [11]: df
Out[11]:
   width  height                                        bitmap
0      3       2                      [[0, 1], [7, 4], [8, 1]]
1      2       2                              [[7, 3], [8, 6]]
2      2       4                  [[6, 8, 6, 4], [7, 0, 6, 2]]
3      4       3  [[8, 6, 5], [2, 2, 2], [4, 3, 3], [3, 1, 8]]
4      4       3  [[3, 8, 4], [8, 6, 4], [2, 3, 8], [7, 7, 4]]
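If you want to confirm the result programmatically rather than by eye, a quick check along these lines could be used. `shapes_match` and `df_check` are hypothetical names introduced for this sketch, and the reshaping loop is inlined so the snippet runs on its own -

```python
import numpy as np
import pandas as pd

def shapes_match(df):
    """Return True if every bitmap's shape equals (width, height) of its row."""
    rows = zip(df["width"].values, df["height"].values, df["bitmap"].values)
    return all(bm.shape == (w, h) for w, h, bm in rows)

# Hypothetical two-row frame in the same layout as the sample above.
df_check = pd.DataFrame({"width": [3, 2], "height": [2, 4]})
df_check["bitmap"] = [np.arange(6), np.arange(8)]

before = shapes_match(df_check)      # bitmaps are still 1D here

# Same in-place reshaping idea as the all-NumPy version, inlined.
for w, h, bm in zip(df_check["width"].values, df_check["height"].values,
                    df_check["bitmap"].values):
    bm.shape = (w, h)

after = shapes_match(df_check)
print(before, after)                 # False True
```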
Runtime test
Approaches -
def org_app(df): # Original approach
    # Series.iteritems() was removed in pandas 2.0; items() is the equivalent
    for idx, bitmap in df['bitmap'].items():
        df['bitmap'][idx] = np.reshape(bitmap, (df['width'][idx],
                                                df['height'][idx]))
Timings -
In [43]: # Setup input dataframe and two copies for testing
    ...: a = np.random.randint(1,5,(1000,2))
    ...: df = pd.DataFrame(a, columns=(('width','height')))
    ...: n = df.shape[0]
    ...: randi = np.random.randint
    ...: df['bitmap'] = [randi(0,9,(df.iloc[i,0]*df.iloc[i,1])) for i in range(n)]
    ...:
    ...: df_copy1 = df.copy()
    ...: df_copy2 = df.copy()
    ...: df_copy3 = df.copy()
    ...:
In [44]: %timeit org_app(df_copy1)
1 loops, best of 3: 26 s per loop
In [45]: %timeit arr1d_2D(df_copy2)
10 loops, best of 3: 115 ms per loop
In [46]: %timeit arr1d_2D_allNumPy(df_copy3)
1000 loops, best of 3: 475 µs per loop
In [47]: 26000000/475.0 # Speedup with allNumPy version over original
Out[47]: 54736.84210526316
That's a crazy 50,000x+ speedup, and it goes to show how much the way you access data matters, especially array data stored inside pandas dataframes.
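One thing to keep in mind if you reuse this kind of setup: `df.copy()` copies the column but, per the pandas docs, not the Python objects stored in it, so the copies share the same bitmap arrays and an in-place shape change through one copy shows up in the others. If you need fully independent copies, copying each array explicitly is one option. A sketch with hypothetical names (`df_src`, `shared`, `independent`) -

```python
import numpy as np
import pandas as pd

df_src = pd.DataFrame({"width": [3, 2], "height": [2, 2]})
df_src["bitmap"] = [np.arange(6), np.arange(4)]

# copy() duplicates the column of references, not the ndarrays themselves.
shared = df_src.copy()
print(shared["bitmap"].iloc[0] is df_src["bitmap"].iloc[0])   # True

# One way to get independent bitmaps: copy each array explicitly.
independent = df_src.copy()
independent["bitmap"] = [bm.copy() for bm in df_src["bitmap"].values]

independent["bitmap"].iloc[0].shape = (3, 2)
print(df_src["bitmap"].iloc[0].shape)                         # (6,)
```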