Faster way to loop in Python when updating values from an array

Question

I have a dataframe named test, shown below:

   Student_Id  Math  Physical  Arts Class Sub_Class
0        id_1     6         7     9     A         x
1        id_2     9         7     1     A         y
2        id_3     3         5     5     C         x
3        id_4     6         8     9     A         x
4        id_5     6         7    10     B         z
5        id_6     9         5    10     B         z
6        id_7     3         5     6     C         x
7        id_8     3         4     6     C         x
8        id_9     6         8     9     A         x
9       id_10     6         7    10     B         z
10      id_11     9         5    10     B         z
11      id_12     3         5     6     C         x

There are two arrays as listed in the My Code section: arr_list and array_top.

I want to create a new column by looping through each row of the dataframe and filling it with a value from the arrays, as below:

for index, row in test.iterrows():
    test.loc[index, 'Highest_Score'] = arr_list[index][array_top[index]]

This loop takes too long on a larger dataset. Is there a faster way to do this?

My Code

import pandas as pd
import numpy as np

# Create dataframe
data = [
    ["id_1",6,7,9, "A", "x"],
    ["id_2",9,7,1, "A","y" ],
    ["id_3",3,5,5, "C", "x"],
    ["id_4",6,8,9, "A","x" ],
    ["id_5",6,7,10, "B", "z"],
    ["id_6",9,5,10,"B", "z"],
    ["id_7",3,5,6, "C", "x"],
    ["id_8",3,4,6, "C", "x"],
    ["id_9",6,8,9, "A","x" ],
    ["id_10",6,7,10, "B", "z"],
    ["id_11",9,5,10,"B", "z"],
    ["id_12",3,5,6, "C", "x"]
    
]

test = pd.DataFrame(data, columns = ['Student_Id', 'Math', 'Physical','Arts', 'Class', 'Sub_Class'])


# Create two arrays with the same length as the test data
arr_list = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]])

array_top = np.array([[0],[1],[1],[2],[1], [0], [0],[1],[1],[2],[1], [0]])

# Create the column Highest_Score
for index, row in test.iterrows():
    test.loc[index, 'Highest_Score'] = arr_list[index][array_top[index]]

I think you should be able to do this by converting arr_list and array_top to dataframes, then joining them with test.
– Barmar, Commented Aug 11, 2021 at 19:20

Jim · Accepted Answer · 2021-08-11 19:41:41Z

Looping through the arrays first to build the new column, then assigning it to the dataframe in one step, is much faster than looping through each row of the dataframe.

In my time trial it was 71.7 µs vs 2.77 ms (about 39 times faster):

In [95]: %%timeit
    ...: test['Highest_Score'] = [arr_list[r][c][0] for r,c in enumerate(array_top)]
    ...:
    ...:
71.7 µs ± 1.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [96]: %%timeit
    ...: for index, row in test.iterrows():
    ...:     test.loc[index, 'Highest_Score'] = arr_list[index][array_top[index]]
    ...:
2.77 ms ± 49.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

As a general rule when adding new data to a pandas DataFrame, do all of the looping and compiling outside of pandas, then assign all of the data at once.
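
For this particular lookup, the Python-level loop can probably be dropped altogether. The following is a minimal sketch, not part of the time trial above. It assumes arr_list is a 2-D array and array_top holds one valid column index per row. The names rows, cols and alt, and the shortened four-row data, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Shortened stand-ins for the question's data (first four rows only).
test = pd.DataFrame({"Student_Id": ["id_1", "id_2", "id_3", "id_4"]})
arr_list = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6]])
array_top = np.array([[0], [1], [1], [2]])

rows = np.arange(len(arr_list))  # 0, 1, 2, ... one entry per row
cols = array_top.ravel()         # flatten shape (n, 1) to (n,)

# Integer-array indexing picks arr_list[rows[i], cols[i]] for every i at once.
test["Highest_Score"] = arr_list[rows, cols]

# np.take_along_axis expresses the same lookup without building rows.
alt = np.take_along_axis(arr_list, array_top, axis=1).ravel()
```

I have not timed this variant, so treat any speed-up as something to measure on your own data.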

2 Comments

Deb Over a year ago

Thanks @Jim ! This is pretty fast. Can you please explain a bit on how [r][c] gets updated by enumerate?

Jim Over a year ago

Yeah, enumerate() iterates over whatever is passed to it and yields a tuple of the index and the value for each item. I chose r and c as variable names for the row and column that get selected from arr_list. In your original loop, the rows are simply taken in order from top to bottom (that is r), and the column is given by the matching value in array_top (that is c).
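
To make that concrete, here is a small sketch of what enumerate() yields for array_top and why the trailing [0] is needed in the list comprehension. The three-row arrays are shortened stand-ins for the question's data, and pairs and picked are names chosen for this example only.

```python
import numpy as np

arr_list = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6]])
array_top = np.array([[0], [1], [2]])

# enumerate() yields (position, item); each item is a length-1 row of array_top.
pairs = [(r, c.tolist()) for r, c in enumerate(array_top)]
# pairs == [(0, [0]), (1, [1]), (2, [2])]

# arr_list[r] is row r. Indexing it with the length-1 array c returns a
# length-1 array, so the final [0] pulls out the scalar value.
picked = [int(arr_list[r][c][0]) for r, c in enumerate(array_top)]
# picked == [1, 5, 6]
```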
