I have a dataframe, test, shown below:
Student_Id Math Physical Arts Class Sub_Class
0 id_1 6 7 9 A x
1 id_2 9 7 1 A y
2 id_3 3 5 5 C x
3 id_4 6 8 9 A x
4 id_5 6 7 10 B z
5 id_6 9 5 10 B z
6 id_7 3 5 6 C x
7 id_8 3 4 6 C x
8 id_9 6 8 9 A x
9 id_10 6 7 10 B z
10 id_11 9 5 10 B z
11 id_12 3 5 6 C x
There are two arrays as listed in the My Code section: arr_list and array_top.
I want to create a new column by looping through each row of the dataframe and setting its value from the arrays, as below:
for index, row in test.iterrows():
    test.loc[index, 'Highest_Score'] = arr_list[index][array_top[index]]
This loop takes too long on a larger dataset. Is there a faster way to do this?
My Code
import pandas as pd
import numpy as np
# Create dataframe
data = [
["id_1",6,7,9, "A", "x"],
["id_2",9,7,1, "A","y" ],
["id_3",3,5,5, "C", "x"],
["id_4",6,8,9, "A","x" ],
["id_5",6,7,10, "B", "z"],
["id_6",9,5,10,"B", "z"],
["id_7",3,5,6, "C", "x"],
["id_8",3,4,6, "C", "x"],
["id_9",6,8,9, "A","x" ],
["id_10",6,7,10, "B", "z"],
["id_11",9,5,10,"B", "z"],
["id_12",3,5,6, "C", "x"]
]
test = pd.DataFrame(data, columns = ['Student_Id', 'Math', 'Physical','Arts', 'Class', 'Sub_Class'])
# Create two arrays with the same length as the test data
arr_list = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]])
array_top = np.array([[0],[1],[1],[2],[1], [0], [0],[1],[1],[2],[1], [0]])
# Create the column Highest_Score
for index, row in test.iterrows():
    test.loc[index, 'Highest_Score'] = arr_list[index][array_top[index]]
Convert arr_list and array_top to dataframes, then join them with test.
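One possible way to avoid the row loop is to do the lookup in NumPy and assign the whole column at once. The sketch below is only an illustration under a few assumptions: test has the default 0..n-1 index (so row positions line up with the arrays), array_top has shape (n, 1) as in the question, and the name highest is introduced here just for the example. It uses a cut-down test with only Student_Id to stay short.

```python
import numpy as np
import pandas as pd

# Same arrays as in the question
arr_list = np.array([[1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6],
                     [1, 2, 3], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6], [4, 5, 6]])
array_top = np.array([[0], [1], [1], [2], [1], [0], [0], [1], [1], [2], [1], [0]])

# Cut-down stand-in for the question's dataframe (assumption: default RangeIndex)
test = pd.DataFrame({"Student_Id": [f"id_{i}" for i in range(1, 13)]})

# For every row i, pick arr_list[i, array_top[i, 0]] in one vectorized call.
# take_along_axis returns shape (n, 1); ravel() flattens it to length n.
highest = np.take_along_axis(arr_list, array_top, axis=1).ravel()

# Assign the whole column in one step instead of one .loc call per row
test["Highest_Score"] = highest
```

If test does not have a default index, the assignment still works positionally because highest is a plain NumPy array, but the rows must be in the same order as the arrays.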