I am fairly new to python and coding. I am looking for a way to optimize a nested for loop. The nested for loop I have written works perfectly fine, but it takes a lot of time to run. I have explained the basic idea behind my original code and what I have tried to do, below:
data = [['a', '35-44', 'male', ['b', 'z', 'x']], ['b', '15-24', 'female', ['a', 'z', 'q']], \
['r', '35-44', 'male', ['z', 'a', 'd']], ['q', '15-24', 'female', ['u', 'k', 'b']]]
df = pd.DataFrame(data, columns= ['ID', 'age_group', 'gender', 'matching_ids'])
df is the Dataframe that I am working on. What I want to do is compare each 'ID' in df with every other 'ID' in the same df and check if it follows certain conditions.
- If the age_group is equal.
- If the gender is the same.
- If the 'ID' is in 'matched_ids'.
If these conditions are met I need to append that row to a separate dataframe (sample_df) This is the code with the nested for loop that works fine:
df_copy = df.copy()
sample_df = pd.DataFrame()
for i in range(len(df)):
for j in range(len(df)):
if (i!=j) and (df.iloc[i]['ID'] in df_copy.iloc[j]['matching_ids']) and \
(df.iloc[i]['gender'] == df_copy.iloc[j]['gender']) and\
(df.iloc[i]['age_group'] == df_copy.iloc[j]['age_group']):
sample_df = sample_df.append(df_copy.iloc[[j]])
I tried simplifying it by writing a function and using df.apply(func), but it still takes almost the same amount of time. Below is the code written with using a function:
sample_df_func = pd.DataFrame()
def func_extract(x):
for k in range(len(df)):
if (x['ID'] != df_copy.iloc[k]['ID']) and (x['ID'] in df_copy.iloc[k]['matching_ids']) and \
(x['gender'] == df_copy.iloc[k]['gender']) and\
(x['age_group'] == df_copy.iloc[k]['age_group']):
global sample_df_func
sample_df_func = sample_df_func.append(df_copy.iloc[[k]])
df.apply(func_extract, axis = 1)
sample_df_func
I am looking for ways to simplify this and optimize it further. Forgive me, if the solution to this is very simple and I am not able to figure it out.
Thanks
PS: I've just started coding 2 months back.
ID'd, k, u, x, z'.