I have a dataframe (df) that looks like this:
HOUSEID PERSONID WHY_TRP
20000017 1 1
20000017 1 1
20000017 1 1
20000017 2 1
20000017 2 3
20000231 1 11
20000231 1 11
20000231 2 11
20000521 1 11
20000521 2 11
20000521 2 3
Each row describes a trip made by a person. I have another dataframe of the same kind (df_p) in which each row describes a person:
HOUSEID PERSONID
20000017 1
20000017 2
20000231 1
20000231 2
20000521 1
20000521 2
I want to add three new columns to the second dataframe showing how many times each person has a WHY_TRP of 1, 3 and 11. I already have the second dataframe (df_p) with other features in it, so I don't think I should rebuild it with groupby. Also, for some reason the first and second dataframes don't contain the same number of people, which is why I needed the strategy below. This is the code I tried, but it took hours to complete (1 million iterations):
# One counter column per trip purpose; the names must match those used below.
df_p.insert(2, 'WHY_TRP_1', 0)
df_p.insert(3, 'WHY_TRP_3', 0)
df_p.insert(4, 'WHY_TRP_11', 0)

def trip_counter(i, r):
    # Boolean mask selecting the person in df_p who made this trip.
    person = (df_p['HOUSEID'] == r['HOUSEID']) & (df_p['PERSONID'] == r['PERSONID'])
    if r['WHY_TRP'] == 1:
        df_p.loc[person, ['WHY_TRP_1']] += 1
    elif r['WHY_TRP'] == 3:
        df_p.loc[person, ['WHY_TRP_3']] += 1
    elif r['WHY_TRP'] == 11:
        df_p.loc[person, ['WHY_TRP_11']] += 1

# Every trip triggers a full scan of df_p, which is why this is so slow.
for i, r in df.iterrows():
    trip_counter(i, r)
Output:
HOUSEID PERSONID WHY_TRP_1 WHY_TRP_3 WHY_TRP_11
20000017 1 3 0 0
20000017 2 1 1 0
20000231 1 0 0 2
20000231 2 0 0 1
20000521 1 0 0 1
20000521 2 0 1 1
Is there a faster way to do this?
Thank you.
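For reference, here is a minimal sketch of one vectorized approach that might replace the loop. It counts the trips per person and purpose once with `pd.crosstab`, then left-merges the counts onto df_p so that its existing rows and other feature columns are kept. The sample frames are rebuilt from the tables above. The names `purposes`, `trips`, `counts`, `count_cols` and `result` are made up for illustration, and it assumes that people in df_p with no trips in df should get 0 in all three columns.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df = pd.DataFrame({
    "HOUSEID": [20000017] * 5 + [20000231] * 3 + [20000521] * 3,
    "PERSONID": [1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2],
    "WHY_TRP": [1, 1, 1, 1, 3, 11, 11, 11, 11, 11, 3],
})
df_p = pd.DataFrame({
    "HOUSEID": [20000017, 20000017, 20000231, 20000231, 20000521, 20000521],
    "PERSONID": [1, 2, 1, 2, 1, 2],
})

purposes = [1, 3, 11]  # trip purposes to count (illustrative name)

# Keep only the trips whose purpose is of interest.
trips = df[df["WHY_TRP"].isin(purposes)]

# One row per (HOUSEID, PERSONID), one column per purpose, values are counts.
counts = pd.crosstab([trips["HOUSEID"], trips["PERSONID"]], trips["WHY_TRP"])
counts = (
    counts.reindex(columns=purposes, fill_value=0)  # guarantee all three columns
    .add_prefix("WHY_TRP_")
    .rename_axis(columns=None)
    .reset_index()
)

# A left merge keeps every row of df_p, including people with no trips in df.
result = df_p.merge(counts, on=["HOUSEID", "PERSONID"], how="left")

# People missing from df get NaN after the merge; treat that as zero trips.
count_cols = [f"WHY_TRP_{p}" for p in purposes]
result[count_cols] = result[count_cols].fillna(0).astype(int)

print(result)
```

Because of the left merge, people who appear only in df are dropped and people who appear only in df_p stay with zero counts, so the mismatch between the two dataframes should not matter here.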