
I have a dataframe that looks like this (df):

HOUSEID    PERSONID      WHY_TRP
20000017      1            1
20000017      1            1
20000017      1            1
20000017      2            1
20000017      2            3
20000231      1            11
20000231      1            11
20000231      2            11
20000521      1            11
20000521      2            11
20000521      2            3

Each row describes a trip made by a person. I have another dataframe of the same kind in which each row describes a person (df_p):

    HOUSEID   PERSONID   
    20000017      1      
    20000017      2     
    20000231      1    
    20000231      2    
    20000521      1    
    20000521      2 

I want to add three new columns to the second dataframe showing, for each person, how often the WHY_TRP values 1, 3 and 11 occur. The second dataframe (df_p) already has other features, so I can't simply rebuild it with a groupby. Also, for some reason the two dataframes don't contain exactly the same set of people, which is why I used the strategy below. This is the code I tried, but it took hours to complete (about 1 million iterations):

df_p.insert(2, 'WHY_TRP_1', 0)
df_p.insert(3, 'WHY_TRP_3', 0)
df_p.insert(4, 'WHY_TRP_11', 0)

def trip_counter(i, r):
  if r['WHY_TRP'] == 1:
    df_p.loc[(df_p['HOUSEID'] == r['HOUSEID']) & (df_p['PERSONID'] == r['PERSONID']), 'WHY_TRP_1'] += 1
  elif r['WHY_TRP'] == 3:
    df_p.loc[(df_p['HOUSEID'] == r['HOUSEID']) & (df_p['PERSONID'] == r['PERSONID']), 'WHY_TRP_3'] += 1
  elif r['WHY_TRP'] == 11:
    df_p.loc[(df_p['HOUSEID'] == r['HOUSEID']) & (df_p['PERSONID'] == r['PERSONID']), 'WHY_TRP_11'] += 1


for i, r in df.iterrows():
  trip_counter(i, r)

output:

     HOUSEID   PERSONID   WHY_TRP_1     WHY_TRP_3      WHY_TRP_11
    20000017      1            3            0            0
    20000017      2            1            1            0
    20000231      1            0            0            2
    20000231      2            0            0            1
    20000521      1            0            0            1
    20000521      2            0            1            1          

Is there a faster way to do this?

thank you

2 Answers


You could also do a pivot_table and then merge:

m = df.pivot_table(index=['HOUSEID', 'PERSONID'],
                   columns='WHY_TRP', aggfunc=len, fill_value=0)

out = df_p.merge(m.add_prefix('WHY_TRP'),
                 left_on=['HOUSEID', 'PERSONID'], right_index=True)

print(out)

    HOUSEID  PERSONID  WHY_TRP1  WHY_TRP3  WHY_TRP11
0  20000017         1         3         0          0
1  20000017         2         1         1          0
2  20000231         1         0         0          2
3  20000231         2         0         0          1
4  20000521         1         0         0          1
5  20000521         2         0         1          1
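One caveat worth noting: the question says the two dataframes don't contain exactly the same people, and merge defaults to an inner join, which silently drops any person who appears in only one frame. Below is a sketch (not from the answer) of the same count-then-merge idea using pd.crosstab, which counts (HOUSEID, PERSONID) × WHY_TRP pairs directly, plus how="left" and fillna(0) to keep every row of df_p. The extra person 20000999 is hypothetical, added only to show the behavior:

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "HOUSEID": [20000017] * 5 + [20000231] * 3 + [20000521] * 3,
    "PERSONID": [1, 1, 1, 2, 2, 1, 1, 2, 1, 2, 2],
    "WHY_TRP": [1, 1, 1, 1, 3, 11, 11, 11, 11, 11, 3],
})

# df_p with one extra (hypothetical) person who has no trips in df.
df_p = pd.concat(
    [df[["HOUSEID", "PERSONID"]].drop_duplicates(),
     pd.DataFrame({"HOUSEID": [20000999], "PERSONID": [1]})],
    ignore_index=True,
)

# crosstab counts each (HOUSEID, PERSONID, WHY_TRP) combination.
m = pd.crosstab([df["HOUSEID"], df["PERSONID"]], df["WHY_TRP"])

# how="left" keeps persons that appear only in df_p; their missing
# counts come back as NaN, which fillna(0) turns into zeros.
out = df_p.merge(m.add_prefix("WHY_TRP_"),
                 left_on=["HOUSEID", "PERSONID"],
                 right_index=True, how="left").fillna(0)
print(out)
```

With the default inner join, the 20000999 row would disappear from the result; with how="left" it survives with all-zero counts.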



You can get a table of the counts by doing a groupby on the first dataframe and unstacking WHY_TRP, and then you can just merge it to the second:

counts = df.groupby(["HOUSEID", "PERSONID", "WHY_TRP"]).apply(len).unstack(fill_value=0)

counts.columns = counts.columns.map(lambda x: f"WHY_TRP_{x}")

counts

WHY_TRP            WHY_TRP_1  WHY_TRP_3  WHY_TRP_11
HOUSEID  PERSONID
20000017 1                 3          0           0
         2                 1          1           0
20000231 1                 0          0           2
         2                 0          0           1
20000521 1                 0          0           1
         2                 0          1           1

df_p.merge(counts, how="left", left_on=["HOUSEID", "PERSONID"], right_index=True)

    HOUSEID  PERSONID  WHY_TRP_1  WHY_TRP_3  WHY_TRP_11
0  20000017         1          3          0           0
1  20000017         2          1          1           0
2  20000231         1          0          0           2
3  20000231         2          0          0           1
4  20000521         1          0          0           1
5  20000521         2          0          1           1

