I have two pandas DataFrames:

import pandas as pd

df1 = pd.DataFrame({'user_id':['0','0','1','1','2','3','3'],
                  'friend_id':['1','2','3','2','4','4','5'],
                 'date_sent':['01-01-2020','01-01-2020','01-02-2020','01-03-2020','01-02-2020','01-03-2020','01-02-2020'],
                 'date_accepted':['01-01-2020','01-01-2020','01-02-2020',None,'01-10-2020',None,'01-21-2020']})

df2 = pd.DataFrame({'user_id':['1','1','2','2','3','3'],
                  'page_liked':['A','B','A','C','B','D']})

grouped1 = df1.groupby(['user_id','friend_id']).count()
grouped2 = df2.groupby(['user_id','page_liked']).count()
print(grouped1)

output >>>

                   date_sent  date_accepted
user_id friend_id
0       1                  1              1
        2                  1              1
1       2                  1              0
        3                  1              1
2       4                  1              1
3       4                  1              0
        5                  1              1

grouped2

output >>>
user_id page_liked
1   A
    B
2   A
    C
3   B
    D

I am trying to merge the two by matching the friend_id level of grouped1 against the user_id level of grouped2. The goal is to obtain the following table:


user_id friend_id       page_liked
0       1                  A
                           B
        2                  A
                           C
1       2                  A
                           C
        3                  B
                           D
2       4                  NaN
3       4                  NaN
        5                  NaN

I've tried merge in several ways with no luck, since both indices are multi-level. I have also tried grouped1.combine_first(grouped2), but that only seems to work when the index levels are the same, so I am stuck at the moment.

  • What is grouped2? Commented Mar 30, 2020 at 20:56
  • When I print grouped2 does not give me what you have in the output Commented Mar 30, 2020 at 21:01
  • @DaniMesejo just try typing grouped2 into your IDE. For some reason when you do print(grouped2) it does not print anything, and that is most likely because the data frame is only an index. Commented Mar 30, 2020 at 21:05
  • grouped2 = df2.groupby(['user_id','page_liked']).count() , updated in answer Commented Mar 30, 2020 at 21:06
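To illustrate the point raised in the comments above: df2 has only two columns and both are used as group keys, so count() has no remaining column to count. A minimal sketch, reusing the df2 from the question, that inspects what grouped2 actually holds:

```python
import pandas as pd

df2 = pd.DataFrame({'user_id': ['1', '1', '2', '2', '3', '3'],
                    'page_liked': ['A', 'B', 'A', 'C', 'B', 'D']})

# Both columns are group keys, so no data columns are left after count().
grouped2 = df2.groupby(['user_id', 'page_liked']).count()

print(grouped2.shape)            # (rows, columns) of the grouped frame
print(list(grouped2.columns))    # the data columns that remain, if any
print(grouped2.index.tolist())   # the (user_id, page_liked) pairs live in the index
```

All of the information sits in the MultiIndex, which is why both answers below either move it back into columns or join on the index directly.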

2 Answers

1

See the comments in the code for the key steps: using reset_index(), renaming the column to merge on, and re-indexing the merged result.

import pandas as pd
df1 = pd.DataFrame({'user_id':['0','0','1','1','2','3','3'],
                  'friend_id':['1','2','3','2','4','4','5'],
                 'date_sent':['01-01-2020','01-01-2020','01-02-2020','01-03-2020','01-02-2020','01-03-2020','01-02-2020'],
                 'date_accepted':['01-01-2020','01-01-2020','01-02-2020',None,'01-10-2020',None,'01-21-2020']})
df2 = pd.DataFrame({'user_id':['1','1','2','2','3','3'],
                  'page_liked':['A','B','A','C','B','D']})
# Use reset_index() to turn the index levels into columns; for grouped2, rename user_id to friend_id so it matches the column to merge on
grouped1 = df1.groupby(['user_id','friend_id']).count().reset_index()
grouped2 = df2.groupby(['user_id','page_liked']).count().reset_index().rename(columns={'user_id':'friend_id'})
# Merge, drop the unneeded columns, then set the index again if you want to re-index
grouped3 = pd.merge(grouped1, grouped2, how='left', on=['friend_id']).drop(['date_sent', 'date_accepted'], axis=1).set_index(['user_id', 'friend_id'])
grouped3
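Since the count columns are dropped anyway, the same table can probably be built without the groupby step. The following is only a sketch of that variant, not part of the answer above: it merges the raw frames with left_on/right_on, and the names pairs, likes and liker_id are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': ['0', '0', '1', '1', '2', '3', '3'],
                    'friend_id': ['1', '2', '3', '2', '4', '4', '5']})
df2 = pd.DataFrame({'user_id': ['1', '1', '2', '2', '3', '3'],
                    'page_liked': ['A', 'B', 'A', 'C', 'B', 'D']})

# Unique (user_id, friend_id) pairs; the date columns are not needed here.
pairs = df1[['user_id', 'friend_id']].drop_duplicates()

# Rename the key on the right side (hypothetical name) to avoid a clash
# with the user_id column that already exists on the left side.
likes = df2.drop_duplicates().rename(columns={'user_id': 'liker_id'})

result = (
    pairs.merge(likes, how='left', left_on='friend_id', right_on='liker_id')
         .drop(columns='liker_id')
         .sort_values(['user_id', 'friend_id', 'page_liked'])
         .set_index(['user_id', 'friend_id'])
)
print(result)
```

A left merge keeps every (user_id, friend_id) pair, so friends with no liked pages show up with NaN in page_liked.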

0

Use join. It supports joining MultiIndex DataFrames on the index levels they share.

You need to rename an index level of grouped2 so that it matches an index level name of grouped1. Since you want to match on a single level, rename just that one: on grouped2, change the level name user_id to friend_id. Then join, reorder the index levels, call reset_index on the last level, and select the page_liked column:

df_out = grouped1.join(grouped2.rename_axis(['friend_id', 'page_liked']), 
                       how='left').swaplevel(0,1).reset_index(level=-1)[['page_liked']]

Out[82]:
                  page_liked
user_id friend_id
0       1                  A
        1                  B
        2                  A
        2                  C
1       2                  A
        2                  C
        3                  B
        3                  D
2       4                NaN
3       4                NaN
        5                NaN
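The one-liner above relies on the position of the index levels after the join. As a hedged variant, assuming the joined index carries the three level names user_id, friend_id and page_liked, the levels can be reordered by name instead, so the result does not depend on the order that join returns:

```python
import pandas as pd

df1 = pd.DataFrame({'user_id': ['0', '0', '1', '1', '2', '3', '3'],
                    'friend_id': ['1', '2', '3', '2', '4', '4', '5'],
                    'date_sent': ['01-01-2020', '01-01-2020', '01-02-2020', '01-03-2020',
                                  '01-02-2020', '01-03-2020', '01-02-2020'],
                    'date_accepted': ['01-01-2020', '01-01-2020', '01-02-2020', None,
                                      '01-10-2020', None, '01-21-2020']})
df2 = pd.DataFrame({'user_id': ['1', '1', '2', '2', '3', '3'],
                    'page_liked': ['A', 'B', 'A', 'C', 'B', 'D']})

grouped1 = df1.groupby(['user_id', 'friend_id']).count()
grouped2 = df2.groupby(['user_id', 'page_liked']).count()

# Rename the first level of grouped2 so both frames share the level name friend_id.
right = grouped2.rename_axis(['friend_id', 'page_liked'])

# Join on the shared level, then address the levels by name rather than by position.
joined = grouped1.join(right, how='left')
df_out = (
    joined.reorder_levels(['user_id', 'friend_id', 'page_liked'])
          .reset_index(level='page_liked')[['page_liked']]
)
print(df_out)
```

swaplevel(0, 1) and reorder_levels do the same job here; naming the levels just makes the intent explicit.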
