4

I have 2 pandas DataFrames that I am manipulating in Python, and I would like to combine them based on a certain column (col_1):

#df1
    col_1   col_2          
0       3       7            
1       4       8            
2       5       5    

#df2
  col_1    col_3 
0     4       78
1     5       89

and the output should look like this:

    col_1   col_2   col_3
0   3       7       0
1   4       8       78
2   5       5       89  

I'm not sure about how to go about this using 'best practices'.

Thank you.

4 Answers 4

4

Suppose you have:

df1=pd.DataFrame({'col_1': {0: 3, 1: 4, 2: 5}, 'col_2': {0: 7, 1: 8, 2: 5}})
df2=pd.DataFrame({'col_1': {0: 4, 1: 5}, 'col_3': {0: 78, 1: 89}})

You can use merge:

pd.merge(df1,df2,on='col_1',how='left').fillna(0)
Out[22]: 
   col_1  col_2  col_3
0      3      7    0.0
1      4      8   78.0
2      5      5   89.0

If you need col_3 as int (thanks to piRSquared for his suggestion):

pd.merge(df1,df2,on='col_1',how='left').fillna(0, downcast='infer')

Out[25]: 
   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89
Sign up to request clarification or add additional context in comments.

2 Comments

downcast parameter will do the trick... pd.merge(df1,df2,on='col_1',how='left').fillna(0, downcast='infer')
Or pd.merge(df1,df2,on='col_1',how='left').fillna(0).astype(dict(col_3=int))
4

You can use join after using set_index.

df1.join(df2.set_index('col_1'), on='col_1').fillna(0, downcast='infer')

   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89

Comments

2

Mapping a series if often efficient for a single column.

df1 = pd.DataFrame({'col_1': [3, 4, 5],
                    'col_2': [7, 8, 5]})

df2 = pd.DataFrame({'col_1': [4, 5],
                    'col_3': [78, 89]})

df1['col_3'] = df1['col_1'].map(df2.set_index('col_1')['col_3']).fillna(0)

#    col_1  col_2  col_3
# 0      3      7    0.0
# 1      4      8   78.0
# 2      5      5   89.0

1 Comment

Mind you, this works because unique values of df1col_1 are a super set of values in df2.col_2
2

Maybe combine_first

df1.set_index('col_1').combine_first(df2.set_index('col_1')).fillna(0).reset_index().astype(int)
Out[237]: 
   col_1  col_2  col_3
0      3      7      0
1      4      8     78
2      5      5     89

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.