Join two dataframes on multiple columns in Python

Question

I have two dataframes with names df1 and df2.

df1=

   col1   col2  count
0   1      36   200
1   12     15   200
2   13     17   100

df2=

    product_id  product_name
0      1            abc
1      2            xyz
2      3            aaaa
3      12           qwert 
4      13           sed
5      15           qase
6      36           asdf
7      17           zxcv

The entries in col1 and col2 are product_id from df2.

I want to make a new dataframe 'df3', which has the following columns and entries.

df3=

   col1 | col1_name | col2 | col2_name | count
0   1   |   abc     |   36 |    asdf   |  200
1   12  |   qwert   |   15 |    qase   |  200
2   13  |   sed     |   17 |    zxcv   |  100

That is, add col1_name and col2_name columns holding the product_name from df2 whose product_id matches the col1 and col2 values.

Is it possible to do so with:

df3 = pd.concat([df1, df2], axis=1)

I am a beginner with pandas and Python. Is there a way to do this? Thanks in advance.

jezrael · Accepted Answer · 2016-11-18 09:34:12Z


I think you can use map with a dict generated from df2, and then sort the column names with sort_index:

d = df2.set_index('product_id')['product_name'].to_dict()
print (d)
{1: 'abc', 2: 'xyz', 3: 'aaaa', 36: 'asdf', 17: 'zxcv', 12: 'qwert', 13: 'sed', 15: 'qase'}

df1['col1_name'] = df1.col1.map(d)
df1['col2_name'] = df1.col2.map(d)
df1 = df1.sort_index(axis=1)
print (df1)
   col1 col1_name  col2 col2_name  count
0     1       abc    36      asdf    200
1    12     qwert    15      qase    200
2    13       sed    17      zxcv    100

If you only need the name columns and count, drop col1 and col2 afterwards:

df1 = df1.drop(['col1','col2'], axis=1)
print (df1)
  col1_name col2_name  count
0       abc      asdf    200
1     qwert      qase    200
2       sed      zxcv    100
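For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map-based lookup. It rebuilds df1 and df2 from the question's sample data and, as a variation on the answer above, passes the indexed Series to map directly and builds df3 without modifying df1. The variable name `names` is my own, and the sketch assumes product_id is unique in df2.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "col1": [1, 12, 13],
    "col2": [36, 15, 17],
    "count": [200, 200, 100],
})
df2 = pd.DataFrame({
    "product_id": [1, 2, 3, 12, 13, 15, 36, 17],
    "product_name": ["abc", "xyz", "aaaa", "qwert", "sed", "qase", "asdf", "zxcv"],
})

# Lookup table: product_name indexed by product_id.
# Series.map accepts a Series here; this assumes product_id has no duplicates.
names = df2.set_index("product_id")["product_name"]

# Build df3 column by column in the order the question asks for,
# leaving df1 itself untouched.
df3 = pd.DataFrame({
    "col1": df1["col1"],
    "col1_name": df1["col1"].map(names),
    "col2": df1["col2"],
    "col2_name": df1["col2"].map(names),
    "count": df1["count"],
})
print(df3)
```

If only the names and the count are wanted, selecting `df3[["col1_name", "col2_name", "count"]]` should give the same result as dropping col1 and col2.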

edited Nov 18, 2016 at 9:34

answered Nov 18, 2016 at 9:26

jezrael


16 Comments

Shubham R Over a year ago

Yes sir, this is working. Apart from that, if I only want to display col1_name | col2_name | count in my final result, is there a better way than df1.drop('col1','col2')?

Shubham R Over a year ago

Appreciated, a nice solution using map().

Shubham R Over a year ago

I tried the same logic on a bigger data set, and my col1_name and col2_name columns are coming out as NaN.

jezrael Over a year ago

You get NaN if some values are not in df2.

jezrael Over a year ago

You can test it with the sample data and d = {1: 'abc', 2: 'xyz', 3: 'aaaa'}.
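The NaN behaviour described in these comments can be reproduced with a short sketch. It uses the reduced dict from the comment above with the question's df1, then lists which ids have no entry. The second half is my own assumption about another possible cause on larger data, not something stated in the thread: ids stored as strings will not match integer keys. The names `col1_name`, `missing` and `as_text` are hypothetical.

```python
import pandas as pd

# Reduced lookup dict from the comment: ids 12 and 13 are deliberately absent.
d = {1: "abc", 2: "xyz", 3: "aaaa"}

df1 = pd.DataFrame({
    "col1": [1, 12, 13],
    "col2": [36, 15, 17],
    "count": [200, 200, 100],
})

# map leaves NaN wherever the value is not a key of d.
col1_name = df1["col1"].map(d)
print(col1_name)

# List the ids that have no entry in the lookup.
missing = df1.loc[~df1["col1"].isin(list(d)), "col1"]
print(missing.tolist())  # [12, 13]

# Assumed scenario: ids read in as strings (e.g. from a CSV) while the
# dict keys are integers. No key matches, so every result is NaN.
as_text = df1["col1"].astype(str)
all_nan_when_text = as_text.map(d).isna().all()
print(all_nan_when_text)

# Converting back to integers restores the match for ids present in d.
fixed = as_text.astype(int).map(d)
print(fixed)
```

Comparing `df1["col1"].dtype` with the dtype of the lookup keys is a quick first check before looking for genuinely missing ids.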
