Match two columns in Excel file and get other column values - Python Pandas

Question

I have two Excel files, say, wb1.xlsx and wb2.xlsx.

wb1.xlsx

adsl    svc_no    port_stat    adsl.1    Comparison result
2/17
2/24
2/27
2/33
2/37
3/12

wb2.xlsx

caller_id    status    adsl    Comparison result
n/a          SP        2/37    Not Match
n/a          RE        2/24    Not Match
n/a          SP        2/27    Match
n/a          SP        2/33    Not Match
n/a          SP        2/17    Match

What I want to do is match the adsl column of wb2.xlsx against the adsl column of wb1.xlsx and copy the other values into the corresponding columns.

My expected output is wb1.xlsx updated with the values from wb2.xlsx:

adsl    svc_no    port_stat    adsl.1    Comparison result
2/17    n/a       SP           2/17      Match
2/24    n/a       RE           2/24      Not Match
2/27    n/a       SP           2/27      Match
2/33    n/a       SP           2/33      Not Match
2/37    n/a       SP           2/37      Not Match
3/12

After searching, I found that pd.merge() can do the matching.

I tried it this way:

result = pd.merge(df2, pri_df, on=['adsl', 'adsl'])

Unfortunately, it creates new columns and does not update the existing ones. Also, it only keeps the rows it was able to match and discards the others.

I also tried getting the indices of the columns in wb2.xlsx and assigning them to the columns of wb1.xlsx, but that just copied the values literally, without matching.

Any reference that would help is welcome.

jezrael · Accepted Answer · 2018-05-11 08:06:33Z

2

I suggest using intersection with combine_first:

print (df1)
   adsl  svc_no  port_stat  adsl.1  Comparison result
0  2/17     NaN        NaN     NaN                NaN
1  2/24     NaN        NaN     NaN                NaN
2  2/27     NaN        NaN     NaN                NaN
3  2/33     NaN        NaN     NaN                NaN
4  2/37     NaN        NaN     NaN                NaN
5  3/12     NaN        NaN     NaN                NaN

print (df2)
   caller_id status  adsl Comparison result
0        NaN     SP  2/37         Not Match
1        NaN     RE  2/24         Not Match
2        NaN     SP  2/27             Match
3        NaN     SP  2/33         Not Match
4        NaN     SP  2/17             Match

df2 = df2.rename(columns={'status':'port_stat'})
d = {'adsl.1': lambda x: x['adsl']}
df2 = df2.assign(**d)
print (df2)
   caller_id port_stat  adsl Comparison result adsl.1
0        NaN        SP  2/37         Not Match   2/37
1        NaN        RE  2/24         Not Match   2/24
2        NaN        SP  2/27             Match   2/27
3        NaN        SP  2/33         Not Match   2/33
4        NaN        SP  2/17             Match   2/17

df22 = df2[df2.columns.intersection(df1.columns)]
print (df22)
  port_stat  adsl Comparison result adsl.1
0        SP  2/37         Not Match   2/37
1        RE  2/24         Not Match   2/24
2        SP  2/27             Match   2/27
3        SP  2/33         Not Match   2/33
4        SP  2/17             Match   2/17

result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))
print (result)
   adsl  svc_no port_stat adsl.1 Comparison result
0  2/17     NaN        SP   2/17             Match
1  2/24     NaN        RE   2/24         Not Match
2  2/27     NaN        SP   2/27             Match
3  2/33     NaN        SP   2/33         Not Match
4  2/37     NaN        SP   2/37         Not Match
5  3/12     NaN       NaN    NaN               NaN
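For readers who want to run the steps above end to end, here is a minimal self-contained sketch. It assumes the two workbooks have already been loaded; the frames are built in memory as hypothetical stand-ins for pd.read_excel('wb1.xlsx') and pd.read_excel('wb2.xlsx'), with the sample values taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel('wb1.xlsx'): only adsl is filled in.
df1 = pd.DataFrame({
    'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12'],
    'svc_no': np.nan,
    'port_stat': np.nan,
    'adsl.1': np.nan,
    'Comparison result': np.nan,
})

# Hypothetical stand-in for pd.read_excel('wb2.xlsx').
df2 = pd.DataFrame({
    'caller_id': np.nan,
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# Align df2's column names with df1 and duplicate adsl into adsl.1.
df2 = (df2.rename(columns={'status': 'port_stat'})
          .assign(**{'adsl.1': lambda x: x['adsl']}))

# Keep only the columns that also exist in df1.
df22 = df2[df2.columns.intersection(df1.columns)]

# Fill df1's empty cells from df22, matching rows on adsl,
# then restore df1's original column order.
result = (df22.set_index('adsl')
              .combine_first(df1.set_index('adsl'))
              .reset_index()
              .reindex(columns=df1.columns))

print(result)
```

Writing the result back could then be done with result.to_excel(...) using index=False; the target path is left to the reader, and an Excel writer engine such as openpyxl is assumed to be installed.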

edited May 11, 2018 at 8:06

answered May 11, 2018 at 6:29


11 Comments

Ricky Aguilar Over a year ago

How do you suggest I update wb1.xlsx with the values of wb2.xlsx?

jezrael Over a year ago

@RickyAguilar - I changed the answer, can you check it?

Ricky Aguilar Over a year ago

Yes. It's like their key.

jezrael Over a year ago

@RickyAguilar - Super, so the solution should work nicely.

Ricky Aguilar Over a year ago

Nope. The final format is that of df1. The values for its empty columns will come from the df2 rows where df2['adsl'] matches df1['adsl'].

Devendra Soni · Accepted Answer · 2018-05-11 06:32:01Z

1

You can use the isin function of pandas:

result = df2.loc[df2['adsl'].isin(pri_df['adsl'])]

Hope this will work for you.
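Note that isin only filters the rows of df2 whose adsl value appears in pri_df; it does not carry values over into the layout of wb1.xlsx. One possible way to do that, sketched here as an assumption and not part of the original answer, is a left merge, which keeps every row of pri_df even when there is no match. The frames below are hypothetical in-memory stand-ins for the two workbooks.

```python
import pandas as pd

# Hypothetical stand-in for the adsl column of wb1.xlsx.
pri_df = pd.DataFrame({'adsl': ['2/17', '2/24', '2/27', '2/33', '2/37', '3/12']})

# Hypothetical stand-in for wb2.xlsx.
df2 = pd.DataFrame({
    'status': ['SP', 'RE', 'SP', 'SP', 'SP'],
    'adsl': ['2/37', '2/24', '2/27', '2/33', '2/17'],
    'Comparison result': ['Not Match', 'Not Match', 'Match', 'Not Match', 'Match'],
})

# isin builds a boolean mask, so this only selects rows of df2.
matched = df2.loc[df2['adsl'].isin(pri_df['adsl'])]

# A left merge keeps all rows of pri_df, in their original order,
# and leaves NaN where df2 has no matching adsl (here: 3/12).
merged = pri_df.merge(
    df2.rename(columns={'status': 'port_stat'}),
    on='adsl',
    how='left',
)

print(merged)
```

The default how='inner' is what drops the unmatched rows described in the question; how='left' is the only change needed to keep them.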

answered May 11, 2018 at 6:32


3 Comments

Ricky Aguilar Over a year ago

I see, it matches the dataframes from the two Excel files. How do you suggest I get the values from wb2.xlsx and write them into wb1.xlsx?

Devendra Soni Over a year ago

You can create a new Excel file instead of updating the existing one, and then discard the previous one.

Ricky Aguilar Over a year ago

Yes, that is more likely what will happen, but how can I bring the other columns of wb2.xlsx into the new Excel file?
