Match strings between two dataframes and create column

Question

I am trying to match parts of strings from bad_boy to good_boy and create a column called Right Address in the original DataFrame (bad_boy), but I am having a hard time getting this accomplished. I have looked at the links below:

Replace whole string if it contains substring in pandas

Return DataFrame item using partial string match on rows pandas python

import pandas as pd
bad_boy = pd.read_excel('C:/Users/Programming/.xlsx')
df = pd.DataFrame(bad_boy)

print (df['Address'].head(3))

0  1234 Stack Overflow
1  7458 Python
2  8745 Pandas

good_boy = pd.read_excel('C:/Users/Programming/.xlsx')

df2 = pd.DataFrame(good_boy)

print (df2['Address'].head(10))

0 5896 Java Road
1 1234 Stack Overflow Way
2 7459 Ruby Drive
3 4517 Numpy Creek Way
4 1642 Scipy Trail
5 7458 Python Avenue
6 8745 Pandas Lane
7 9658 Excel Road
8 7255 Html Drive
9 7459 Selenium Creek Way

I tried this:

df['Right Address'] = df.loc[df['Address'].str.contains('Address', case = False, na = False, regex = False), df2['Address']]

but this throws an error:

'None of [0.....all addresses\nName: Address, dtype: object] are in the [columns]'

Desired result:

print (df['Right Address'].head(3))

0  1234 Stack Overflow Way
1  7458 Python Avenue
2  8745 Pandas Lane
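The error in the attempt above comes from the second argument of .loc, which selects columns by label. Passing df2['Address'] there makes pandas look for columns named "5896 Java Road", "1234 Stack Overflow Way", and so on, and none exist. A minimal sketch using a few of the sample rows (in-memory stand-ins for the Excel data) shows the difference:

```python
import pandas as pd

# A few sample rows from the question, built in memory instead of read_excel.
df2 = pd.DataFrame({"Address": ["5896 Java Road", "1234 Stack Overflow Way", "7458 Python Avenue"]})

# Rows whose Address contains one partial address (plain substring, not a regex).
mask = df2["Address"].str.contains("1234 Stack Overflow", case=False, na=False, regex=False)

# .loc[rows, columns]: the column part must be label(s) such as "Address".
matches = df2.loc[mask, "Address"].tolist()
print(matches)  # ['1234 Stack Overflow Way']

# A Series of cell values in the column position is read as a list of column
# labels; none of them exist, so pandas raises KeyError.
raised = False
try:
    df2.loc[mask, df2["Address"]]
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```

This only finds the match for one hard-coded string; matching every row of bad_boy still needs a join or a per-row lookup.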

Your number column values 1234, 7458 and 8745 all match between your two dataframes. Can you just join on that and keep the df2 names? That would give your desired result. Or do you need to do this by string matching?
– Max Power, commented May 3, 2017 at 17:25
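Joining on the house number and keeping the df2 text can also be sketched as a lookup with Series.map. This assumes every address starts with its house number; the frames below are in-memory copies of the sample rows. Note that the sample good_boy data contains 7459 twice (Ruby Drive and Selenium Creek Way), so a number-only key can be ambiguous; with dict(zip(...)) the last occurrence wins.

```python
import pandas as pd

# Sample rows copied from the question; the real data comes from read_excel.
df = pd.DataFrame({"Address": ["1234 Stack Overflow", "7458 Python", "8745 Pandas"]})
df2 = pd.DataFrame({"Address": [
    "5896 Java Road", "1234 Stack Overflow Way", "7459 Ruby Drive",
    "4517 Numpy Creek Way", "1642 Scipy Trail", "7458 Python Avenue",
    "8745 Pandas Lane", "9658 Excel Road", "7255 Html Drive",
    "7459 Selenium Creek Way",
]})

# Leading house number of each address (assumes addresses start with digits).
left_key = df["Address"].str.extract(r"^(\d+)", expand=False)
right_key = df2["Address"].str.extract(r"^(\d+)", expand=False)

# house number -> full address from df2; on duplicate numbers the last one wins.
lookup = dict(zip(right_key, df2["Address"]))

# Numbers with no entry in df2 become NaN.
df["Right Address"] = left_key.map(lookup)
print(df["Right Address"].tolist())
# ['1234 Stack Overflow Way', '7458 Python Avenue', '8745 Pandas Lane']
```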

Vaishali · Accepted Answer · 2017-05-03 20:02:34Z


You can use merge combined with str.extract for a partial match:

# join on the first run of digits (the house number) in each Address
df1 = df1.merge(df2,
                left_on=df1.Address.str.extract(r'(\d+)', expand=False),
                right_on=df2.Address.str.extract(r'(\d+)', expand=False),
                how='inner').rename(columns={'Address_y': 'Right_Address'})

You get:

    Address_x           Right_Address
0   1234 Stack Overflow 1234 Stack Overflow Way
1   7458 Python         7458 Python Avenue
2   8745 Pandas         8745 Pandas Lane
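A variant of the same merge, sketched under the assumption that the extracted number is a usable key, stores it in a temporary column (the name house_no is hypothetical) so the join key is explicit and can be dropped afterwards. how="left" keeps rows of df1 that have no match instead of dropping them; if a number appears more than once in df2, the matching df1 row is duplicated.

```python
import pandas as pd

# In-memory copies of the sample rows from the question.
df1 = pd.DataFrame({"Address": ["1234 Stack Overflow", "7458 Python", "8745 Pandas"]})
df2 = pd.DataFrame({"Address": [
    "5896 Java Road", "1234 Stack Overflow Way", "7459 Ruby Drive",
    "4517 Numpy Creek Way", "1642 Scipy Trail", "7458 Python Avenue",
    "8745 Pandas Lane", "9658 Excel Road", "7255 Html Drive",
    "7459 Selenium Creek Way",
]})

pattern = r"(\d+)"  # first run of digits, assumed to be the house number
merged = (
    df1.assign(house_no=df1["Address"].str.extract(pattern, expand=False))
    .merge(
        df2.assign(house_no=df2["Address"].str.extract(pattern, expand=False)),
        on="house_no",
        how="left",                   # keep every row of df1
        suffixes=("", "_right"),      # left Address keeps its name
    )
    .rename(columns={"Address_right": "Right_Address"})
    .drop(columns="house_no")         # temporary key no longer needed
)
print(merged)
```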

edited May 3, 2017 at 20:02

answered May 3, 2017 at 18:26

Vaishali


Comments

Jacob_Cortese Over a year ago

Thanks. When I write df1.to_excel, the Right_Address column does not show up. print(df1.columns) returns

Index(['Project', 'Order Date', 'Paid Date', 'Resale Released', 'Estimated Close Date', 'Estimated Sales Price', 'Address', 'Title Company', 'Title Company Email', 'Seller', 'Builder/HO', 'Actual Close Date', 'Actual Sales Price', 'Status of Assessments', 'Closing Received', 'Unnamed: 15', 'Unnamed: 16'], dtype='object')

The Right_Address column is not there.

Vaishali Over a year ago

Did you assign the result of the merge back to df1 by doing df1 = df1.merge(...)?
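The reason the assignment matters: merge returns a new DataFrame and leaves df1 untouched, so the new column only exists if the result is bound to a name. A minimal sketch with a hypothetical house_no key column:

```python
import pandas as pd

# Tiny frames sharing a hypothetical "house_no" key column.
df1 = pd.DataFrame({"Address": ["1234 Stack Overflow"], "house_no": ["1234"]})
df2 = pd.DataFrame({"house_no": ["1234"], "Right_Address": ["1234 Stack Overflow Way"]})

df1.merge(df2, on="house_no")            # new DataFrame is created and discarded
before = "Right_Address" in df1.columns  # False: df1 itself was not modified

df1 = df1.merge(df2, on="house_no")      # rebind df1 to the merged result
after = "Right_Address" in df1.columns   # True: df1.to_excel(...) would now include it
print(before, after)  # False True
```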

Jacob_Cortese Over a year ago

That took care of the address column, but now I get:

FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)

I tried doing df1.Address.str.extract('(\d+)'expand = False), ... but it did not work.

Vaishali Over a year ago

By the way, expand=False takes care of the warning. I have edited the answer.
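The expand argument controls what str.extract returns, which is why passing it explicitly silenced the FutureWarning (current pandas defaults to expand=True). The call quoted in the comment above is also missing a comma between the pattern and expand=False, which is a SyntaxError on its own. A small sketch of the two return types, using a raw string so \d is not treated as a Python string escape:

```python
import pandas as pd

s = pd.Series(["1234 Stack Overflow", "7458 Python"])

# One capture group with expand=False -> a Series of the captured text.
as_series = s.str.extract(r"(\d+)", expand=False)

# expand=True -> always a DataFrame, here with a single column labelled 0.
as_frame = s.str.extract(r"(\d+)", expand=True)

print(type(as_series).__name__, as_series.tolist())     # Series ['1234', '7458']
print(type(as_frame).__name__, list(as_frame.columns))  # DataFrame [0]
```

A Series is what the merge above needs for left_on/right_on, since each key must be one value per row.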

Vaishali Over a year ago

@ And no, this solution won't work in your example, as it matches on numbers like 1234 and 7458, whereas in your case you need to match strings.
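For the string-matching case, one possible approach (not from the answer above) is a per-row prefix match: for each bad_boy address, take the first good_boy address that starts with it, ignoring case. This is a sketch that assumes the bad_boy text is always a leading part of the full address; find_full_address is a hypothetical helper name, and the frames are in-memory copies of the sample rows.

```python
import pandas as pd

df = pd.DataFrame({"Address": ["1234 Stack Overflow", "7458 Python", "8745 Pandas"]})
df2 = pd.DataFrame({"Address": [
    "5896 Java Road", "1234 Stack Overflow Way", "7459 Ruby Drive",
    "4517 Numpy Creek Way", "1642 Scipy Trail", "7458 Python Avenue",
    "8745 Pandas Lane", "9658 Excel Road", "7255 Html Drive",
    "7459 Selenium Creek Way",
]})


def find_full_address(partial, candidates):
    """Hypothetical helper: first candidate that starts with `partial`
    (case-insensitive), or None if nothing matches."""
    prefix = partial.strip().lower()
    for full in candidates:
        if full.lower().startswith(prefix):
            return full
    return None


candidates = df2["Address"].tolist()
df["Right Address"] = df["Address"].apply(lambda a: find_full_address(a, candidates))
print(df["Right Address"].tolist())
# ['1234 Stack Overflow Way', '7458 Python Avenue', '8745 Pandas Lane']
```

This checks every pair of rows, so it is quadratic in the table sizes, and a bare prefix can over-match (for example "7458 Python" would also match a hypothetical "7458 Pythonic Street"); adding a trailing space or word-boundary check to the prefix would tighten it.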
