
I am trying to match partial strings from bad_boy against good_boy and create a column in the original df (bad_boy) called Right Address, but I am having a hard time getting this to work. I have looked at the questions below:

Replace whole string if it contains substring in pandas

Return DataFrame item using partial string match on rows pandas python

import pandas as pd
bad_boy = pd.read_excel('C:/Users/Programming/.xlsx')
df = pd.DataFrame(bad_boy)

print (df['Address'].head(3))

0  1234 Stack Overflow
1  7458 Python
2  8745 Pandas

good_boy = pd.read_excel('C:/Users/Programming/.xlsx')

df2 = pd.DataFrame(good_boy)

print (df2['Address'].head(10))

0 5896 Java Road
1 1234 Stack Overflow Way
2 7459 Ruby Drive
3 4517 Numpy Creek Way
4 1642 Scipy Trail
5 7458 Python Avenue
6 8745 Pandas Lane
7 9658 Excel Road
8 7255 Html Drive
9 7459 Selenium Creek Way

I tried this:

df['Right Address'] = df.loc[df['Address'].str.contains('Address', case = False, na = False, regex = False), df2['Address']]

but this raises an error:

'None of [0.....all addresses\nName: Address, dtype: object] are in the [columns]'

Desired result:

print (df['Right Address'].head(3))

0  1234 Stack Overflow Way
1  7458 Python Avenue
2  8745 Pandas Lane
  • The numbers 1234, 7458 and 8745 all match across your two dataframes. Can you just join on those and keep the df2 names? That would give your desired result. Or do you need to do this by string matching? Commented May 3, 2017 at 17:25
  • That would work fine, any ideas though? Commented May 3, 2017 at 18:11

1 Answer

You can use merge combined with str.extract to do a partial match on the leading number of each address:

# Join on the first run of digits in each Address; raw strings avoid the invalid-escape warning for \d
df1 = df1.merge(
    df2,
    left_on=df1.Address.str.extract(r'(\d+)', expand=False),
    right_on=df2.Address.str.extract(r'(\d+)', expand=False),
    how='inner',
).rename(columns={'Address_y': 'Right_Address'})

You get

    Address_x           Right_Address
0   1234 Stack Overflow 1234 Stack Overflow Way
1   7458 Python         7458 Python Avenue
2   8745 Pandas         8745 Pandas Lane
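
For anyone who wants to try this without the Excel files, here is a minimal self-contained sketch of the same idea. It rebuilds the two frames from the sample rows in the question and, as an assumption on my part, stores the extracted digits in an explicit helper column (the name num is hypothetical) and uses a left merge so unmatched rows in df1 are kept rather than dropped.

```python
import pandas as pd

# Sample rows copied from the question; only the 'Address' column is modelled here.
df1 = pd.DataFrame({'Address': ['1234 Stack Overflow', '7458 Python', '8745 Pandas']})
df2 = pd.DataFrame({'Address': [
    '5896 Java Road', '1234 Stack Overflow Way', '7459 Ruby Drive',
    '4517 Numpy Creek Way', '1642 Scipy Trail', '7458 Python Avenue',
    '8745 Pandas Lane', '9658 Excel Road', '7255 Html Drive',
    '7459 Selenium Creek Way',
]})

# Extract the first run of digits into a helper column ('num' is a hypothetical name).
left = df1.assign(num=df1['Address'].str.extract(r'(\d+)', expand=False))
right = df2.assign(num=df2['Address'].str.extract(r'(\d+)', expand=False))
right = right.rename(columns={'Address': 'Right_Address'})

# A left merge keeps every row of df1; rows with no matching number get NaN.
result = left.merge(right, on='num', how='left').drop(columns='num')
print(result)
```

Note that if the same number appears more than once in df2 (7459 does in the sample), any df1 row with that number would be duplicated in the result, so you may want to de-duplicate first.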

10 Comments

Thanks, when I write df1.to_excel, the Right_Address does not show up. print (df1.columns) returns Index(['Project', 'Order Date', 'Paid Date', 'Resale Released', 'Estimated Close Date', 'Estimated Sales Price', 'Address', 'Title Company', 'Title Company Email', 'Seller', 'Builder/HO', 'Actual Close Date', 'Actual Sales Price', 'Status of Assessments', 'Closing Received', 'Unnamed: 15', 'Unnamed: 16'], dtype='object'). The Right_Address is not there.
Did you assign the merge result back to df1 by doing df1 = df1.merge...?
That took care of the Address, but I now get FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame). I tried df1.Address.str.extract('(\d+)'expand = False), but it did not work.
expand = False takes care of the warning, by the way. I have edited the answer.
No, this solution won't work in your example, because it matches on numbers like 1234 and 7458, whereas in your case you need to match on strings.
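
If the match really has to be on the string itself rather than on the number, one possible approach is a plain prefix match. This is only a sketch under assumptions: the helper first_prefix_match is a hypothetical name, the row '9999 Missing' is made up to show the no-match case, and it assumes the short address is always the beginning of the full one. It loops over every candidate per row, so it is only suitable for small frames.

```python
import pandas as pd

# Small frames modelled on the question; '9999 Missing' is an invented no-match row.
df1 = pd.DataFrame({'Address': ['1234 Stack Overflow', '7458 Python', '9999 Missing']})
df2 = pd.DataFrame({'Address': ['1234 Stack Overflow Way', '7458 Python Avenue',
                                '8745 Pandas Lane']})


def first_prefix_match(short, candidates):
    """Return the first candidate starting with `short` (case-insensitive), else None."""
    if not isinstance(short, str):
        return None  # skip NaN or other non-string cells
    short = short.strip().lower()
    for full in candidates:
        if full.strip().lower().startswith(short):
            return full
    return None


# Candidate full addresses, with missing values removed.
candidates = df2['Address'].dropna().tolist()

# Look up each short address against the candidate list.
df1['Right Address'] = df1['Address'].apply(lambda s: first_prefix_match(s, candidates))
print(df1)
```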