0

I am trying to merge two data frames where df1 column A contains substring of df2 column B.

for example

df1:
columnA Actual
AB      Apple
CD      Straw
EF      Ele

df2:
ColumnB  Values
UserAB    1
UserCD    2
UserEF    3
UserGH    4 
UserIJ    5

I am looking for output like below, can't able to identify how can I get this done.

Output:
   ColumnB  Values  ColumnA  Actual
    UserAB    1      AB       Apple
    UserCD    2      CD       Straw
    UserEF    3      EF       Ele
    UserGH    4      NaN      NaN
    UserIJ    5      NaN      NaN
2
  • Are the substrings in ColumnA and strings in ColumnB fixed length or they vary? Commented Aug 8, 2019 at 4:07
  • they vary in column A and Column B Commented Aug 8, 2019 at 4:27

2 Answers 2

1

You can try with str.extract():

p='({})'.format('|'.join(df1.columnA))
d=df1.set_index('columnA')['Actual']
s=df2.ColumnB.str.extract(p,expand=False)

df2=df2.assign(ColumnA=s,Actual=s.map(d))

  ColumnB  Values ColumnA Actual
0  UserAB       1      AB  Apple
1  UserCD       2      CD  Straw
2  UserEF       3      EF    Ele
3  UserGH       4     NaN    NaN
4  UserIJ       5     NaN    NaN
Sign up to request clarification or add additional context in comments.

Comments

0

You can use df.append:

df1 = df1.append(pd.DataFrame([[float('nan'), float('nan')], [float('nan'), float('nan')]], columns = ["ColumnA", "Actual"])).reset_index(drop=True)
df2 = df2.merge(df1, how = "left", on = df2.index).drop("key_0", axis = 1)

output:

  ColumnB Values ColumnA Actual
0  UserAB      1      AB  Apple
1  UserCD      2      CD  Straw
2  UserEF      3      EF    Ele
3  UserGH      4     NaN    NaN
4  UserIJ      5     NaN    NaN

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.