2

I am looking for an efficient way to perform the following join on two pandas.DataFrames:

The first one contains in column A strings like:

A  ....

Spam|One
Spam|Two
Ham
Eggs

The second one is a reference table with string keys as the index and a text description:

index      description
Spam       "..."
Ham        "..." 
Eggs       "...."
BakedBeans "...."

Each key in the index can be a prefix to a string in A. Now I'd like to join the description column to column A where the value in A matches the prefix. I have a feeling that there is an efficient pandas one-liner for it...

We can assume that all values in A have the format "prefix|suffix" or "prefix". Perhaps that can speed up things.

1 Answer 1

1

Use Series.str.split with select first values of lists and Series.map by Series:

print (df1)
          A
0  Spam|One
1  Spam|Two
2       Ham
3      Eggs

print (df2)  
     description
Spam         aaa
Ham           sd
Eggs         rty

print (df2.index)
Index(['Spam', 'Ham', 'Eggs'], dtype='object')

df1['description'] = df1['A'].str.split('|').str[0].map(df2['description'])
print (df1)
          A description
0  Spam|One         aaa
1  Spam|Two         aaa
2       Ham          sd
3      Eggs         rty

More general solution should be Series.str.extract, with ^ for extract starting strings and join by | for regex OR by df2.index:

pat = '(' + '|'.join('^' + df2.index) + ')'
df1['new'] = df1['A'].str.extract(pat,expand=False).map(df2['description'])
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.