joining pandas dataframes by string prefix

Question

I am looking for an efficient way to perform the following join on two pandas.DataFrames:

The first one contains in column A strings like:

A  ....

Spam|One
Spam|Two
Ham
Eggs

The second one is a reference table with string keys as the index and a text description:

index      description
Spam       "..."
Ham        "..." 
Eggs       "...."
BakedBeans "...."

Each key in the index can be a prefix to a string in A. Now I'd like to join the description column to column A where the value in A matches the prefix. I have a feeling that there is an efficient pandas one-liner for it...

We can assume that all values in A have the format "prefix|suffix" or "prefix". Perhaps that can speed up things.

jezrael · Accepted Answer · 2019-03-17 13:15:09Z

1

Use Series.str.split with select first values of lists and Series.map by Series:

print (df1)
          A
0  Spam|One
1  Spam|Two
2       Ham
3      Eggs

print (df2)  
     description
Spam         aaa
Ham           sd
Eggs         rty

print (df2.index)
Index(['Spam', 'Ham', 'Eggs'], dtype='object')

df1['description'] = df1['A'].str.split('|').str[0].map(df2['description'])
print (df1)
          A description
0  Spam|One         aaa
1  Spam|Two         aaa
2       Ham          sd
3      Eggs         rty

More general solution should be Series.str.extract, with ^ for extract starting strings and join by | for regex OR by df2.index:

pat = '(' + '|'.join('^' + df2.index) + ')'
df1['new'] = df1['A'].str.extract(pat,expand=False).map(df2['description'])

edited Mar 17, 2019 at 13:15

answered Mar 17, 2019 at 13:08

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

joining pandas dataframes by string prefix

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related