
I'd like to find text in one column of a pandas DataFrame ("text") using the patterns stored in another column ("words").

#import re
import pandas as pd
df = pd.DataFrame([['I like apple pie','apple'],['Nice banana and lemon','banana|lemon']], columns=['text','words'])
df['text'] = df['text'].str.replace(r''+df['words'].str, '*'+group(0)+'*')
df

I'd like to mark the found words with *.
How can I do that?

The desired output is:
I like *apple* pie
Nice *banana* and *lemon*

2 Answers


You could capture the pattern from words in a group and use a backreference in the substitution to wrap each match in *:

import re
import pandas as pd
df = pd.DataFrame([['I like apple pie','apple'],['Nice banana and     lemon','banana|lemon']], columns=['text','words'])

df['text'] = df['text'].replace(r'('+df['words']+')', r'*\1*', regex=True)
print(df)

Prints:

                            text         words
0             I like *apple* pie         apple
1  Nice *banana* and     *lemon*  banana|lemon
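The same capture-and-backreference idea can also be applied strictly row by row with the standard re module, so each words pattern only touches the text in its own row. This is a minimal sketch, assuming every words entry is a valid regular expression; the list comprehension over zip is an illustrative choice, not part of the answer above.

```python
import re
import pandas as pd

df = pd.DataFrame([['I like apple pie', 'apple'],
                   ['Nice banana and     lemon', 'banana|lemon']],
                  columns=['text', 'words'])

# Pair each text with its own pattern, wrap the pattern in a capture group,
# and refer back to the captured text with \1 in the replacement.
df['text'] = [re.sub('(' + pattern + ')', r'*\1*', text)
              for text, pattern in zip(df['text'], df['words'])]
print(df['text'].tolist())
```

Because re.sub only sees one row's pattern at a time, a word listed for one row is never starred in another row.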

1 Comment

That's the syntax, I was looking for! Thanks.

IIUC, prefixing the pattern with (?i) is the same as passing re.I (case-insensitive matching):

df.text.replace(regex=r'(?i)'+ df.words,value="*")
Out[131]: 
0        I like * pie
1    Nice * and     *
Name: text, dtype: object
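To illustrate the (?i) remark with plain re (the sample string here is made up for the demonstration), the inline flag at the start of the pattern and the flags=re.I argument give the same result:

```python
import re

text = 'I like Apple pie'

# Inline flag: (?i) at the start of the pattern enables case-insensitive matching.
inline = re.sub(r'(?i)apple', '*', text)

# Equivalent: pass the flag through the flags argument instead.
flagged = re.sub(r'apple', '*', text, flags=re.I)

print(inline)
print(flagged)
```

The inline form is handy with pandas because Series.replace takes only the pattern string, so the flag has to travel inside it.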

Since you updated the question:

df.words=df.words.str.split('|')
s=df.words.apply(pd.Series).stack()
df.text.replace(dict(zip(s,'*'+s+'*')),regex=True)
Out[139]: 
0               I like *apple* pie
1    Nice *banana* and     *lemon*
Name: text, dtype: object
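A variant of the split/stack approach, assuming pandas 0.25 or later for Series.explode, builds the same word-to-starred-word mapping without overwriting df.words. Like the answer above, it applies every word as a regex to every row, so it assumes the words contain no regex special characters that should be matched literally.

```python
import pandas as pd

df = pd.DataFrame([['I like apple pie', 'apple'],
                   ['Nice banana and     lemon', 'banana|lemon']],
                  columns=['text', 'words'])

# One word per element: 'apple', 'banana', 'lemon'
s = df['words'].str.split('|').explode()

# Map each word to its starred form, e.g. 'apple' -> '*apple*'
mapping = dict(zip(s, '*' + s + '*'))

# Apply all mappings as regex substitutions to every row of text
result = df['text'].replace(mapping, regex=True)
print(result.tolist())
```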

1 Comment

I'd like to wrap the found words in *s. I posted the desired result in the original question.
