
My DataFrame df has two columns, like this:

thePerson  theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"

I want the resulting df to look like this:

thePerson  theText
"the abc" "this is about <b>the abc</b>"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about <b>WXY</b>"

Notice that when theText contains the thePerson value from the same row, the matched text is made bold in theText.

One solution I tried unsuccessfully is this:

df['theText']=df['theText'].replace(df.thePerson,'<b>'+df.thePerson+'</b>', regex=True)

I wonder whether I can do this using lapply or map.

My Python environment is version 2.7.

2 Answers


Using re.sub and zip:

import re

tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
# assign returns a new DataFrame, so bind the result back to df
df = df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)
print(df)

  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
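
One caveat, offered as a sketch rather than as part of the answer above: the pattern is built directly from thePerson, so a name containing regex metacharacters (for example "c++") would be read as a pattern instead of literal text. Wrapping the name in re.escape makes it match literally. The lists below stand in for the two columns, and the "c++" row is invented purely for illustration.

```python
import re

# Hypothetical sample data; the last row is invented to show a name
# that contains regex metacharacters.
persons = ["the abc", "xyz", "wxy", "wxy", "c++"]
texts = [
    "this is about the abc",
    "this is about tyu",
    "this is about abc",
    "this is about WXY",
    "this is about C++",
]

# re.escape makes each name match literally; re.I keeps the match
# case-insensitive, and \1 re-inserts the text exactly as it was found.
bolded = [
    re.sub(r"({})".format(re.escape(p)), r"<b>\1</b>", t, flags=re.I)
    for t, p in zip(texts, persons)
]

for line in bolded:
    print(line)
```

The same change would presumably drop into the list comprehension above by replacing p with re.escape(p).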

Copy/paste
You should be able to run this exact code and get the required result:

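from io import StringIO
import re

import pandas as pd

# The u prefix keeps io.StringIO working on Python 2.7, which requires unicode input
txt = u'''thePerson  theText
"the abc"  "this is about the abc"
"xyz"  "this is about tyu"
"wxy"  "this is about abc"
"wxy"  "this is about WXY"'''

# Columns are separated by two or more whitespace characters
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')

tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
# assign returns a new DataFrame, so bind the result back to df
df = df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)
print(df)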

You should see this:

   thePerson                         theText
0  "the abc"  "this is about <b>the abc</b>"
1      "xyz"             "this is about tyu"
2      "wxy"             "this is about abc"
3      "wxy"      "this is about <b>WXY</b>"
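
To see why the quotes survive in the output above, here is a rough stdlib-only sketch of what a separator of two or more whitespace characters does to each row. It uses re.split as a stand-in and is not how pandas parses the text internally. It only illustrates that the single space inside "the abc" is not a split point and that the quote characters stay in the values, which is why the code strips them from thePerson before building the pattern.

```python
import re

txt = '''thePerson  theText
"the abc"  "this is about the abc"
"xyz"  "this is about tyu"
"wxy"  "this is about abc"
"wxy"  "this is about WXY"'''

# Split each line on runs of two or more whitespace characters.
rows = [re.split(r'\s{2,}', line) for line in txt.splitlines()]
header, records = rows[0], rows[1:]

# The quote characters are still part of each value at this point.
print(records[0])

# Strip the quotes from the person only, mirroring .str.strip('"') above.
persons = [person.strip('"') for person, _ in records]
print(persons)
```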

7 Comments

Can you please test your code? I'm not seeing <b> </b> when I run it.
What you see is what was printed from running this code. However, if you have the double-quote character in your strings, that will mess it up. I'll update the code with a simple fix. When I ran this, I assumed the " characters were not intended to be part of the strings. I just tested it with the " characters and it works.
Thanks, going to try again. Just to confirm, this works on version 2.7?
I'm on 3.6, but... yes, it should.
It works. I had to reassign df, like df = df.assign( theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I) for t, p in zip(tt, tp)] )

You can use apply:

import re

# Apply row by row (axis=1) so each text is matched against its own person
df['theText'] = df.apply(lambda x: re.sub(r'('+x.thePerson+')',
                                          r'<b>\1</b>',
                                          x.theText,
                                          flags=re.IGNORECASE), axis=1)
print(df)
  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
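
Since the question asks about map, here is a small sketch of the same idea with a named helper and the built-in map, using plain lists in place of the DataFrame columns. The helper name bold_person is made up for illustration, and re.escape is an addition that is not in the answer above; it makes the name match literally.

```python
import re


def bold_person(person, text):
    """Wrap case-insensitive occurrences of person in text with <b> tags."""
    pattern = r"({})".format(re.escape(person))
    return re.sub(pattern, r"<b>\1</b>", text, flags=re.IGNORECASE)


# Plain lists standing in for df.thePerson and df.theText.
persons = ["the abc", "xyz", "wxy", "wxy"]
texts = [
    "this is about the abc",
    "this is about tyu",
    "this is about abc",
    "this is about WXY",
]

# map walks both lists in parallel; list() is needed on Python 3,
# where map returns a lazy iterator.
result = list(map(bold_person, persons, texts))

for line in result:
    print(line)
```

With a DataFrame, the same helper could be called per row, for example df.apply(lambda row: bold_person(row.thePerson, row.theText), axis=1).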

3 Comments

Hmmm, if the last value needs to match in uppercase: df['theText'] = df.apply(lambda x: re.sub(r'({})'.format(x.thePerson), r'<b>\1</b>', x.theText, flags=re.I), axis=1)
I meant to show, with the last value in upper case, that the match in theText has to be case-insensitive.
Then it needs re; I'll try to rewrite the answer.
