How to iterate a Pandas DataFrame and replace string if there is match of items from another column

Question

My DataFrame df has two columns, like this:

thePerson  theText
"the abc" "this is about the abc"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about WXY"

I want the resulting df to look like this:

thePerson  theText
"the abc" "this is about <b>the abc</b>"
"xyz" "this is about tyu"
"wxy" "this is about abc"
"wxy" "this is about <b>WXY</b>"

Note that if theText contains thePerson from the same row, the matching part of theText is wrapped in <b> tags. The match should be case-insensitive (see the last row).

One solution I tried, without success, is this:

df['theText']=df['theText'].replace(df.thePerson,'<b>'+df.thePerson+'</b>', regex=True)

I wonder whether I can do this using apply or map.

My Python environment is version 2.7.

piRSquared · Accepted Answer · 2017-05-05 17:28:13Z

2

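Using re.sub and zip:

import re

# Row-wise lists of texts and names (surrounding quotes stripped from the names)
tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
# assign returns a new DataFrame; each name is wrapped in <b> tags, ignoring case
df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)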

  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
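
One caveat with the pattern above: the name is inserted into the regular expression as-is, so a name containing regex metacharacters such as `.` or `(` would be read as pattern syntax. Below is a minimal sketch of the same substitution with `re.escape`, assuming names should always be matched literally. `bold_match` is a hypothetical helper name and the third sample row is made up for illustration; neither is part of the original answer.

```python
import re


def bold_match(text, person):
    """Wrap case-insensitive, literal occurrences of person in <b> tags."""
    # re.escape makes characters such as '.' or '(' match literally.
    pattern = '({})'.format(re.escape(person))
    return re.sub(pattern, r'<b>\1</b>', text, flags=re.I)


texts = ['this is about the abc', 'this is about WXY', 'ask a.b or axb']
people = ['the abc', 'wxy', 'a.b']

# Same zip-based pairing as in the answer, one (text, name) pair per row.
result = [bold_match(t, p) for t, p in zip(texts, people)]
```

Without the escape, the name `a.b` would also match `axb`, because `.` matches any character.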

Copy/paste: you should be able to run this exact code and get the required result.

import re
from io import StringIO
import pandas as pd

txt = '''thePerson  theText
"the abc"  "this is about the abc"
"xyz"  "this is about tyu"
"wxy"  "this is about abc"
"wxy"  "this is about WXY"'''

# Columns are separated by two or more spaces; a raw string keeps \s intact
df = pd.read_csv(StringIO(txt), sep=r'\s{2,}', engine='python')

tt = df.theText.values.tolist()
tp = df.thePerson.str.strip('"').values.tolist()
df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)

You should see this:

   thePerson                         theText
0  "the abc"  "this is about <b>the abc</b>"
1      "xyz"             "this is about tyu"
2      "wxy"             "this is about abc"
3      "wxy"      "this is about <b>WXY</b>"
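
`DataFrame.assign` returns a new DataFrame and leaves the original untouched, so the result has to be bound to a name (or reassigned to `df`) to keep it. Here is a small sketch of that behaviour, using made-up two-row data without the quote characters; the names `new_text`, `unchanged`, `df2` and `still_same` are illustrative only.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'thePerson': ['the abc', 'wxy'],
    'theText': ['this is about the abc', 'this is about WXY'],
})

# Same substitution as in the answer, computed row by row.
new_text = [re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
            for t, p in zip(df.theText.tolist(), df.thePerson.tolist())]

unchanged = df.theText.tolist()       # snapshot of the column before assign
df2 = df.assign(theText=new_text)     # new DataFrame; df itself is not modified
still_same = df.theText.tolist() == unchanged

df = df.assign(theText=new_text)      # reassign to keep the change in df
```

In an interactive session the bare `df.assign(...)` call displays the new frame, which is why the output appears there even though `df` has not changed.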

edited May 5, 2017 at 17:28

answered May 5, 2017 at 7:23

piRSquared


7 Comments

Watt Over a year ago

Can you please test your code? I'm not seeing <b> </b> when I run it.

piRSquared Over a year ago

What you see is what was printed from running this code. However, if you have the double quote character in your strings, that will mess it up. I'll update the code with a simple fix. When I ran this, I assumed the " were not intended parts of the string. I just tested it with the " and it works.

Watt Over a year ago

Thanks, going to try again. Just to confirm, does this work on Python 2.7?

piRSquared Over a year ago

I'm on 3.6 but... yes it should.

Watt Over a year ago

It works. I had to reassign df, like this:

df = df.assign(
    theText=[re.sub(r'({})'.format(p), r'<b>\1</b>', t, flags=re.I)
             for t, p in zip(tt, tp)]
)


jezrael · Accepted Answer · 2017-05-05 08:21:26Z

1

You can use apply:

import re

# Row-wise: wrap each row's thePerson in <b> tags inside theText, ignoring case
df['theText'] = df.apply(lambda x: re.sub(r'('+x.thePerson+')',
                                          r'<b>\1</b>',
                                          x.theText,
                                          flags=re.IGNORECASE), axis=1)
print (df)
  thePerson                       theText
0   the abc  this is about <b>the abc</b>
1       xyz             this is about tyu
2       wxy             this is about abc
3       wxy      this is about <b>WXY</b>
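
The lambda above can also be written as a named function, which leaves room for handling edge cases. The following is a sketch only, assuming Python 3, that names should be matched literally (hence `re.escape`), and that rows with a missing or non-string value should be left unchanged. `bold_person` is a hypothetical helper name, and the row with a missing name is made up to exercise the guard.

```python
import re
import pandas as pd


def bold_person(row):
    """Return row['theText'] with row['thePerson'] wrapped in <b> tags."""
    person, text = row['thePerson'], row['theText']
    # Assumption: rows with a missing or non-string value are left unchanged.
    if not isinstance(person, str) or not isinstance(text, str):
        return text
    return re.sub('({})'.format(re.escape(person)), r'<b>\1</b>',
                  text, flags=re.IGNORECASE)


df = pd.DataFrame({
    'thePerson': ['the abc', 'xyz', None, 'wxy'],
    'theText': ['this is about the abc', 'this is about tyu',
                'this is about abc', 'this is about WXY'],
})

# axis=1 passes one row at a time to bold_person.
df['theText'] = df.apply(bold_person, axis=1)
```

Without the guard, a missing name would raise a `TypeError` when the pattern string is built.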

edited May 5, 2017 at 8:21

answered May 5, 2017 at 6:59

jezrael


3 Comments

jezrael Over a year ago

Hmm, if the last value needs to match in uppercase: df['theText'] = df.apply(lambda x: re.sub(r'({})'.format(x.thePerson), r'<b>\1</b>', x.theText, flags=re.I), axis=1)

Watt Over a year ago

With the uppercase last value I meant to show that the match in theText has to be case-insensitive.

jezrael Over a year ago

Then re is needed; I'll try to rewrite the answer.
