pandas: replace values in a column based on a condition in another dataframe if that value is in the second dataframe

Question

I have two dataframes as follows,

import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
               'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']]})

df2 = pd.DataFrame({'verbs':['go','open','close','share','divide'],
                   'new_verbs':['went','opened','closed','shared','divided']})

I would like to replace the verbs in df.text with their past forms from df2.new_verbs when those verbs appear in df2.verbs. So far I have done the following:

df['text'] = df['text'].str.split()
new_df = df.apply(pd.Series.explode)
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))
new_df.text[new_df.new.isin(df2.verbs)] = df2.new_verbs

However, when I print the result, not all verbs are correctly replaced. My desired output would be:

       text    pos    new
0       I   PRON   PRON
0    went   VERB     go
0      to    ADP    ADP
0  school   NOUN   NOUN
1  opened   VERB   open
1     the    DET    DET
1   green    ADJ    ADJ
1    door   NOUN   NOUN
2    went   VERB     go
2     out    ADP    ADP
2     and  CCONJ  CCONJ
2    play   VERB   play

mozway · Accepted Answer · 2022-04-29 17:28:49Z

3

You can use a regex for that:

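import re
# Alternation of all verbs, escaped so each one is matched literally
regex = '|'.join(map(re.escape, df2['verbs']))
# Lookup Series: verb -> past form
s = df2.set_index('verbs')['new_verbs']

# Replace each match with its past form; fall back to the matched text itself
df['text'] = df['text'].str.replace(regex, lambda m: s.get(m.group(), m.group()),
                                    regex=True)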

output (here as column text2 for clarity):

                  text                       pos                  text2
0       I go to school   [PRON, VERB, ADP, NOUN]       I went to school
1  open the green door    [VERB, DET, ADJ, NOUN]  opened the green door
2      go out and play  [VERB, ADP, CCONJ, VERB]      went out and play
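
A plain alternation such as go|open|... also matches inside longer words, so "good" would contain a match for "go". The following is a minimal sketch of the same regex idea with word boundaries added. The extra row 'good going' and the names mapping and pattern are assumptions made for illustration and are not part of the original answer. Like the answer above, it ignores the pos column, so a listed word is replaced regardless of its tag.

```python
import re
import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door',
                            'go out and play', 'good going']})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

# Plain dict lookup: verb -> past form
mapping = dict(zip(df2['verbs'], df2['new_verbs']))

# \b on both sides so only whole words match (not "go" inside "good")
pattern = r'\b(?:' + '|'.join(map(re.escape, mapping)) + r')\b'

# Every match is a key of `mapping`, so direct indexing is safe here
df['text2'] = df['text'].str.replace(pattern, lambda m: mapping[m.group()],
                                     regex=True)
print(df)
```

With the word boundaries in place, the last row is left unchanged while the first three rows are rewritten as in the output above.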


mqqz · Accepted Answer · 2022-04-29 17:39:23Z

1

For smaller lists, you can use pandas replace and a dictionary like this:

verbs_map = dict(zip(df2.verbs, df2.new_verbs))
# replace returns a new Series, so assign it back to keep the result
new_df['text'] = new_df['text'].replace(verbs_map)

Basically, dict(zip(df2.verbs, df2.new_verbs)) creates a dictionary that maps each verb to its past-tense form, e.g. {'go': 'went', 'close': 'closed', ...}.
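
To reproduce the desired output from the question, the dictionary can be applied only to tokens tagged VERB. The following is a rough sketch under a few assumptions: multi-column explode requires pandas 1.3 or later, and the helper names is_verb, original and replaced are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'text': ['I go to school', 'open the green door', 'go out and play'],
                   'pos': [['PRON', 'VERB', 'ADP', 'NOUN'],
                           ['VERB', 'DET', 'ADJ', 'NOUN'],
                           ['VERB', 'ADP', 'CCONJ', 'VERB']]})
df2 = pd.DataFrame({'verbs': ['go', 'open', 'close', 'share', 'divide'],
                    'new_verbs': ['went', 'opened', 'closed', 'shared', 'divided']})

verbs_map = dict(zip(df2['verbs'], df2['new_verbs']))

# One row per token; text and pos lists must have equal lengths per row
new_df = df.assign(text=df['text'].str.split()).explode(['text', 'pos'])

# Work on plain arrays to sidestep alignment on the duplicated index
is_verb = new_df['pos'].eq('VERB').to_numpy()
original = new_df['text'].to_numpy()
replaced = new_df['text'].replace(verbs_map).to_numpy()

# 'new' keeps the original verb for VERB rows and the tag otherwise
new_df['new'] = np.where(is_verb, original, new_df['pos'].to_numpy())
# 'text' takes the past form only where the token is tagged VERB
new_df['text'] = np.where(is_verb, replaced, original)
print(new_df)
```

Verbs that are tagged VERB but missing from df2.verbs, such as "play", are left as they are.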

edited Apr 29, 2022 at 17:39

answered Apr 29, 2022 at 17:33


4 Comments

mozway Over a year ago

This solution will become very slow if there are many verbs to replace. I think replace with a dictionary runs as many times as there are items.

mqqz Over a year ago

Yes, ideally the dictionary should be created once at the start but apart from that the lookups are fast, I'll change the code to reflect that.

mozway Over a year ago

no, I meant due to how replace works, it checks items one after the other

mqqz Over a year ago

@mozway yeah fair enough, I'll update my answer to say that.
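
If the concern raised in these comments applies to your data, one alternative for exact token matches is Series.map followed by fillna, which looks each value up in the dictionary once. This is only a sketch on a small hypothetical Series named tokens; whether it is actually faster than replace for a given dataset is worth timing.

```python
import pandas as pd

# Hypothetical token column, one word per row
tokens = pd.Series(['I', 'go', 'to', 'school', 'open', 'play'])
verbs_map = {'go': 'went', 'open': 'opened', 'close': 'closed'}

# map yields NaN for tokens missing from the dict; fillna restores the originals
result = tokens.map(verbs_map).fillna(tokens)
print(result.tolist())
```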
