I have two dataframes as follows,
import pandas as pd
df = pd.DataFrame({'text':['I go to school','open the green door', 'go out and play'],
'pos':[['PRON','VERB','ADP','NOUN'],['VERB','DET','ADJ','NOUN'],['VERB','ADP','CCONJ','VERB']]})
df2 = pd.DataFrame({'verbs':['go','open','close','share','divide'],
'new_verbs':['went','opened','closed','shared','divided']})
I would like to replace the verbs in df.text with their past form in df2.new_verbs if the verbs are found in df2.verbs. and so far I have done the following,
df['text'] = df['text'].str.split()
new_df = df.apply(pd.Series.explode)
new_df = new_df.assign(new=lambda d: d['pos'].mask(d['pos'] == 'VERB', d['text']))
new_df.text[new_df.new.isin(df2.verbs)] = df2.new_verbs
but when I print out the result, not all verbs are correctly replaced. My desired output would be,
text pos new
0 I PRON PRON
0 went VERB go
0 to ADP ADP
0 school NOUN NOUN
1 opened VERB open
1 the DET DET
1 green ADJ ADJ
1 door NOUN NOUN
2 went VERB go
2 out ADP ADP
2 and CCONJ CCONJ
2 play VERB play