I am trying to do nested regex replacements in pandas, and I am having a hard time capturing all of the nested components in the regex.
For example, I would like to remove all instances of 'ba' and 'ba ca' from column A of a DataFrame. However, only 'ba' gets removed: the 'ca' part of 'ba ca' stays behind, I think because 'ba' is nested within 'ba ca'.
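The nesting suspicion can be checked outside pandas with the standard `re` module. This is a minimal sketch that assumes the cause is the order in which the patterns are applied. The patterns run one after another, so once `^(ba)` has stripped the prefix, `^(ba ca)` finds nothing left to match. Removing 'ba' also leaves a leading space behind.

```python
import re

text = "ba ca t"

# Apply the patterns in the same order as replace_list: 'ba' first, then 'ba ca'.
after_ba = re.sub(r"^(ba)", "", text)
after_both = re.sub(r"^(ba ca)", "", after_ba)

print(repr(after_ba))    # ' ca t'
print(repr(after_both))  # ' ca t' because '^(ba ca)' has nothing left to match

# Reversing the order so the longer pattern runs first removes the whole prefix.
longest_first = re.sub(r"^(ba)", "", re.sub(r"^(ba ca)", "", text))
print(repr(longest_first))  # ' t'
```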
import pandas as pd

df = pd.DataFrame({'A': ['ba t', 'ba ca t', 'foo', 'ba it'], 'B': ['abc', 'abc', 'bar', 'xyz']})
replace_list = ['ba', 'ba ca']
for i in replace_list:
    df = df.replace({'A': r'^({})'.format(i)}, {'A': ''}, regex=True)
df
I would expect row index 1 of column A to be 't', not 'ca t'. This is the output I currently get:

      A    B
0     t  abc
1  ca t  abc
2   foo  bar
3    it  xyz

Any help is highly appreciated.
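One possible approach, sketched here under the assumption that each entry in `replace_list` should only be removed from the start of the value, is to replace the loop with a single anchored pattern. The sketch joins the entries into one alternation, sorted longest first, and applies it once with `Series.str.replace`. Python's regex alternation tries its branches left to right, so 'ba ca' gets a chance to match before its prefix 'ba'. The `\s*` is an addition that also removes the space left after the prefix, and `re.escape` guards against entries that contain regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({'A': ['ba t', 'ba ca t', 'foo', 'ba it'],
                   'B': ['abc', 'abc', 'bar', 'xyz']})
replace_list = ['ba', 'ba ca']

# Sort longest first so 'ba ca' is tried before its prefix 'ba', and escape
# each entry in case it contains regex metacharacters.
alternation = '|'.join(re.escape(s) for s in sorted(replace_list, key=len, reverse=True))

# One anchored pattern; \s* also strips the whitespace left after the prefix.
pattern = r'^(?:{})\s*'.format(alternation)

df['A'] = df['A'].str.replace(pattern, '', regex=True)
print(df)
```

As written, the pattern would also strip 'ba' from a value such as 'bath'. If 'ba' should only match as a whole word, adding `\b` after the group (`r'^(?:{})\b\s*'`) prevents that.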