I read regexes and their replacements from a CSV into a dictionary, then run each one over a column in a DataFrame, looking for locations:
for regex, replacement in regex_replace.items():
    df["A"] = df["a"].str.replace(regex, replacement)
This works fine and successfully replaces the text. An example regex would be:
(?i)\b(maine)
However, I also want to capture the text that has been replaced from the regex match. I've tried this:
import re

def find_match(regex, x):
    j = re.findall(r'{0}'.format(regex), x)
    return ",".join(j)
df['matches'] = df['A'].apply(lambda x: find_match(regex,str(x)))
But that doesn't find any matches - I think it's because the backslash is escaped. If I declared the regex variable as a raw string in the code, then this would work:
regex = r'(?i)\b(maine)'
However, I can't do that as it's already stored in a variable. Is there a way to do this?
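For what it's worth, the raw-string prefix only changes how a literal typed in source code is parsed; a string read from a file already holds the backslash as an ordinary character. Here is a minimal sketch using only the standard library. The CSV content, the "ME" replacement and the sample sentence are made up for illustration and are not taken from the real data:

```python
import csv
import io
import re

# Hypothetical CSV content standing in for the real file: pattern,replacement.
# The raw literal here only simulates what the file holds on disk.
csv_text = r"(?i)\b(maine),ME" + "\n"

rows = list(csv.reader(io.StringIO(csv_text)))
regex_replace = {pattern: replacement for pattern, replacement in rows}

# The pattern as it arrives from the CSV reader.
regex = next(iter(regex_replace))

# It is identical to the raw literal, so no extra escaping is needed.
same_as_raw_literal = regex == r'(?i)\b(maine)'

# Formatting it into r'{0}' returns the same string unchanged.
same_after_format = r'{0}'.format(regex) == regex

# With one capture group, findall returns the captured text.
matches = re.findall(regex, "Portland, Maine and MAINE coast")
print(same_as_raw_literal, same_after_format, matches)
```

If this holds for the real file as well, the backslash is probably not the cause of the empty result.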
Related answers are: "regex re.search is not returning the match" and "Python Regex in Variable".
] after df["a". But more importantly, you're assigning the result to a different column than the source. So each time through the loop it processes the original source column, discarding the replacements from the previous iterations. You need to assign back to the same column.
r'{0}'.format(regex) is just the same as regex.
regex_replace and the dataframe.
df["A"] and df["a"] intentional?
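Following the comments above, one possible arrangement is to collect the matches from the text before it is replaced, and to assign the result back to the same column so replacements accumulate. This is only a sketch under assumptions: the dictionary contents and the sample rows are hypothetical, the column is assumed to contain no missing values, and regex=True is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV-derived dictionary and the dataframe.
regex_replace = {r"(?i)\b(maine)": "ME", r"(?i)\b(texas)": "TX"}
df = pd.DataFrame({"a": ["Portland, Maine", "Austin, texas", "Nowhere"]})

# One list of matched text per row, filled in as each pattern is applied.
matches = [[] for _ in range(len(df))]

for regex, replacement in regex_replace.items():
    # Capture what the pattern matches before the text is replaced.
    for found, hits in zip(matches, df["a"].str.findall(regex)):
        found.extend(hits)
    # Assign back to the same column so earlier replacements are kept.
    df["a"] = df["a"].str.replace(regex, replacement, regex=True)

df["matches"] = [",".join(m) for m in matches]
print(df)
```

Searching the original text matters here: once a match has been replaced, running findall on the replaced column has nothing left to find.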