I read regexes and their replacements from a CSV into a dictionary and then run them over a column in a DataFrame, looking for locations:

for regex, replacement in regex_replace.items():
    df["A"] = df["a"].str.replace(regex, replacement)

This works fine and successfully replaces the text. An example regex would be:

(?i)\b(maine)

However, I also want to capture the text that has been replaced from the regex match. I've tried this:

import re

def find_match(regex, x):
    j = re.findall(r'{0}'.format(regex), x)
    return ",".join(j)

df['matches'] = df['A'].apply(lambda x: find_match(regex,str(x)))

But that doesn't find any matches - I think it's because the backslash is escaped. If I declared the regex variable as a raw string in the code, then this would work:

regex = r'(?i)\b(maine)'

However, I can't do that as it's already stored in a variable. Is there a way to do this?

Related answers are: "regex re.search is not returning the match" and "Python Regex in Variable".

  • I don't see how the first version works correctly. First, you're missing a ] after df["a". But more importantly, you're assigning the result to a different column than the source. So each time through the loop it processes the original source column, discarding the replacements from the previous iterations. You need to assign back to the same column. Commented Aug 31, 2023 at 16:19
  • r'{0}'.format(regex) is just the same as regex. Commented Aug 31, 2023 at 16:20
  • Please show an example of regex_replace and the dataframe. Commented Aug 31, 2023 at 16:24
  • Apologies, edited the code to include the bracket Commented Sep 1, 2023 at 11:23
  • Is the difference between df["A"] and df["a"] intentional? Commented Sep 1, 2023 at 14:40
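The first comment's point about feeding each replacement back into the same text can be sketched in plain Python. This is a minimal sketch, not the asker's code: regex_replace and values are hypothetical stand-ins for the CSV-derived dictionary and the column values, and replace_and_capture is an illustrative helper. It also collects the matches before substituting, because searching text that has already been replaced will usually find nothing, which may be part of why find_match came back empty.

```python
import re

# Hypothetical stand-ins for the CSV-derived dictionary and the column values.
regex_replace = {r"(?i)\b(maine)": "ME", r"(?i)\b(texas)": "TX"}
values = ["Portland, Maine", "Austin, texas", "nowhere"]

def replace_and_capture(text, patterns):
    """Apply every pattern in turn to the same running text and
    collect what each one matched before it is replaced."""
    matches = []
    for regex, replacement in patterns.items():
        matches.extend(re.findall(regex, text))  # search before substituting
        text = re.sub(regex, replacement, text)  # result feeds the next pattern
    return text, ",".join(matches)

results = [replace_and_capture(v, regex_replace) for v in values]
```

The same ordering would apply to a DataFrame column: capture from the column first, then assign the replaced text back to that same column.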

1 Answer

One can use an f-string for that.

import re

def find_match(regex, x):
    j = re.findall(rf'{regex}', x)
    return ",".join(j)

3 Comments

rf'{regex}' just evaluates to a string exactly equal to regex.
@Tomp If this worked then you didn't actually have a problem in the first place.
rf'{regex}' is also the same as r'{0}'.format(regex) in the OP's code. The r doesn't do anything in either case, since there are no escape sequences in the format string (it doesn't apply after substitution of the variable).
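The point made in these comments can be checked directly. The sketch below assumes a hypothetical one-line CSV (csv_text) in place of the real file. It shows that a pattern read from CSV already contains a real backslash, that wrapping it in rf'...' or r'{0}'.format(...) returns an identical string, and that re.findall works on it unchanged.

```python
import csv
import io
import re

# Hypothetical CSV content; the file itself would hold: (?i)\b(maine),ME
csv_text = r"(?i)\b(maine),ME" + "\n"

# Read the pattern and replacement the way a CSV-backed dictionary would.
rows = list(csv.reader(io.StringIO(csv_text)))
regex, replacement = rows[0]

# The r prefix only changes how a literal in source code is parsed;
# a string that already exists in a variable is not affected by it.
same_as_literal = regex == r"(?i)\b(maine)"
same_as_fstring = rf"{regex}" == regex
same_as_format = r"{0}".format(regex) == regex

found = re.findall(regex, "Portland, Maine")
```

If the pattern finds nothing in practice, the cause is likely to be elsewhere, for example in which text is being searched.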
