- Problem: I need to highlight a word or words in red font in a dataframe row, based on a regex pattern. I settled on a regex because it lets me match regardless of surrounding spaces, punctuation, and case.
- Source: The data comes from a CSV file. I want to load it into a dataframe, apply the pattern-match highlighting, and write the result to Excel.
- Code: The code helps me with the count of words that match in the dataframe row.
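import pandas as pd
import re
df = pd.read_csv("C:/filepath/filename.csv", engine='python')
# Group the alternation so the lookarounds apply to every keyword, not only the first and last
p = r'(?i)(?<![^ .,?!-])(?:Crust|good|selection|fresh|rubber|warmer|fries|great)(?!-[^ .,?!;\r\n])'
# red_fmt was not defined in the original snippet; this HTML span is an assumed placeholder
red_fmt = '<span style="color: red">{}</span>'
# Read from the 'Input' column shown in the sample data and store the result in 'Output'
df['Output'] = df['Input'].apply(lambda x: re.sub(p, red_fmt.format(r"\g<0>"), x))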
- Sample Data:
| Input |
| Wow... Loved this place. |
| Crust is not good. |
| The selection on the menu was great and so were the prices. |
| Honeslty it didn't taste THAT fresh. |
| The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer. |
| The fries were great too. |
- Output: What I'm trying to achieve.

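In the original pattern the lookarounds apply only to Crust and great, because | has the lowest precedence in a regex, so a word like good also matches inside goodness. Group the alternatives and use word boundaries instead:
p = r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b'
With that pattern, wrapping each match in a <span> is easy:
df['Output'] = df['Input'].str.replace(p, r'<span>\g<0></span>', regex=True)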
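
Note that Excel will not render HTML tags inside a cell, so a <span> only helps if the output ends up as HTML. For a real red font in the .xlsx file, one option is a rich string. Below is a minimal sketch, assuming the XlsxWriter package is installed and that the text lives in a column named Input as in the sample data. The helper names split_matches and write_highlighted are hypothetical and not part of pandas or XlsxWriter.

```python
import re

# Keywords from the question, grouped so the word boundaries apply to every alternative.
PATTERN = re.compile(r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b')


def split_matches(text):
    """Split text into (fragment, is_match) pairs, skipping empty fragments."""
    parts = []
    pos = 0
    for m in PATTERN.finditer(text):
        if m.start() > pos:
            parts.append((text[pos:m.start()], False))
        parts.append((m.group(0), True))
        pos = m.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    return parts


def write_highlighted(df, path, column="Input"):
    """Write df[column] to an .xlsx file with matched words in red (hypothetical helper)."""
    import xlsxwriter  # third-party; imported here so split_matches works without it

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    red = workbook.add_format({"font_color": "red"})
    worksheet.write(0, 0, column)  # header row
    for row, text in enumerate(df[column].astype(str), start=1):
        parts = split_matches(text)
        if len(parts) < 2:
            # A rich string needs more than one fragment; fall back to a plain cell.
            is_match = bool(parts) and parts[0][1]
            worksheet.write_string(row, 0, text, red if is_match else None)
            continue
        rich = []
        for fragment, is_match in parts:
            if is_match:
                rich.append(red)  # a format applies to the fragment that follows it
            rich.append(fragment)
        worksheet.write_rich_string(row, 0, *rich)
    workbook.close()
```

After loading the CSV into df, calling write_highlighted(df, "highlighted.xlsx") would write one row per review, with only the matched words in red. The output file name here is just a placeholder.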