Highlight text in dataframe based on regex pattern

Question

Problem: I need to highlight, in red font, the word or words in each dataframe row that match a regex pattern. I settled on a regex because it lets me ignore spaces, punctuation, and case.
Source: The data comes from a CSV file. I want to load it into a dataframe, apply the pattern-match highlighting, and write the result to Excel.
Code: The code helps me with the count of words that match in the dataframe row.

import pandas as pd
import re
df = pd.read_csv("C:/filepath/filename.csv", engine='python')
p = r'(?i)(?<![^ .,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'
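# ANSI escape sequence for red console text (same definition as in the answer below)
red_fmt = "\033[1;31m{}\033[0m"
df['Output'] = df['Output'].apply(lambda x: re.sub(p, red_fmt.format(r"\g<0>"), x))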

Sample Data:

Input
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.

Output: What I'm trying to achieve.

Your regex seems a bit off: at the very least, you need to group the alternatives. Word boundaries also seem more natural here: p = r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b'
– Wiktor Stribiżew, Commented May 26, 2021 at 16:49
It seems Pandas cannot use any colors when saving data to Excel.
– Wiktor Stribiżew, Commented May 26, 2021 at 19:19
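
One possible route to partially colored cell text in Excel goes around pandas and uses the third-party XlsxWriter package, whose write_rich_string method takes alternating format and string fragments. The sketch below is only an illustration under that assumption: split_segments and write_highlighted are hypothetical helper names, the pattern is the grouped one suggested in the comment above, and the output always goes to column A.

```python
import re

# Grouped, word-bounded pattern suggested in the comments above.
PATTERN = re.compile(
    r"\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b",
    re.IGNORECASE,
)


def split_segments(text):
    """Split text into (fragment, is_match) pairs, skipping empty fragments."""
    segments, pos = [], 0
    for m in PATTERN.finditer(text):
        if m.start() > pos:
            segments.append((text[pos:m.start()], False))
        segments.append((m.group(0), True))
        pos = m.end()
    if pos < len(text):
        segments.append((text[pos:], False))
    return segments


def write_highlighted(path, rows):
    """Write each string in rows to column A, with matched words in red.

    Assumption: the third-party XlsxWriter package is installed.
    """
    import xlsxwriter  # imported lazily so split_segments works without it

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    red = workbook.add_format({"font_color": "red"})
    for row, text in enumerate(rows):
        segments = split_segments(text)
        if len(segments) < 2:
            # A rich string needs more than one fragment, so write a plain
            # string here (red only if the whole cell is a single match).
            is_red = bool(segments) and segments[0][1]
            worksheet.write_string(row, 0, text, red if is_red else None)
            continue
        parts = []
        for fragment, matched in segments:
            if matched:
                parts.append(red)  # a format applies to the fragment after it
            parts.append(fragment)
        worksheet.write_rich_string(row, 0, *parts)
    workbook.close()
```

Given a dataframe, something like write_highlighted("highlighted.xlsx", df["Input"].tolist()) would be the intended call; the file name and the column name are placeholders.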
Then how about producing the output in HTML? I just need to give them visual proof that the script does its job correctly. I'm still trying to find a solution for formatting the text in the dataframe, though.
– Deepak, Commented May 28, 2021 at 5:34
If you mean to wrap the matches with tags like <span>, it is easy: df['Output'] = df['Output'].str.replace(r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b', r'<span>\g<0></span>', regex=True)
– Wiktor Stribiżew, Commented May 28, 2021 at 8:10
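
For the HTML route, a minimal sketch of the <span> idea from the comment above could look like this. The inline red style and the helper name highlight_html are assumptions added for illustration; the text is HTML-escaped first so that characters such as < in the data are not read as markup.

```python
import html
import re

# Grouped, word-bounded pattern suggested in the comments above.
PATTERN = re.compile(
    r"\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b",
    re.IGNORECASE,
)


def highlight_html(text):
    """Escape text for HTML, then wrap each match in a red <span>."""
    escaped = html.escape(text, quote=False)
    return PATTERN.sub(r'<span style="color:red">\g<0></span>', escaped)


print(highlight_html("The fries were great too."))
# The <span style="color:red">fries</span> were <span style="color:red">great</span> too.
```

Applied to a dataframe, this would presumably be df['Output'] = df['Input'].map(highlight_html), and df.to_html(escape=False) should then keep the spans intact when rendering the table.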

pppig · Accepted Answer · 2021-05-26 17:12:55Z


import re
# Console output color.
red_fmt = "\033[1;31m{}\033[0m"
s = """
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.
"""
p = r'(?i)(?<![^ \r\n.,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'


print(re.sub(p, red_fmt.format(r"\g<0>"), s))
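
As the comment under the question points out, the alternatives in this pattern are not grouped, so the lookbehind only guards Crust and the lookahead only guards great; the words in between match anywhere, even inside longer words. The short comparison below illustrates this with a shortened word list and a made-up sample sentence (both are assumptions for the sake of the example).

```python
import re

# Ungrouped: the lookarounds bind only to the first and last alternative.
ungrouped = r"(?i)(?<![^ \r\n.,?!-])Crust|good|great(?!-[^ .,?!;\r\n])"
# Grouped with word boundaries, as suggested in the comments on the question.
grouped = r"(?i)\b(?:Crust|good|great)\b"

sample = "Goodness, the crust was good."

print(re.findall(ungrouped, sample))  # ['Good', 'crust', 'good']
print(re.findall(grouped, sample))    # ['crust', 'good']
```

The ungrouped pattern picks up the "Good" inside "Goodness", while the grouped version only matches whole words.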


8 Comments

Deepak Over a year ago

Hi pppig (that looks like Luffy), thank you. Your code works, but how do I adapt it for a dataframe column? Can I write this output to an Excel file?

Deepak Over a year ago

OK, I was able to adapt it for a dataframe, but now the output for sentence = "Crust is not good." is "[1;31m/g<0>[0m is not [1;31m/g<0>[0m." I tried the code last night and it definitely worked, but today it replaces the word instead of changing the font color.

pppig Over a year ago

@Deepak I tried it and it works without problems. You may need to check your code, or post it here.

Deepak Over a year ago

Code:
import re
# Console output color
red_fmt = "\033[1;31m{}\033[0m"
p = r'(?i)(?<![^ \r\n.,?!-])crust|good|selection|fresh|rubber|Taste|fries|great|potatoes(?!-[^ .,?!;\r\n])'
df['Output'] = df['Output'].apply(lambda x: re.sub(p, red_fmt.format(r"/g<0>"), x))
Input: Crust is not good.
Output: [1;31m/g<0>[0m is not [1;31m/g<0>[0m.

pppig Over a year ago

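@Deepak The backslash '\' has special meaning: in a replacement string, \g<0> refers to the whole match, whereas /g<0> is just literal text. You need to change r"/g<0>" to r"\g<0>".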
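
The difference between the two replacement strings can be reproduced in isolation. This is a small sketch with a shortened, grouped pattern (an assumption for brevity); repr() is used so the otherwise invisible escape character shows up as \x1b.

```python
import re

red_fmt = "\033[1;31m{}\033[0m"
pattern = r"(?i)\b(?:crust|good)\b"
text = "Crust is not good."

# "/g<0>" is plain text, so every match is replaced by those literal characters.
wrong = re.sub(pattern, red_fmt.format(r"/g<0>"), text)
# "\g<0>" is a backreference to the whole match, so the matched word is kept.
right = re.sub(pattern, red_fmt.format(r"\g<0>"), text)

print(repr(wrong))  # '\x1b[1;31m/g<0>\x1b[0m is not \x1b[1;31m/g<0>\x1b[0m.'
print(repr(right))  # '\x1b[1;31mCrust\x1b[0m is not \x1b[1;31mgood\x1b[0m.'
```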
