  1. Problem: I need to highlight a word or words in red font within a dataframe row, based on a regex pattern. I settled on a regex pattern because it lets me ignore spaces, punctuation, and case.
  2. Source: The data comes from a CSV file. I want to load it into a dataframe, apply the pattern-match highlighting, and write the result to Excel.
  3. Code: This is what I have so far for the words that match in each dataframe row.
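import pandas as pd
import re
# ANSI escape sequence for bold red; not defined in the original snippet,
# taken from the answer below.
red_fmt = "\033[1;31m{}\033[0m"
df = pd.read_csv("C:/filepath/filename.csv", engine='python')
p = r'(?i)(?<![^ .,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'
# Wrap every match in the red format string.
df['Output'] = df['Output'].apply(lambda x: re.sub(p, red_fmt.format(r"\g<0>"), x))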
  4. Sample Data:
Input
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.
  5. Output: What I'm trying to achieve.

[Image of the desired output; not reproduced here]

  • Your regex seems a bit off; you need to group the alternatives at the least. Using word boundaries also seems more natural here: p = r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b' Commented May 26, 2021 at 16:49
  • Thanks, I will try this pattern as well. Commented May 26, 2021 at 17:05
  • It seems Pandas cannot use any colors when saving data to Excel. Commented May 26, 2021 at 19:19
  • Then how about producing the output in HTML? I just need to give them visual proof that the script does its job correctly. I'm still trying to find a solution for formatting the text in the dataframe, though. Commented May 28, 2021 at 5:34
  • If you mean to wrap the matches with tags like <span>, it is easy: df['Output'] = df['Output'].str.replace(r'(?i)\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b', r'<span>\g<0></span>', regex=True) Commented May 28, 2021 at 8:10
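The two suggestions in the comments above (group the alternatives, and wrap matches in `<span>` tags) can be sketched together as follows. This is only an illustration: the `style="color:red"` attribute, the helper name `highlight_html`, and the test sentence are assumptions, not part of the original thread. It also shows why the grouping matters: in the ungrouped pattern the lookbehind applies only to `Crust` and the lookahead only to `great`, so a word like "Goodness" is partly matched.

```python
import html
import re

WORDS = ["Crust", "good", "selection", "fresh", "rubber", "warmer", "fries", "great"]

# Ungrouped alternation, as in the question: the lookbehind binds only to
# "Crust" and the lookahead only to "great".
ungrouped = r'(?i)(?<![^ .,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'

# Grouped alternation with word boundaries, as suggested in the comments.
grouped = r'(?i)\b(?:' + "|".join(map(re.escape, WORDS)) + r')\b'


def highlight_html(text, pattern=grouped):
    """Escape the text for HTML, then wrap each match in a red <span>.

    The inline style is an assumption made for this sketch.
    """
    return re.sub(pattern, r'<span style="color:red">\g<0></span>', html.escape(text))


sample = "Goodness, the fries were great."
print(re.findall(ungrouped, sample))  # ['Good', 'fries', 'great']
print(re.findall(grouped, sample))    # ['fries', 'great']
print(highlight_html("Crust is not good."))
```

With pandas, the helper could be applied with something like `df['Output'] = df['Input'].apply(highlight_html)` (the column names are assumptions), and `df.to_html(escape=False)` would then keep the `<span>` tags intact instead of escaping them.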

1 Answer

import re
# Console output color.
red_fmt = "\033[1;31m{}\033[0m"
s = """
Wow... Loved this place.
Crust is not good.
The selection on the menu was great and so were the prices.
Honeslty it didn't taste THAT fresh.
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer.
The fries were great too.
"""
p = r'(?i)(?<![^ \r\n.,?!-])Crust|good|selection|fresh|rubber|warmer|fries|great(?!-[^ .,?!;\r\n])'


print(re.sub(p, red_fmt.format(r"\g<0>"), s))
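The escape codes above only color terminal output. For an Excel file, one possible route, which is not part of the original answer, is XlsxWriter's `write_rich_string`, which accepts alternating format objects and string fragments. The sketch below assumes XlsxWriter is installed, uses the grouped word-boundary pattern from the comments, and introduces two hypothetical helpers, `split_matches` and `write_highlighted`.

```python
import re

PATTERN = re.compile(
    r'\b(?:Crust|good|selection|fresh|rubber|warmer|fries|great)\b',
    re.IGNORECASE,
)


def split_matches(text, pattern=PATTERN):
    """Split text into ordered (is_match, piece) tuples, skipping empty pieces."""
    pieces = []
    pos = 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            pieces.append((False, text[pos:m.start()]))
        pieces.append((True, m.group(0)))
        pos = m.end()
    if pos < len(text):
        pieces.append((False, text[pos:]))
    return pieces


def write_highlighted(path, sentences):
    """Write one sentence per row in column A, with matched words in red."""
    import xlsxwriter  # third-party; assumed to be installed

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    red = workbook.add_format({"font_color": "red"})
    for row, text in enumerate(sentences):
        pieces = split_matches(text)
        if not any(is_match for is_match, _ in pieces):
            worksheet.write_string(row, 0, text)       # nothing to highlight
        elif len(pieces) == 1:
            worksheet.write_string(row, 0, text, red)  # whole cell is one match
        else:
            # Build the alternating [format, string, string, ...] argument list.
            fragments = []
            for is_match, piece in pieces:
                if is_match:
                    fragments.append(red)
                fragments.append(piece)
            worksheet.write_rich_string(row, 0, *fragments)
    workbook.close()


print(split_matches("Crust is not good."))
```

A call such as `write_highlighted("highlighted.xlsx", df["Input"].tolist())` would then produce the workbook; the file name and column name are assumptions.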

8 Comments

Hi pppig (that looks like Luffy), thank you. Your code works, but how do I adapt it for a dataframe column? Can I write this output to an Excel file?
OK, I was able to adapt it for a dataframe, but now the output for sentence = "Crust is not good." is "[1;31m/g<0>[0m is not [1;31m/g<0>[0m." I tried the code last night and it definitely worked, but today it replaces the word instead of changing the font color.
@Deepak I tried it and had no problem; you may need to check your code or post it.
Code: import re # Console output color red_fmt = "\033[1;31m{}\033[0m" p = r'(?i)(?<![^ \r\n.,?!-])crust|good|selection|fresh|rubber|Taste|fries|great|potatoes(?!-[^ .,?!;\r\n])' df['Output'] = df['Output'].apply(lambda x: re.sub(p, red_fmt.format(r"/g<0>"), x)) Input: Crust is not good. Output: [1;31m/g<0>[0m is not [1;31m/g<0>[0m.
@Deepak '\' has special meaning. You need to modify r"/g<0>" to r"\g<0>".
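The difference pointed out in the last comment can be reproduced in a few lines. In a `re.sub` replacement string, `\g<0>` is a backreference to the whole match, while `/g<0>` is just literal text. The pattern below is shortened for illustration and is an assumption, not the one from the thread.

```python
import re

red_fmt = "\033[1;31m{}\033[0m"
p = r'(?i)\b(?:crust|good)\b'  # shortened pattern, for illustration only

# Forward slash: "/g<0>" is inserted literally in place of every match.
wrong = re.sub(p, red_fmt.format(r"/g<0>"), "Crust is not good.")

# Backslash: "\g<0>" is replaced by the text of each match.
right = re.sub(p, red_fmt.format(r"\g<0>"), "Crust is not good.")

print(repr(wrong))  # '\x1b[1;31m/g<0>\x1b[0m is not \x1b[1;31m/g<0>\x1b[0m.'
print(repr(right))  # '\x1b[1;31mCrust\x1b[0m is not \x1b[1;31mgood\x1b[0m.'
```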