7

I want to print the contents of a file to the terminal and in the process highlight any words that are found in a list without modifying the original file. Here's an example of the not-yet-working code:

    def highlight_story(self):
        """Print a line from a file and highlight words in a list."""

        the_file = open(self.filename, 'r')
        file_contents = the_file.read()

        for word in highlight_terms:
            regex = re.compile(
                  r'\b'      # Word boundary.
                + word       # Each item in the list.
                + r's{0,1}', # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            result = re.sub(regex, subst, file_contents)

        print result
        the_file.close()

highlight_terms = [
    'dog',
    'hedgehog',
    'grue'
]

As it is, only the last item in the list, regardless of what it is or how long the list is, will be highlighted. I assume that each substitution is performed and then "forgotten" when the next iteration begins. It looks something like this:

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to a be delicacies. Dogs can frighten awat a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

But it should look like this:

Grues have been known to eat both human and non-human animals. In poorly-lit areas dogs and hedgehogs are considered by any affluent grue to a be delicacies. Dogs can frighten away a grue, however, by barking in a musical scale. A hedgehog, on the other hand, must simply resign itself to its fate of becoming a hotdog fit for a grue king.

How can I stop the other substitutions from being lost?

0

3 Answers 3

5

You can modify your regex to the following:

regex = re.compile(r'\b('+'|'.join(highlight_terms)+r')s?', flags=re.IGNORECASE | re.VERBOSE)  # note the ? instead of {0, 1}. It has the same effect

Then, you won't need the for loop.

This code takes the list of words and then concatenates them together with a |. So if your list was something like:

a = ['cat', 'dog', 'mouse'];

The regex would be:

\b(cat|dog|mouse)s?
Sign up to request clarification or add additional context in comments.

Comments

4

The regex provided is correct, but the for loop is where you got wrong.

result = re.sub(regex, subst, file_contents)

This line substitutes the regex with subst in the file_content.

in the second iteration, it again does the substitution in file_content where as you intented to do it on result

How to correct

result = file_contents

for word in highlight_terms:
    regex = re.compile(
          r'\b'      # Word boundary.
        + word       # Each item in the list.
        + r's?\b', # One optional 's' at the end.
        flags=re.IGNORECASE | re.VERBOSE)
    print regex.pattern
    subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
    result = re.sub(regex, subst, result) #change made here

 print result

Comments

2

you need to reassign file_contents each time through the loop to the replaced string, reassigning file_contents does not change the content in the file:

def highlight_story(self):
        """Print a line from a file and highlight words in a list."""

        the_file = open(self.filename, 'r')
        file_contents = the_file.read()
        output = ""
        for word in highlight_terms:
            regex = re.compile(
                  r'\b'      # Word boundary.
                + word       # Each item in the list.
                + r's{0,1}', # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            file_contents  = re.sub(regex, subst, file_contents) # reassign to updatedvalue
        print file_contents
        the_file.close()

Also using with to open files is a better way to go and you can make a copy of the string outside the loop and update inside:

def highlight_story(self):
    """Print a line from a file and highlight words in a list."""
    with open(self.filename) as the_file:
        file_contents = the_file.read()
        output = file_contents # copy
        for word in highlight_terms:
            regex = re.compile(
                r'\b'  # Word boundary.
                + word  # Each item in the list.
                + r's{0,1}',  # One optional 's' at the end.
                flags=re.IGNORECASE | re.VERBOSE)
            subst = '\033[1;41m' + r'\g<0>' + '\033[0m'
            output = re.sub(regex, subst, output) # update copy
        print output
    the_file.close()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.