
I have a list of bad words. Let's say it is:

BAD_WORDS = ['bw1', 'bw2',...] 

Now I'm wondering what the most efficient way is to check a long string (e.g. the body of a Django POST request) against this list, in code like:

if re.search(comment.body) in BAD_WORDS:
    dosomething()

2 Answers


The best way is to build one regular expression that matches any of the bad words, so the comment body is searched once rather than once per word:

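import re
bad_words = ['bw1', 'bw2']  # ... add the rest of your words

# Escape each word so regex metacharacters are matched literally,
# then join them into a single alternation: bw1|bw2|...
my_expression = '|'.join(re.escape(word) for word in bad_words)
if re.search(my_expression, comment.body):
    do_something()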
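A minimal sketch of the same idea, assuming the pattern is built once (for example at module import) and reused for every request. re.compile avoids rebuilding the alternation each time, although the re module also caches recently used patterns, so the gain may be modest. The helper name contains_bad_word and the sample words are hypothetical placeholders.

```python
import re

# Hypothetical placeholder list; substitute your real words.
BAD_WORDS = ['bw1', 'bw2']

# Escape each word so characters like '.' or '*' are matched literally,
# then join them into one alternation and compile it once.
BAD_WORDS_RE = re.compile('|'.join(re.escape(word) for word in BAD_WORDS))


def contains_bad_word(text):
    """Return True if any bad word occurs anywhere in text (substring match)."""
    return BAD_WORDS_RE.search(text) is not None


print(contains_bad_word('this has bw2 in it'))  # True
print(contains_bad_word('clean text'))           # False
print(contains_bad_word('xbw1y'))                # True: substrings also match
```

Note that if the list can be empty, the joined pattern is the empty string, which matches every input, so you may want to guard against that case.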

4 Comments

Good answer; however, I would pass flags=re.IGNORECASE to re.search so the match is case-insensitive.
@Rishi I'd say that should be up to the implementer, not suggested by the answerer. It could be that "ass" is a censored word but "ASS" is the Association for Sentimental Sapiens or the like.
@Rishi how should I add the flag?
@supermario re.search(my_expression, comment.body, flags=re.IGNORECASE)
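To make the trade-off discussed in these comments concrete, here is a small sketch (with placeholder words assumed for illustration) showing the same pattern with and without re.IGNORECASE:

```python
import re

bad_words = ['bw1', 'bw2']  # placeholder words
my_expression = '|'.join(re.escape(word) for word in bad_words)

body = 'This comment contains BW1.'

# Default: case-sensitive, so 'BW1' does not match 'bw1'.
print(re.search(my_expression, body) is not None)                       # False
# With re.IGNORECASE, 'BW1' matches 'bw1'.
print(re.search(my_expression, body, flags=re.IGNORECASE) is not None)  # True
```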

You can use the built-in any() for this.

To match substrings rather than exact words, you can use the in operator:

if any(word in comment.body for word in BAD_WORDS):
    pass  # do something
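A self-contained version of the substring check, assuming a placeholder word list and a hypothetical helper name has_bad_substring. any() checks the words one at a time and stops at the first one found. Like the regex without flags, in is case-sensitive; lower-casing the text only helps if the word list itself is lower-case (an assumption here).

```python
BAD_WORDS = ['bw1', 'bw2']  # placeholder words


def has_bad_substring(text):
    # any() short-circuits: it stops at the first word found in text.
    return any(word in text for word in BAD_WORDS)


print(has_bad_substring('xbw2y'))         # True: matches inside other text
print(has_bad_substring('BW2'))           # False: 'in' is case-sensitive
print(has_bad_substring('BW2'.lower()))   # True once the text is lower-cased
```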

To match exact (whole) words, use a regex with \b word boundaries:

import re

# \b marks a word boundary, so 'bw1' matches in 'a bw1 b' but not in 'xbw1y'.
if any(re.search(r'\b{}\b'.format(re.escape(word)), comment.body)
       for word in BAD_WORDS):
    pass  # do something
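The two answers can also be combined: wrap the escaped alternation in a non-capturing group between \b anchors so one compiled pattern performs the whole-word check. This is a sketch with placeholder words and a hypothetical helper name; \b only behaves as expected when each word starts and ends with a word character (letter, digit or underscore).

```python
import re

BAD_WORDS = ['bw1', 'bw2']  # placeholder words

# (?:...) groups the alternatives so both \b anchors apply to every word;
# without it, r'\bbw1|bw2\b' would anchor only the first and last words.
WHOLE_WORD_RE = re.compile(
    r'\b(?:' + '|'.join(re.escape(word) for word in BAD_WORDS) + r')\b'
)


def contains_bad_whole_word(text):
    return WHOLE_WORD_RE.search(text) is not None


print(contains_bad_whole_word('this has bw1 in it'))  # True
print(contains_bad_whole_word('xbw1y'))               # False: not a whole word
print(contains_bad_whole_word('bw2, again'))          # True: ',' is a boundary
```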
