How to find regexp patterns in a Python list?

Question

I have a list of bad words. Let's say it is:

BAD_WORDS = ['bw1', 'bw2',...]

Now I'm wondering what the most efficient way is to check a long string (for example, the body of a Django POST request) against that list, in code like:

if re.search(comment.body) in BAD_WORDS:
    dosomething

Joel Cornett · Accepted Answer · 2014-02-11 22:07:21Z

2

A simple and efficient way is to combine all the bad words into one expression, so the string is searched only once:

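import re
bad_words = ['bw1', 'bw2', ... ]

# Escape each word so regex metacharacters are matched literally,
# then join the words into a single alternation pattern.
my_expression = '|'.join(re.escape(word) for word in bad_words)
if re.search(my_expression, comment.body):
    do_something()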
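
As a minimal, self-contained sketch of the same idea: the sample word list, the build_pattern and contains_bad_word helpers, and the ignore_case parameter are hypothetical names chosen for illustration, not part of the original answer. Compiling the pattern once avoids rebuilding it for every comment that is checked.

```python
import re

# Hypothetical sample words; the real list is elided in the question.
SAMPLE_BAD_WORDS = ['bw1', 'bw2']


def build_pattern(words, ignore_case=False):
    """Compile one alternation pattern matching any of the given words."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile('|'.join(re.escape(word) for word in words), flags)


def contains_bad_word(text, pattern):
    """Return True if the compiled pattern matches anywhere in text."""
    return pattern.search(text) is not None


pattern = build_pattern(SAMPLE_BAD_WORDS)
print(contains_bad_word('this has bw2 in it', pattern))  # True
print(contains_bad_word('this is clean', pattern))       # False
```

Note that an empty word list produces an empty pattern, which matches every string, so a real implementation would probably want to handle that case explicitly.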


4 Comments

Rishi Over a year ago

Good answer; however, I would pass flags=re.IGNORECASE to re.search to make the match case-insensitive.

Adam Smith Over a year ago

@Rishi I'd say that would have to be up to the implementer, not suggested by the answerer. It could be that ass is a censored word but ASS is the Association for Sentimental Sapiens, or something similar.

supermario Over a year ago

@Rishi how should I add the flag?

Rishi Over a year ago

@supermario re.search(my_expression, comment.body, flags=re.IGNORECASE)

Ashwini Chaudhary · Accepted Answer · 2014-02-11 22:16:32Z

1

You can use any for this.

To match a bad word anywhere as a substring (not only as a whole word), you can use the in operator:

if any(word in comment.body for word in BAD_WORDS):
    pass  # do something

To match whole words only, use a regex with word boundaries (\b):

import re
if any(re.search(r'\b{}\b'.format(re.escape(word)), comment.body)
       for word in BAD_WORDS):
    pass  # do something
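
The difference between the substring check and the whole-word check might be easier to see with a small sketch. The sample words and text below are made up for illustration, and the whole-word check here folds all words into a single compiled pattern, which is a variation on the per-word loop rather than the code from this answer.

```python
import re

# Hypothetical sample data for illustration only.
SAMPLE_BAD_WORDS = ['bad', 'worse']
text = 'He plays badminton'

# Substring check: 'bad' is found inside 'badminton'.
substring_hit = any(word in text for word in SAMPLE_BAD_WORDS)

# Whole-word check: \b requires a word boundary on both sides,
# so 'bad' inside 'badminton' does not count.
whole_word = re.compile(
    r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in SAMPLE_BAD_WORDS))
)
whole_word_hit = whole_word.search(text) is not None

print(substring_hit)   # True
print(whole_word_hit)  # False
print(whole_word.search('that is a bad idea') is not None)  # True
```

One caveat: \b only marks a transition between word and non-word characters, so this works as expected when every bad word starts and ends with a letter, digit, or underscore.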
