
Right now I'm "removing" emails from a list by building a new list that excludes the entries I don't want. It looks like this:

    import re

    pattern = re.compile(r'b\.com')

    emails = ['[email protected]', '[email protected]', '[email protected]', '[email protected]']
    emails = [e for e in emails if pattern.search(e) == None]
    # resulting list:  ['[email protected]', '[email protected]']

However, I now need to filter out several domains, which I have in a list:

    pattern_list = ['b.com', 'c.com']

Is there still a way to do this with a list comprehension, or will I have to fall back to nested for loops?

Note: splitting each address at the @ and checking word[1] in pattern_list won't work, because c.com also needs to catch sub.c.com.

  • I don't think a list comprehension is the best way to address this. You might be able to do it, but it would be rather cumbersome. Look at this solution: stackoverflow.com/questions/19150208/… Commented Sep 23, 2014 at 18:27
  • Note that your existing example will also exclude, for instance, [email protected] and [email protected]. Is that what you want? Commented Sep 23, 2014 at 18:28
  • When you're making list comprehensions of list comprehensions, it's often better to use generator expressions (change the square brackets to parentheses), which are more memory efficient and chain together nicely. Commented Sep 23, 2014 at 18:30
  • Also, in your regex . is a special character, so it will also exclude bob@bocom, since b.com matches bocom. Is that also what you want? Commented Sep 23, 2014 at 18:31
  • Using is None reads better than == None. It's also a bit more efficient, since is cannot be overloaded and the interpreter can just do a pointer comparison. Commented Sep 23, 2014 at 18:48

2 Answers


There are a few ways to do this, even without using a regex. One is:

    [e for e in emails if not any(pat in e for pat in pattern_list)]

This will also exclude emails like [email protected] and [email protected], but so does your original solution. It does not, however, exclude cases like user@bocom, which your existing solution does. Again, it's not clear if your existing solution actually does what you think it does.

Another possibility is to combine your patterns into one with rx = '|'.join(pattern_list) and then match on that regex. Again, though, you'll need to use a more complex regex if you want to only match b.com as a full domain (not as just part of the domain or as part of the username).

    import re

    # Escape the dots and anchor each domain to the end of the address
    pattern = re.compile(r'b\.com$|c\.com$')

    emails = ['[email protected]', '[email protected]', '[email protected]', '[email protected]']
    emails = [e for e in emails if pattern.search(e) is None]

    print(emails)

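What about this? The $ anchors each domain to the end of the address, so sub.c.com is still caught; note that it would also catch a domain such as ab.com.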
