Suppose I start with a list of strings:

list1 = ['string a','string b','string c','string d']

Using a list comprehension, I want to create a second list (list2) that contains strings from list1 if and only if those strings contain certain substrings.

For example, if I wanted to pull only the strings containing 'a', 'b' or 'c', I could write:

list2 = [text for text in list1 if 'a' in text or 'b' in text or 'c' in text]

but this feels clunky. Is there a way to combine my search for the three elements, something like... if ('a' or 'b' or 'c') in text?

If possible, I would like to do this without having to create a list of the substrings, i.e. ['a','b','c'] in a preceding line of code.

1 Answer

You can use any() with a generator expression:

list1 = ['string a','string b','string c','string d']
list2 = [text for text in list1 if any(k in text for k in ['a', 'b', 'c'])]

Output:

['string a', 'string b', 'string c']

Another example:

list1 = ['string a','string b','string c','string d', 'string e']
list2 = [i for i in list1 if any(k in i for k in ['a', 'd'])]
# ['string a', 'string d']

Note: You can't write if ('a' or 'b' or 'c') in text, because or returns its first truthy operand, so ('a' or 'b' or 'c') always evaluates to 'a' and the condition reduces to if 'a' in text.
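
A quick sketch of that behaviour, in case it helps (the names needle and wrong are just for illustration):

```python
# `or` returns its first truthy operand, so the parenthesised
# expression collapses to the single string 'a'
needle = ('a' or 'b' or 'c')
print(needle)  # a

list1 = ['string a', 'string b', 'string c', 'string d']

# This is therefore equivalent to: [text for text in list1 if 'a' in text]
wrong = [text for text in list1 if ('a' or 'b' or 'c') in text]
print(wrong)  # ['string a']
```

Only 'string a' survives, because 'b' and 'c' are never actually tested.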

See any() in the Python documentation.

Additional note: Although this approach needs neither a list declared beforehand nor a long chain of or conditions inside the if, your original version is still considerably faster, especially if the list of substrings to check is large. Consider these timings (timeit runs each statement 1,000,000 times by default, and the results are in seconds):

>>> from timeit import timeit as t
>>> t("""list1 = ['string a','string b','string c','string d']; list2 = [text for text in list1 if any(k in text for k in ['a', 'b', 'c'])]""")
2.8938145910001367
>>> t("""list1 = ['string a','string b','string c','string d']; list2 = [text for text in list1 if 'a' in text or 'b' in text or 'c' in text]""")
0.600998255000377
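
If you want to reproduce the comparison on your own machine, something along these lines should work. This is only a sketch: the function names with_any and with_or are made up for this example, number=100_000 is an arbitrary choice, and absolute timings will vary with hardware and Python version.

```python
from timeit import timeit

list1 = ['string a', 'string b', 'string c', 'string d']

def with_any():
    # any() over a generator expression
    return [text for text in list1 if any(k in text for k in ['a', 'b', 'c'])]

def with_or():
    # Chained `or` conditions, as in the question
    return [text for text in list1 if 'a' in text or 'b' in text or 'c' in text]

# Both approaches should select the same strings
assert with_any() == with_or()

# timeit accepts a callable; `number` controls how many times it is run
t_any = timeit(with_any, number=100_000)
t_or = timeit(with_or, number=100_000)
print(f"any(): {t_any:.3f}s  chained or: {t_or:.3f}s")
```

Keeping list1 outside the timed functions means only the filtering itself is measured.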

2 Comments

Just to clarify, the code inside the any() function is a generator expression, correct?
@hseek Yes, that's a generator expression. Another variation is to turn it into a list comprehension and pass that to any(), as in any([k in i for k in ['a', 'd']]). That builds the whole list before any() runs, so it cannot stop at the first match, and it should be slightly slower than the generator expression.
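
One way to see the difference between the two forms is to record which substrings actually get tested. The helper check below is hypothetical and exists only for this illustration:

```python
calls = []

def check(k, text):
    # Hypothetical helper: records each substring that gets tested
    calls.append(k)
    return k in text

text = 'string a'

# Generator expression: any() stops consuming it at the first True
any(check(k, text) for k in ['a', 'd', 'e'])
gen_calls = list(calls)

calls.clear()

# List comprehension: the whole list is built before any() sees it
any([check(k, text) for k in ['a', 'd', 'e']])
list_calls = list(calls)

print(gen_calls)   # ['a']
print(list_calls)  # ['a', 'd', 'e']
```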
