92

I have found some answers to this question before, but they seem to be obsolete for the current Python versions (or at least they don't work for me).

I want to check if a substring is contained in a list of strings. I only need the boolean result.

I found this solution:

word_to_check = 'or'
wordlist = ['yellow','orange','red']

result = any(word_to_check in word for word in worldlist)

From this code I would expect to get a True value. If the word was "der", then the output should be False.

However, the result is a generator function, and I can't find a way to get the True value.

Any idea?

10
  • 5
    The code you posted works fine (except for wordlist/worldlist). I'm guessing you forgot the any() call when you tried it before. Commented May 5, 2013 at 0:51
  • I missed that you already used any. Commented May 5, 2013 at 0:52
  • Taking a look at your code and comments, I think the problem is the "any" function I am using. It is probably the any function in the numpy module. So the solution would be to use the built-in function instead, but any idea on how to do this once the numpy module has been imported? Commented May 5, 2013 at 0:54
  • 4
    This problem comes up for me all the time when using ipython --pylab, which "helpfully" imports * from numpy for you. In that case you can directly use __builtin__.any without having to import __builtin__ like in Ashwini's answer, since __builtin__ shows up in interactive shells automatically. Also @DSM: apparently the behavior of numpy.any changed (for the worse) in 1.7. Commented May 5, 2013 at 1:03
  • 2
    Also, see the new answer below that shows a much faster alternative approach by combining the words into a single string. Commented May 5, 2013 at 5:08

4 Answers

74

Posted code

The OP's posted code using any() is correct and should work. The spelling of "worldlist" needs to be fixed though.

Alternate approach with str.join()

That said, there is a simple and fast solution to be had by using the substring search on a single combined string:

>>> wordlist = ['yellow','orange','red']
>>> combined = '\t'.join(wordlist)

>>> 'or' in combined
True
>>> 'der' in combined
False

For short wordlists, this is several times faster than the approach using any.

And if the combined string can be precomputed before the search, the in-operator search will always beat the any approach even for large wordlists.
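One caveat with the joined-string search: a search term that itself contains the separator can match across the boundary between two words. The sketch below wraps the idea in a hypothetical helper, contains_joined (the name and the ValueError are assumptions, not part of the original answer), that rejects such terms up front. It is an illustration, not a tuned implementation.

```python
def contains_joined(words, needle, sep="\t"):
    """Return True if `needle` is a substring of any string in `words`.

    Hypothetical helper: joins the words once and searches the result.
    A needle containing the separator could match across two words,
    so it is rejected before searching.
    """
    if sep in needle:
        raise ValueError("search term must not contain the separator")
    return needle in sep.join(words)


wordlist = ['yellow', 'orange', 'red']
print(contains_joined(wordlist, 'or'))    # True
print(contains_joined(wordlist, 'der'))   # False

# Why the separator matters: joined with a space, "de facto" matches
# across the boundary between "made" and "factory".
print('de facto' in ' '.join(['made', 'factory']))        # True (a false positive)
print(any('de facto' in w for w in ['made', 'factory']))  # False
```

A search term without the separator can only match inside a single word, so checking the term is enough for correctness.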

Alternate approach with sets

The O(n) search can be reduced to an average O(1) set lookup if a set of all substrings is precomputed in advance and if we don't mind using more memory.

Precomputed step:

from itertools import combinations

def substrings(word):
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i : j]

wordlist = ['yellow','orange','red']
word_set = set().union(*map(substrings, wordlist))

Fast O(1) search step:

>>> 'or' in word_set
True
>>> 'der' in word_set
False
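To get a feel for the memory cost, here is a rough sketch that reuses the substrings generator from above and compares the size of the set with a simple upper bound: a word of length n has n(n+1)/2 non-empty substrings, so the set grows quadratically with word length. The upper_bound name is only for illustration.

```python
from itertools import combinations

def substrings(word):
    # Every (i, j) pair with i < j marks one non-empty slice word[i:j].
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i:j]

wordlist = ['yellow', 'orange', 'red']
word_set = set().union(*map(substrings, wordlist))

# Upper bound on the set size: n * (n + 1) / 2 substrings per word.
# The real set is usually smaller because duplicates collapse.
upper_bound = sum(len(w) * (len(w) + 1) // 2 for w in wordlist)

print(len(word_set), "distinct substrings; upper bound", upper_bound)
```

For short words this is cheap, but for long strings the quadratic growth can outweigh the benefit of the fast lookup.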

7 Comments

This is by far the most useful and simple solution in my opinion. It can also be shortened to one line: 'or' in '\t'.join(wordlist)
Much faster than going through the list and using 'in' expression on each item
why use '\t' instead of ' '?
@Raymond Although I do agree that the join method is perhaps clearer, it is not faster than using any(<generator>). At best, it is the same speed (when the substring is not in the list). If the word exists in the list, any will short circuit and will not check the remainder of the list. For very large lists, this can be several orders of magnitude faster than joining.
@tonysepia Any separator can be used as long as it doesn't occur in the wordlist. A '\t' tab is a safe choice. A space will work most of the time unless cases like "de facto" and "de jure" are treated as word units.
53

You can import any from __builtin__ in case it was replaced by some other any:

>>> from __builtin__ import any as b_any
>>> lst = ['yellow', 'orange', 'red']
>>> word = "or"
>>> b_any(word in x for x in lst)
True

Note that in Python 3 __builtin__ has been renamed to builtins.
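For Python 3, the same idea might look like the sketch below. To keep it runnable without numpy, it defines a hypothetical stand-in named any that simply hands its argument back, which mimics the symptom described in the question; it is not numpy's actual implementation.

```python
import builtins

def any(iterable):
    # Hypothetical stand-in that shadows the built-in, as a star-import
    # might. It returns its argument unchanged instead of a bool.
    return iterable

lst = ['yellow', 'orange', 'red']
word = "or"

shadowed = any(word in x for x in lst)
print(type(shadowed).__name__)   # generator

# The built-in is still reachable through the builtins module.
restored = builtins.any(word in x for x in lst)
print(restored)                  # True
```

Importing only the names you need, or using a namespace import such as import numpy as np, avoids the shadowing in the first place.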

3 Comments

You can work around the issue with numpy.any if a list comp is used instead of a generator: np.any([word in x for x in lst]).
@MarkTolonen np.any is going to be slow then as it generates the whole list first.
Relatively slower yes, noticeably slower...only the OP can say, but not for his example :)
22

You could use next instead:

colors = ['yellow', 'orange', 'red'] 
search = "or"

result = next((True for color in colors if search in color), False)

print(result) # True

To show the string that contains the substring:

colors = ['yellow', 'orange', 'red'] 
search = "or"

result = [color for color in colors if search in color]  

print(result) # ['orange']
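If only the first matching string is needed, the two ideas can be combined. This is a small sketch, with first_match and missing as illustrative names: next stops at the first hit, and the default (None here) is returned when nothing matches.

```python
colors = ['yellow', 'orange', 'red']
search = "or"

# Yield the matching strings themselves; next() stops at the first one.
first_match = next((color for color in colors if search in color), None)
print(first_match)              # orange
print(first_match is not None)  # True

# With no match, the default is returned instead of raising StopIteration.
missing = next((color for color in colors if "der" in color), None)
print(missing)                  # None
```

Unlike the list comprehension, this does not scan the rest of the list once a match is found.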

1 Comment

That looks like a great way to find the objects with the substring, and could be used also for the True/False objective checking the length of the resulting array.
0

Also, if you want to check whether any of the values of a dictionary occurs as a substring in a list of strings, you can use this:

list_a = [
    'Copy of snap-009ecf9feb43d902b from us-west-2',
    'Copy of snap-0fe999422014504b6 from us-west-2',
    'Copy of snap-0fe999422014cscx504b6 from us-west-2',
    'Copy of snap-0fe999422sdad014504b6 from us-west-2'
]
dict_b = {
    '/dev/xvda': 'snap-0fe999422014504b6',
    '/dev/xvdsdsa': 'snap-sdvcsdvsdvs'
}

for b1 in dict_b.values():
    result = next(("found" for a1 in list_a if b1 in a1), "not found")
    print(result)

It prints (in Python 3.7+, where dictionaries keep insertion order)

found
not found
