Check if substring is in a list of strings?

Question

I have found some answers to this question before, but they seem to be obsolete for the current Python versions (or at least they don't work for me).

I want to check if a substring is contained in a list of strings. I only need the boolean result.

I found this solution:

word_to_check = 'or'
wordlist = ['yellow','orange','red']

result = any(word_to_check in word for word in worldlist)

From this code I would expect to get a True value. If the word was "der", then the output should be False.

However, the result is a generator function, and I can't find a way to get the True value.

Any idea?

The code you posted works fine (except for wordlist/worldlist). I'm guessing you forgot the any() call when you tried it before.
– Gareth Latty, Commented May 5, 2013 at 0:51
Taking a look at your code and comments, I think the problem is the "any" function I am using. It is probably the any function in the numpy module. So the solution would be to use the built-in function instead, but any idea on how to do this once the numpy module has been imported?
– Álvaro, Commented May 5, 2013 at 0:54
This problem comes up for me all the time when using ipython --pylab, which "helpfully" imports * from numpy for you. In that case you can directly use __builtin__.any without having to import __builtin__ like in Ashwini's answer, since __builtin__ shows up in interactive shells automatically. Also @DSM: apparently the behavior of numpy.any changed (for the worse) in 1.7.
– Danica, Commented May 5, 2013 at 1:03
Also, see the new answer below that shows a much faster alternative approach by combining the words into a single string.
– Raymond Hettinger, Commented May 5, 2013 at 5:08

Raymond Hettinger · Accepted Answer · 2022-01-28 17:39:58Z

74

Posted code

The OP's posted code using any() is correct and should work. The spelling of "worldlist" needs to be fixed though.
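
For reference, a minimal sketch of the posted code with the spelling fixed. It assumes the name any still refers to the built-in and has not been replaced by a star import:

```python
word_to_check = 'or'
wordlist = ['yellow', 'orange', 'red']

# any() consumes the generator expression and returns a plain bool.
result = any(word_to_check in word for word in wordlist)
print(result)   # True

# 'der' is not a substring of any single word in the list.
missing = any('der' in word for word in wordlist)
print(missing)  # False
```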

Alternate approach with str.join()

That said, there is a simple and fast solution to be had by using the substring search on a single combined string:

>>> wordlist = ['yellow','orange','red']
>>> combined = '\t'.join(wordlist)

>>> 'or' in combined
True
>>> 'der' in combined
False

For short wordlists, this is several times faster than the approach using any.

And if the combined string can be precomputed before the search, the in-operator search will always beat the any approach even for large wordlists.
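
A minimal sketch of the precomputed variant. The helper names build_combined and contains_substring are hypothetical, not part of any library, and the sketch assumes the separator never occurs inside a word:

```python
def build_combined(words, sep='\t'):
    """Join the words once so repeated searches can reuse the result."""
    for word in words:
        if sep in word:
            # The trick only works if the separator cannot appear in a word.
            raise ValueError(f'separator {sep!r} occurs in {word!r}')
    return sep.join(words)


def contains_substring(combined, needle, sep='\t'):
    """Return True if needle lies inside one of the joined words."""
    if sep in needle:
        # No word contains the separator, so a needle that does cannot
        # match a single word; without this guard it could match across
        # the boundary between two neighbouring words.
        return False
    return needle in combined


combined = build_combined(['yellow', 'orange', 'red'])
print(contains_substring(combined, 'or'))     # True
print(contains_substring(combined, 'der'))    # False
print(contains_substring(combined, 'w\to'))   # False
```

Edge cases such as an empty needle or an empty word list are not handled specially here.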

Alternate approach with sets

The O(n) search time can be reduced to O(1) if a set of all substrings is precomputed in advance and if we don't mind using more memory.

Precomputed step:

from itertools import combinations

def substrings(word):
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i : j]

wordlist = ['yellow','orange','red']
word_set = set().union(*map(substrings, wordlist))

Fast O(1) search step:

>>> 'or' in word_set
True
>>> 'der' in word_set
False
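
To get a feel for the memory cost, here is a small sketch that counts the substrings generated for one word. It reuses the substrings generator from the answer; a word of length n yields n * (n + 1) / 2 slices before the set removes duplicates:

```python
from itertools import combinations


def substrings(word):
    # Every pair i < j of cut positions gives one non-empty slice.
    for i, j in combinations(range(len(word) + 1), 2):
        yield word[i:j]


word = 'orange'
n = len(word)
subs = list(substrings(word))

print(len(subs))         # 21
print(n * (n + 1) // 2)  # 21
```

So the set grows roughly quadratically with word length, which is the memory trade-off mentioned above.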

edited Jan 28, 2022 at 17:39

answered May 5, 2013 at 5:04

Raymond Hettinger


7 Comments

mjp Over a year ago

This is by far the most useful and simple solution in my opinion. It can also be shortened to one line: 'or' in '\t'.join(wordlist)

tonysepia Over a year ago

Much faster than going through the list and using 'in' expression on each item

Nihar Karve Over a year ago

why use '\t' instead of ' '?

Chris Collett Over a year ago

@Raymond Although I do agree that the join method is perhaps clearer, it is not faster than using any(<generator>). At best, it is the same speed (when the substring is not in the list). If the word exists in the list, any will short circuit and will not check the remainder of the list. For very large lists, this can be several orders of magnitude faster than joining.

Raymond Hettinger Over a year ago

@tonysepia Any separator can be used as long as it doesn't occur in the wordlist. A '\t' tab is a safe choice. A space will work most of the time unless cases like "de facto" and "de jure" are treated as word units.
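
The speed question raised in these comments is easy to measure locally. Below is a rough timeit sketch, not a rigorous benchmark; the list size and repeat count are arbitrary assumptions, and results will vary with the data and the position of the match:

```python
import timeit

wordlist = ['yellow', 'orange', 'red'] * 1000
combined = '\t'.join(wordlist)  # precomputed once, outside the timing


def with_any(needle):
    # Short-circuits on the first matching word.
    return any(needle in word for word in wordlist)


def with_join(needle):
    # Pays for the join on every call.
    return needle in '\t'.join(wordlist)


def with_precomputed(needle):
    # Searches the string that was joined ahead of time.
    return needle in combined


for needle in ('or', 'der'):  # one early hit, one miss
    for func in (with_any, with_join, with_precomputed):
        seconds = timeit.timeit(lambda: func(needle), number=100)
        print(f'{needle!r:6} {func.__name__:17} {seconds:.4f}s')
```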


Ashwini Chaudhary · Accepted Answer · 2016-11-03 09:58:29Z

53

You can import any from __builtin__ in case it was replaced by some other any:

>>> from __builtin__ import any as b_any
>>> lst = ['yellow', 'orange', 'red']
>>> word = "or"
>>> b_any(word in x for x in lst)
True

Note that in Python 3 __builtin__ has been renamed to builtins.
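
A sketch of the same idea for Python 3, using the builtins module. The alias b_any is just the name used in this answer:

```python
import builtins
from builtins import any as b_any

lst = ['yellow', 'orange', 'red']
word = 'or'

# Both spellings reach the built-in any(), even if the bare name "any"
# has been rebound in the current namespace (for example by a star import).
print(b_any(word in x for x in lst))         # True
print(builtins.any(word in x for x in lst))  # True
```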

edited Nov 3, 2016 at 9:58

answered May 5, 2013 at 0:50

Ashwini Chaudhary


3 Comments

Mark Tolonen Over a year ago

You can work around the issue with numpy.any if a list comp is used instead of a generator: np.any([word in x for x in lst]).

Ashwini Chaudhary Over a year ago

@MarkTolonen np.any is going to be slow then as it generates the whole list first.

Mark Tolonen Over a year ago

Relatively slower yes, noticeably slower...only the OP can say, but not for his example :)

Zero Piraeus · Accepted Answer · 2017-09-11 19:06:23Z

22

You could use next instead:

colors = ['yellow', 'orange', 'red'] 
search = "or"

result = next((True for color in colors if search in color), False)

print(result) # True

To show the string that contains the substring:

colors = ['yellow', 'orange', 'red']
search = "or"

result = [color for color in colors if search in color]

print(result) # ['orange']
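
If only the first matching string is needed, the two ideas can be combined. A small sketch, using None as an assumed "no match" default:

```python
colors = ['yellow', 'orange', 'red']
search = 'or'

# next() stops at the first match instead of building the whole list.
first_match = next((color for color in colors if search in color), None)
print(first_match)  # orange

# With no match, the default value is returned instead of raising StopIteration.
no_match = next((color for color in colors if 'der' in color), None)
print(no_match)     # None
```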

edited Sep 11, 2017 at 19:06

Zero Piraeus


answered May 5, 2013 at 1:30

stderr


1 Comment

Álvaro Over a year ago

That looks like a great way to find the objects with the substring, and could be used also for the True/False objective checking the length of the resulting array.

Kostas Demiris · Accepted Answer · 2017-03-28 11:30:17Z

0

Also, if you want to check whether each value of a dictionary occurs as a substring in a list of strings, you can use this:

list_a = [
    'Copy of snap-009ecf9feb43d902b from us-west-2',
    'Copy of snap-0fe999422014504b6 from us-west-2',
    'Copy of snap-0fe999422014cscx504b6 from us-west-2',
    'Copy of snap-0fe999422sdad014504b6 from us-west-2'
]
dict_b = {
    '/dev/xvda': 'snap-0fe999422014504b6',
    '/dev/xvdsdsa': 'snap-sdvcsdvsdvs'
}

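# Python 3: values() replaces itervalues(), and print is a function.
for b1 in dict_b.values():
    # Search list_a for the first string that contains this value.
    result = next(("found" for a1 in list_a if b1 in a1), "not found")
    print(result)

It prints

found
not found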
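
The dictionary check above can also be written with any(), collecting one boolean per key. This is a sketch for Python 3; the name found is made up for the example:

```python
list_a = [
    'Copy of snap-009ecf9feb43d902b from us-west-2',
    'Copy of snap-0fe999422014504b6 from us-west-2',
    'Copy of snap-0fe999422014cscx504b6 from us-west-2',
    'Copy of snap-0fe999422sdad014504b6 from us-west-2',
]
dict_b = {
    '/dev/xvda': 'snap-0fe999422014504b6',
    '/dev/xvdsdsa': 'snap-sdvcsdvsdvs',
}

# Map each key to whether its value occurs inside any string of list_a.
found = {
    device: any(snap in entry for entry in list_a)
    for device, snap in dict_b.items()
}

print(found)  # {'/dev/xvda': True, '/dev/xvdsdsa': False}
```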

answered Mar 28, 2017 at 11:30

Kostas Demiris

