These are some ways to check whether an element is in a list or not.
from timeit import timeit
import numpy as np

whitelist1 = {"bar", "baz", "x", "y"}
whitelist2 = np.array(["bar", "baz", "x", "y"])  # already sorted, which np.searchsorted requires

def func1():
    return {"foo"}.intersection(whitelist1)

def func2():
    return "foo" in whitelist1

def func3():
    # Caution: whitelist1 is a set, and np.isin treats a set as one object, not as a collection of candidates
    return np.isin('foo', whitelist1)

def func4():
    # Binary search; raises IndexError if the word sorts after every entry
    return whitelist2[np.searchsorted(whitelist2, 'foo')] == 'foo'

print("func1=", timeit(func1, number=100000))
print("func2=", timeit(func2, number=100000))
print("func3", timeit(func3, number=100000))
print("func4=", timeit(func4, number=100000))
Time taken by each function (total seconds for 100000 calls)
func1= 0.01365450001321733
func2= 0.005112499929964542
func3 0.5342871999600902
func4= 0.17057700001168996
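Two of the NumPy variants above are easy to get subtly wrong: func3 passes a set to np.isin, and func4 indexes with whatever np.searchsorted returns. The following is a minimal sketch of how those two lookups could be written so that they return a correct answer. The names whitelist_set, whitelist_arr, isin_lookup and searchsorted_lookup are made up for this illustration, and these are not the functions that were timed above.

```python
import numpy as np

whitelist_set = {"bar", "baz", "x", "y"}

# np.isin expects an array or sequence of candidates, so convert the set first.
# Sorting also makes the array valid input for np.searchsorted.
whitelist_arr = np.array(sorted(whitelist_set))


def isin_lookup(word):
    # np.isin returns a 0-d boolean array for a scalar input; bool() unwraps it.
    return bool(np.isin(word, whitelist_arr))


def searchsorted_lookup(word):
    idx = np.searchsorted(whitelist_arr, word)
    # idx equals len(whitelist_arr) when word sorts after every entry,
    # so check the bound before indexing.
    return bool(idx < len(whitelist_arr) and whitelist_arr[idx] == word)


print(isin_lookup("bar"), isin_lookup("foo"))
print(searchsorted_lookup("x"), searchsorted_lookup("zzz"))
```

Both helpers still pay NumPy's per-call overhead, so for a single word the plain 'in' test on a set is likely to remain the faster choice.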
For a randomly generated list
from timeit import timeit
import numpy as np
import random as rn
from string import ascii_letters

# Build a list of 1000 unique random 5-letter words
randomLst = []
while len(randomLst) != 1000:
    newWord = ''.join(rn.choices(ascii_letters, k=5))
    if newWord not in randomLst:
        randomLst.append(newWord)

whitelist1 = {*randomLst}
whitelist2 = np.array(randomLst)
randomWord = rn.choice(randomLst)
randomWords = set(rn.choices(randomLst, k=100))  # up to 100 distinct words

def func1():
    return {randomWord}.intersection(whitelist1)

def func2():
    return randomWord in whitelist1

def func3():
    # Caution: still tests the literal 'foo' against the set, as in the first benchmark
    return np.isin('foo', whitelist1)

def func4():
    # Caution: whitelist2 is not sorted here, so np.searchsorted may land on the wrong index
    return whitelist2[np.searchsorted(whitelist2, randomWord)] == randomWord

def func5():
    return randomWords & whitelist1

print("func1=", timeit(func1, number=100000))
print("func2=", timeit(func2, number=100000))
print("func3", timeit(func3, number=100000))
print("func4=", timeit(func4, number=100000))
# number is 1000 here because each call checks up to 100 words at once: 100000 / 100 = 1000
print("func5=", timeit(func5, number=1000))
Time taken
func1= 0.012835499946959317
func2= 0.005004600039683282
func3 0.5219665999757126
func4= 0.19900090002920479
func5= 0.0019264000002294779
Conclusion
If you want to check only one word, the 'in' operator (func2) is the fastest.
But if you have a list of words to check, the '&' operator (func5) is the fastest.
Note: func5 returns a set of the words that are in the whitelist.
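To make the conclusion concrete, here is a small sketch showing that a single '&' intersection gives the same answer as testing each word with 'in'. The sample words and the names foos, allowed and allowed_one_by_one are invented for this example.

```python
whitelist = {"bar", "baz", "x", "y"}
foos = {"foo", "bar", "y", "qux"}

# One intersection answers every lookup at once and returns the matching words.
allowed = foos & whitelist

# The same result, built from one 'in' test per word.
allowed_one_by_one = {word for word in foos if word in whitelist}

print(allowed == allowed_one_by_one)
```

Note that the intersection tells you which words are allowed, not a True/False per word, so it fits best when you need the set of matches.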
If you use 'if foo in whitelist:' instead, then you are evaluating whether the string is in the list.
whitelist = whitelist or [] (this falls back to an empty list when whitelist is None or empty)
np.core.defchararray.find(bar, foo) != -1. For this to work, you need to make whitelist a NumPy array. You will need to check whether whitelist is empty like this: if whitelist.size == 0:
For n tests, consider collecting all your foos in a set, and make whitelist a set, then use foos & whitelist (intersection) to figure out which foos are in your whitelist. The cost should be roughly proportional to len(foos), and it should be faster than len(foos) separate lookups.
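The fragments above can be put together roughly as follows. This is only a sketch under assumptions: is_allowed and any_contains are hypothetical helper names, and np.char.find is used here as the public spelling of the character-array find function. Note that find does a substring search, so it answers "does any entry contain foo", not "is foo exactly in the whitelist".

```python
import numpy as np


def is_allowed(foo, whitelist=None):
    # Hypothetical helper: a missing (None) or empty whitelist allows nothing.
    whitelist = whitelist or []
    return foo in whitelist


def any_contains(whitelist, foo):
    # Hypothetical helper: substring search over a NumPy string array.
    whitelist = np.asarray(whitelist, dtype=str)
    # "whitelist or []" would raise for a multi-element array,
    # so emptiness is checked through .size instead.
    if whitelist.size == 0:
        return False
    # np.char.find returns -1 for every entry that does not contain foo.
    return bool((np.char.find(whitelist, foo) != -1).any())


print(is_allowed("bar", ["bar", "baz"]), is_allowed("bar"))
print(any_contains(["bar", "baz"], "az"), any_contains([], "foo"))
```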