I have a blacklist of banned substrings, and I need an if statement that checks whether ANY of them is contained in a given url. If the url contains none of them, I want to do A. If the url contains at least one of them, I want to do B. Either way the action should run only once per url, not once per banned substring.

black_list = ['linkedin.com', 'yellowpages.com', 'facebook.com', 'bizapedia.com', 'manta.com',
              'yelp.com', 'nextdoor.com', 'industrynet.com', 'twitter.com', 'zoominfo.com', 
              'google.com', 'yellow-listings.com', 'kompass.com', 'dnb.com', 'tripadvisor.com']

Here are two simple example urls that I'm using to check whether it works. url1 contains a banned substring, while url2 doesn't.

url1 = 'https://www.dnb.com/'
url2 = 'https://www.ok/'

I tried the code below, which works, but I was wondering if there is a better (more computationally efficient) way of doing it. I have a data frame of 100k+ urls, so I'm worried that this will be very slow.

url = url1  # the url being checked
mask = []
for banned in black_list:
    if banned in url:
        mask.append(True)
    else:
        mask.append(False)

if any(mask):
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")      

Does anybody know a more efficient way of doing this?

  • I'm afraid the proposed solutions are not very efficient when black_list is huge. They have a time complexity of O(mn), where m is the size of black_list and n is the size of the url set. I think that with proper preprocessing it should be possible to reduce this to O(n). The only method I can come up with is to use re, but I'm not sure whether it delivers this improvement. Commented Feb 25, 2023 at 9:56
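The re idea from the comment above could be sketched roughly as follows. This is only an illustration, not a measured improvement: the function name has_banned_substring is hypothetical, the list is a shortened copy of the one in the question, and Python's re module is not guaranteed to reach the O(n) bound the comment hopes for. What it does do is push the scan over all banned substrings into a single compiled pattern.

```python
import re

# Shortened copy of the question's list, for illustration only.
black_list = ['linkedin.com', 'yellowpages.com', 'dnb.com']

# re.escape makes '.' match a literal dot; '|' joins the entries
# into one alternation that is compiled once and reused for every url.
banned_pattern = re.compile('|'.join(re.escape(banned) for banned in black_list))

def has_banned_substring(url):
    """Return True if any black_list entry occurs anywhere in url."""
    return banned_pattern.search(url) is not None

print(has_banned_substring('https://www.dnb.com/'))  # True
print(has_banned_substring('https://www.ok/'))       # False
```

Whether this beats a plain loop depends on the size of black_list and on the urls, so it is worth timing both on real data before switching.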

2 Answers

Here is a possible one-line solution:

print('there is a banned substring inside'
      if any(banned_str in url for banned_str in black_list)
      else 'no banned substrings inside')

If you prefer a less pythonic approach:

if any(banned_str in url for banned_str in black_list):
    print('there is a banned substring inside')
else:
    print('no banned substrings inside')
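Since the question mentions 100k+ urls, here is a minimal sketch of applying the same any() check to many urls at once. The helper name contains_banned and the sample urls list are assumptions made for illustration; the black_list is a shortened copy of the one in the question. Because any() consumes a generator, it stops at the first banned substring it finds instead of testing the whole list.

```python
def contains_banned(url, banned_substrings):
    """Return True as soon as one banned substring is found in url."""
    return any(banned in url for banned in banned_substrings)

# Shortened copy of the question's list, for illustration only.
black_list = ['linkedin.com', 'dnb.com', 'facebook.com']

# Hypothetical sample; in practice this would be the url column of the data frame.
urls = ['https://www.dnb.com/', 'https://www.ok/']

# One boolean per url: True means "do B", False means "do A".
mask = [contains_banned(url, black_list) for url in urls]
print(mask)  # [True, False]
```

The resulting list can be used directly as a boolean mask, so each url triggers A or B exactly once.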

You can set a flag and then perform either A or B depending on its value.

ban_flag = False
for banned in black_list:
    if banned in url:
        ban_flag = True
        break  # one match is enough, so stop scanning
if ban_flag:
    print("there is a banned substring inside")
else:
    print("no banned substrings inside")
