
I have the following function that checks a certain webpage for a keyword:

import re
import requests
from bs4 import BeautifulSoup

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    r_a = requests.get(url_a)
    soup_a = BeautifulSoup(r_a.text)  # parser choice left to BeautifulSoup

    for blem in soup_a(text=re.compile(r'RFCOMM')):
        return True

    return False

I have verified that soup_a is the same as the view-source of the URL, but my search only returns results contained within the head tags, and I am having a hard time figuring out why. Any suggestions?

Python version 2.7.5

  • Does the page source show the RFCOMM string as one, not e.g. <b>RF</b>comm or RF<wbr/>COMM? Commented Dec 28, 2017 at 20:58

1 Answer


You need to pass a parser name such as 'lxml' as the second argument to the BeautifulSoup constructor. Also, return True exits the function on the first match, so if RFCOMM is indeed found in the head tags, the loop stops there and no further matches are registered. It may be better to use a list comprehension and determine whether any matches are found:

from bs4 import BeautifulSoup as soup
import urllib.request as urllib
import re

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    # Hand the raw response bytes to BeautifulSoup and name the parser explicitly
    s = soup(urllib.urlopen(url_a).read(), 'lxml')
    # A non-empty list of matching text nodes means the keyword was found
    return bool([i for i in s(text=re.compile(r'RFCOMM'))])

print(checkString())

Output:

True
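To try the same idea without hitting the network, here is a minimal sketch that runs the search against an in-memory HTML string. It assumes the built-in 'html.parser' so that lxml does not have to be installed; the names contains_keyword and SAMPLE_HTML are made up for illustration and the sample markup is not taken from the real listing page.

```python
import re

from bs4 import BeautifulSoup


def contains_keyword(html, keyword, parser="html.parser"):
    """Return True if any text node in the parsed document contains keyword."""
    # Naming the parser explicitly keeps the result independent of which
    # parsers happen to be installed on the machine.
    soup = BeautifulSoup(html, parser)
    # re.escape treats the keyword as literal text, not as a regex.
    pattern = re.compile(re.escape(keyword))
    # find_all returns every matching text node; an empty list is falsy.
    return bool(soup.find_all(string=pattern))


# Hypothetical markup with the keyword in the body, outside the head tags.
SAMPLE_HTML = """<html><head><title>Listing</title></head>
<body><table><tr><td>RFCOMM</td></tr></table></body></html>"""

print(contains_keyword(SAMPLE_HTML, "RFCOMM"))  # True
print(contains_keyword(SAMPLE_HTML, "L2CAP"))   # False
```

Swapping the parser argument between "html.parser" and "lxml" on the real page is a quick way to see whether the parser choice is what hides matches outside the head tags.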

1 Comment

Thanks - I do not care whether it was found inside the head tags, but for some reason I could not find anything outside the head tags. Passing "html.parser" as the parser argument to BeautifulSoup did the trick for me (I guess similar to what lxml would do?).
