
I want to scrape multiple websites with similar URLs, such as https://woollahra.ljhooker.com.au/our-team, https://chinatown.ljhooker.com.au/our-team and https://bondibeach.ljhooker.com.au/our-team.

I have already written a script that works for the first website, but I am unsure how to make it scrape the other two.

My code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = "https://woollahra.ljhooker.com.au/our-team"

page_soup = soup(page_html, "html.parser")  
containers = page_soup.findAll("div", {"class":"team-details"})

for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text

    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text

    print("name: " + name)
    print("mobile: " + mobile)

Is there a way to simply list the parts of the URL that differ (woollahra, chinatown, bondibeach), so that the script loops through each webpage using the code I have already written?

  • Make a list of URLs, iterate through them, and sleep for a few seconds between requests. Commented Aug 4, 2017 at 0:30
  • I would suggest using lxml as the parser, to improve performance. You can also use SoupStrainer to only parse relevant segments of the source, to further improve performance. Commented Aug 4, 2017 at 0:37
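
The two suggestions in the comments above (pausing between requests and limiting parsing with SoupStrainer) could be combined roughly as follows. This is only a sketch and has not been run against the live site: parse_team, scrape_all, TEAM_ONLY and the 2-second default delay are names and values made up for illustration, and the class names are copied from the question's code. It keeps "html.parser" so nothing extra needs installing; switching to "lxml" assumes that package is available.

```python
import time
from urllib.request import urlopen

from bs4 import BeautifulSoup, SoupStrainer

# Only <div class="team-details"> elements (and their children) are parsed;
# the rest of each page is skipped.
TEAM_ONLY = SoupStrainer("div", {"class": "team-details"})


def parse_team(page_html):
    """Return a list of (name, mobile) tuples found in one our-team page."""
    page_soup = BeautifulSoup(page_html, "html.parser", parse_only=TEAM_ONLY)
    results = []
    for container in page_soup.find_all("div", {"class": "team-details"}):
        name = container.find("div", {"class": "team-name"})
        phone = container.find("span", {"class": "phone"})
        if name is None or phone is None:
            continue  # skip entries that lack a name or a phone number
        results.append((name.text.strip(), phone.text.strip()))
    return results


def scrape_all(urls, delay_seconds=2.0):
    """Fetch each URL in turn, sleeping between requests."""
    everyone = []
    for url in urls:
        with urlopen(url) as response:
            everyone.extend(parse_team(response.read()))
        time.sleep(delay_seconds)  # pause before the next request
    return everyone
```

Calling scrape_all with the three our-team URLs from the question would then return one combined list of (name, mobile) tuples.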

2 Answers

locations = ['woollahra', 'chinatown', 'bondibeach']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

followed by the rest of your code, indented inside the loop so that it runs once for each element of the list. You can add more locations later.


2 Comments

Thanks @JoséGarcia for the response, however my code is only printing the last location in the list (bondibeach). I am not sure why, or how to fix it.
This is not the question you asked. For us to see what is going on with your code, please provide the working code, because this one doesn't even use the variable my_url. My guess is that you found a code snippet on the internet and tried to replace things without looking at how it worked. If that is the case, please read the documentation first; if not, please update your question so we can help you solve your problem.

You just want a loop

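for team in ["woollahra", "chinatown", "bondibeach"]:
    my_url = "https://{}.ljhooker.com.au/our-team".format(team)
    # fetch this team's page (the question's snippet never defines page_html)
    page_html = uReq(my_url).read()
    page_soup = soup(page_html, "html.parser")

    # make sure you indent the rest of the code so it runs once per team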

2 Comments

Thanks @cricket_007 for the response, however my code is only printing the last location in the list (bondibeach). I am not sure why, or how to fix it.
This code is no different from the accepted answer, and a for team in [...] loop will always iterate over every team.
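
One possible cause of the "only the last location is printed" symptom is that the work using my_url sits outside the loop body. This is a guess, since the asker's updated code is not shown. The small sketch below contrasts the two layouts without touching the network; urls_outside and urls_inside are names invented for the illustration.

```python
teams = ["woollahra", "chinatown", "bondibeach"]

# Pitfall: only the assignment is inside the loop.
for team in teams:
    my_url = "https://{}.ljhooker.com.au/our-team".format(team)
# This line is not indented, so it runs once, after the loop has finished,
# and sees only the value from the final iteration.
urls_outside = [my_url]

# Fix: everything that uses my_url is indented into the loop body.
urls_inside = []
for team in teams:
    my_url = "https://{}.ljhooker.com.au/our-team".format(team)
    urls_inside.append(my_url)  # runs once per team

print(urls_outside)
print(urls_inside)
```

In the first layout the list holds only the bondibeach URL; in the second it holds all three, which is what the fetching and parsing code needs in order to run for every site.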
