I am attempting to loop through multiple real estate agent websites, scraping each agent's name and mobile number.

My code:

locations = ['woollahra', 'chinatown', 'bondibeach','doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div", {"class":"team-details"})

for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text

    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text

    print("name: " + name)
    print("mobile: " + mobile)

However, when I run my script, it skips the first three webpages (woollahra, chinatown, bondibeach) and only scrapes the info from the last website in the list (doublebay). I am unsure why it is doing this or how to make it loop through all the webpages.

  • Make sure to show the modules you are using; please add the import statements. Commented Aug 7, 2017 at 1:14
  • I think you're missing a mental model of what your programs are doing. Go through each line in your head. What does the first for-loop do? What's the state of my_url at the end? How do you expect it to repeat the code below for all instances of my_url? Commented Aug 7, 2017 at 1:15

1 Answer

All of the code needs to be inside your first loop; otherwise the loop does nothing more than reassign the variable my_url, and the rest of the script runs only once, using the last URL. So all you have to do is indent the rest of your code:

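# Assumed imports: the question does not show them, but these aliases
# match the names uReq and soup used in the code.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

    # Fetch the page for this location
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")

    # Each agent sits in its own "team-details" div
    containers = page_soup.findAll("div", {"class":"team-details"})

    for container in containers:
        agent_name = container.findAll("div", {"class":"team-name"})
        name = agent_name[0].text

        phone = container.findAll("span", {"class":"phone"})
        mobile = phone[0].text

        print("name: " + name)
        print("mobile: " + mobile)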
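
To see why the indentation matters without hitting the network, here is a minimal sketch that only builds the URLs. The two list names (urls_seen_outside, urls_seen_inside) are made up for illustration, and the append call is just a stand-in for the fetch-and-parse step:

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']

# Structure from the question: only the assignment is in the loop body.
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

# The loop is already finished here, so my_url holds only the last value.
urls_seen_outside = [my_url]

# Structure from the answer: the work is in the loop body,
# so it runs once for every location.
urls_seen_inside = []
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
    urls_seen_inside.append(my_url)  # stand-in for fetching and parsing

print(urls_seen_outside)
print(urls_seen_inside)
```

The first list ends up with a single URL (doublebay), while the second has one entry per location, which is the same difference you see between the two versions of the scraper.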