Web scrape with Python - Issue with Looping through multiple web pages

Question

I am attempting to loop through multiple real estate agent websites, scraping each agent's name and mobile number.

My code:

locations = ['woollahra', 'chinatown', 'bondibeach','doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

containers = page_soup.findAll("div", {"class":"team-details"})

for container in containers:
    agent_name = container.findAll("div", {"class":"team-name"})
    name = agent_name[0].text

    phone = container.findAll("span", {"class":"phone"})
    mobile = phone[0].text

    print("name: " + name)
    print("mobile: " + mobile)

However, when I run my script, it skips the first three webpages (woollahra, chinatown, bondibeach) and only scrapes the info from the last website in the list (doublebay). I am unsure why it does this or how to make it loop through all the webpages.

Make sure to show the modules you are using; please add the import statements.
– José Garcia, Commented Aug 7, 2017 at 1:14
I think you're missing a mental model of what your program is doing. Go through each line in your head. What does the first for-loop do? What is the state of my_url when it ends? How do you expect the code below it to repeat for every value of my_url?
– Michel Müller, Commented Aug 7, 2017 at 1:15
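
The trace this comment asks for can be reproduced without any network access. The following is a minimal sketch using the question's list and URL pattern; the `seen` list is a hypothetical name added only to record what the loop does.

```python
locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']

seen = []  # hypothetical helper: records every value my_url takes
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'
    seen.append(my_url)

# The loop body ran once per location, but after the loop ends only the
# last assignment survives, so any dedented code sees a single URL.
print(len(seen))
print(my_url)
```

Any code placed after the loop at the outer indentation level runs once, with `my_url` holding only the doublebay address.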

Vinícius Figueiredo · Accepted Answer · 2017-08-07 01:16:25Z

All of the scraping code needs to be inside your first loop. As written, the loop does nothing except reassign my_url, so by the time the rest of the script runs it holds only the last URL (doublebay). To fix this, indent the rest of your code so it sits inside the loop:

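# Imports assumed from the aliases used in the question (uReq, soup);
# the original post did not show them.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

locations = ['woollahra', 'chinatown', 'bondibeach', 'doublebay']
for location in locations:
    my_url = 'https://' + location + '.ljhooker.com.au/our-team'

    # Everything below is indented, so it runs once per location.
    uClient = uReq(my_url)
    page_html = uClient.read()
    uClient.close()

    page_soup = soup(page_html, "html.parser")

    # Each agent sits in its own "team-details" div.
    containers = page_soup.findAll("div", {"class": "team-details"})

    for container in containers:
        agent_name = container.findAll("div", {"class": "team-name"})
        name = agent_name[0].text

        phone = container.findAll("span", {"class": "phone"})
        mobile = phone[0].text

        print("name: " + name)
        print("mobile: " + mobile)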
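
If it helps to make the per-page work explicit, one possible restructuring is to put each step in its own function and loop over the locations in a single place. This is only a sketch, not the asker's code: the function names (`team_url`, `parse_agents`, `fetch`, `scrape_all`) are hypothetical, the class names are taken from the question, and skipping cards that lack a name or phone is an assumption about how the pages might vary.

```python
from urllib.request import urlopen


def team_url(location):
    """Build the team-page URL for one office, as in the question."""
    return 'https://' + location + '.ljhooker.com.au/our-team'


def parse_agents(page_html):
    """Return a list of (name, mobile) pairs found in one team page."""
    # Imported lazily so the URL and looping logic can be exercised
    # without Beautiful Soup installed.
    from bs4 import BeautifulSoup

    page_soup = BeautifulSoup(page_html, "html.parser")
    agents = []
    for container in page_soup.find_all("div", {"class": "team-details"}):
        name = container.find("div", {"class": "team-name"})
        phone = container.find("span", {"class": "phone"})
        if name is None or phone is None:
            continue  # assumption: skip incomplete cards rather than crash
        agents.append((name.text.strip(), phone.text.strip()))
    return agents


def fetch(url):
    """Download one page and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def scrape_all(locations, fetch_page=fetch, parse=parse_agents):
    """Run fetch + parse once per location; return {location: agents}."""
    results = {}
    for location in locations:
        results[location] = parse(fetch_page(team_url(location)))
    return results
```

Calling `scrape_all(locations)` with the list from the question would fetch and parse all four pages. Because the fetch and parse steps are passed in as parameters, the loop itself can be checked with stand-in functions before touching the live sites.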
