How to avoid errors when a URL has only one page (Python web scraping)

Question

Hello everyone, I am a beginner and I am trying to use an if/else statement with URLs in web scraping. I want to collect all the pages for departments 64 to 66. My URL is http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0 (with {} = 64, 65 or 66). My loop works and collects all the pages for 64. But department 65 has only one page, so the line last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1] fails. Here is my code:

import requests
from bs4 import BeautifulSoup
url_list = ['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0']
for link in url_list:
    r=requests.get(link)
    soup = BeautifulSoup(r.content, "html.parser")
    page_Url_test=[link.format(i) for i in range(64, 66)]
    for depart_page in page_Url_test:
        depart_page1=str(depart_page)+"?page={}"
        r=requests.get(depart_page1)
        soup = BeautifulSoup(r.content, "html.parser")
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url=[depart_page1.format(i) for i in range(0, int(last_page)+1)]
print(dept_page_Url)

I tried to add an if/else like this:

for depart_page in page_Url_test:
    depart_page1=str(depart_page)+"?page={}"
    r=requests.get(depart_page1)
    soup = BeautifulSoup(r.content, "html.parser")
    if len(depart_page1) == 0 :
        dept_page_Url=depart_page1
    else:
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url=[depart_page1.format(i) for i in range(0, int(last_page)+1)]
print(dept_page_Url)

But it doesn't work. How can I tell my code: if there is just one page, select only that page, otherwise carry on with the next step? Any clue? I don't know enough to work it out on my own. Thanks a lot.

Page 65 has no .pagination element, so your condition should be if soup.find(class_='pagination'):
– t.m.adam, Commented Dec 7, 2017 at 17:56

SIM · Accepted Answer · 2017-12-10 17:26:25Z

As t.m.adam has already pointed out, you can check that the pagination element exists before reading it, as in the approach below. I have also trimmed your code to make it more concise.

import requests
from bs4 import BeautifulSoup

# URL template; {} is filled with the department number
url_list = 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0'
for link in [url_list.format(page) for page in range(64, 67)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    depart_page = link + "?page={}"
    # Departments with a single page have no pagination list at all
    pagination = soup.find('ul', class_='pagination')
    if pagination:
        last_page = pagination.find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
        print(dept_page_Url)

If you also need the URL of a department that has only one page, add an else branch:

if soup.find('ul', class_='pagination'):
    last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
    dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
    print(dept_page_Url)
else:
    print(link)

Result:

['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=2']
['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=2']
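
The same check can be wrapped in a helper that returns a list in every case, so a department with a single page (such as 65) yields a one-item list instead of nothing. This is a minimal sketch, not part of the original answer: the function name `build_page_urls` and the inline HTML fragments are hypothetical stand-ins for the real pages, and it assumes, as the code above does, that the `href` of the `li.next` link ends in `?page=N`. It uses the built-in `html.parser` so it runs without `lxml`.

```python
from bs4 import BeautifulSoup


def build_page_urls(soup, base_url):
    """Return every page URL for one department, as a list in all cases."""
    pagination = soup.find('ul', class_='pagination')
    next_item = pagination.find('li', class_='next') if pagination else None
    if next_item is None or next_item.a is None:
        # No pagination block (or no "next" link): the department has one page.
        return [base_url]
    # Same parsing as the answer: take the number after '=' in the href.
    last_page = int(next_item.a['href'].split('=')[1])
    return [f"{base_url}?page={i}" for i in range(last_page + 1)]


# Hypothetical HTML fragments standing in for the real responses.
multi_page_html = '<ul class="pagination"><li class="next"><a href="?page=2">next</a></li></ul>'
single_page_html = '<div id="cnsa_results-list"></div>'

base = 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0'
multi = build_page_urls(BeautifulSoup(multi_page_html, 'html.parser'), base.format(64))
single = build_page_urls(BeautifulSoup(single_page_html, 'html.parser'), base.format(65))
print(multi)
print(single)
```

Because the helper always returns a list, any later loop over the page URLs needs no special case for single-page departments.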

edited Dec 10, 2017 at 17:26

answered Dec 7, 2017 at 18:37

2 Comments

Marie Ducourau Over a year ago

Thanks a lot for your explanation and for making my code more concise. I learned a lot. I have one question: after this part I need to scrape data, and my code works for departments 64 and 66 but not for 65. I know it is because 65 has just one page, but I need to scrape that data too. My code is below:

Marie Ducourau Over a year ago

for test in dept_page_Url:
    r = requests.get(test)
    soup = BeautifulSoup(r.content, "html.parser")
    for maison in soup.find_all("div", {"id":"cnsa_results-list"}):
        for general in maison.find_all("div", {"class":"row"}):
            for description in general.find_all("div", {"class":"cnsa_results-tags2 col col-xs-10 col-sm-10"}):
                description_name=description.text
                print(description_name)
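
One way to make this loop cover department 65 as well is to give it a list in every case: a one-item list holding the bare department URL when there is no pagination. The sketch below only illustrates that idea and is not the asker's final code. `extract_descriptions` is a hypothetical helper, and the HTML fragment is invented to mirror the selectors in the comment above; in real use the HTML would come from `requests.get(url).content` for each URL in the list.

```python
from bs4 import BeautifulSoup


def extract_descriptions(html):
    """Collect the description text from one results page."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    for maison in soup.find_all("div", {"id": "cnsa_results-list"}):
        for general in maison.find_all("div", {"class": "row"}):
            for description in general.find_all(
                    "div", {"class": "cnsa_results-tags2 col col-xs-10 col-sm-10"}):
                names.append(description.text.strip())
    return names


# Invented fragment that mirrors the selectors used in the comment.
sample_html = """
<div id="cnsa_results-list">
  <div class="row">
    <div class="cnsa_results-tags2 col col-xs-10 col-sm-10">Accueil de jour A</div>
  </div>
</div>
"""

# A single-page department is represented as a one-item list,
# so the same loop handles it without a special case.
pages = [sample_html]
results = [name for page in pages for name in extract_descriptions(page)]
print(results)
```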
