
Hello everyone, I am a beginner and I am trying to use an if/else statement with URLs in web scraping. I want to select all the pages for departments 64 to 66. My URL is: http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0 (with {} = 64, 65 or 66). My loop works and selects all the pages for department 64. But department 65 has only one page, so the line last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1] fails there. Here is my code:

import requests
from bs4 import BeautifulSoup
url_list = ['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0']
for link in url_list:
    r=requests.get(link)
    soup = BeautifulSoup(r.content, "html.parser")
    page_Url_test=[link.format(i) for i in range(64, 66)]
    for depart_page in page_Url_test:
        depart_page1=str(depart_page)+"?page={}"
        r=requests.get(depart_page1)
        soup = BeautifulSoup(r.content, "html.parser")
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url=[depart_page1.format(i) for i in range(0, int(last_page)+1)]
print(dept_page_Url)

I tried to incorporate an IF ELSE like this:

for depart_page in page_Url_test:
    depart_page1=str(depart_page)+"?page={}"
    r=requests.get(depart_page1)
    soup = BeautifulSoup(r.content, "html.parser")
    if len(depart_page1) == 0 :
        dept_page_Url=depart_page1
    else:
        last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url=[depart_page1.format(i) for i in range(0, int(last_page)+1)]
print(dept_page_Url)

But it doesn't work. How can I tell my code: if there is just one page, select only that page; otherwise, do the next step? Any clue? I don't know enough to figure it out on my own. Thanks a lot.

  • The page for department 65 has no .pagination element, so your condition should be if soup.find(class_='pagination'): Commented Dec 7, 2017 at 17:56
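To illustrate why the chained call breaks, here is a minimal sketch using two hand-written HTML snippets (hypothetical, not the real site's markup). `find` returns `None` when nothing matches, so chaining another `.find` onto it raises `AttributeError`; checking for `None` first avoids that. The helper name `last_page_or_none` is made up for this example, and it mirrors the thread's assumption that the `next` link carries the last page number.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one page with a pagination list, one without.
WITH_PAGINATION = (
    '<ul class="pagination">'
    '<li class="next"><a href="?page=2">next</a></li>'
    '</ul>'
)
WITHOUT_PAGINATION = '<div id="cnsa_results-list"></div>'


def last_page_or_none(html):
    """Return the page number found in the 'next' link, or None without pagination."""
    soup = BeautifulSoup(html, "html.parser")
    pagination = soup.find("ul", class_="pagination")
    if pagination is None:
        # Single-page department: there is nothing to chain .find() on.
        return None
    next_item = pagination.find("li", class_="next")
    if next_item is None or next_item.a is None:
        return None
    return int(next_item.a["href"].split("=")[1])


print(last_page_or_none(WITH_PAGINATION))
print(last_page_or_none(WITHOUT_PAGINATION))
```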

1 Answer

As t.m.adam has already pointed out, you can try the approach below. I have also trimmed your code to make it more concise.

import requests
from bs4 import BeautifulSoup

# URL template; {} is replaced by the department number
url_template = 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0'
for link in [url_template.format(dept) for dept in range(64, 67)]:
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")  # needs the lxml package installed
    depart_page = link + "?page={}"
    # Look up the pagination block once; it is missing on single-page departments
    pagination = soup.find('ul', class_='pagination')
    if pagination:
        last_page = pagination.find('li', class_='next').a['href'].split('=')[1]
        dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
        print(dept_page_Url)

If you also need the departments that have only one page, add an else branch:

if soup.find('ul', class_='pagination'):
    last_page = soup.find('ul', class_='pagination').find('li', class_='next').a['href'].split('=')[1]
    dept_page_Url = [depart_page.format(i) for i in range(0, int(last_page)+1)]
    print(dept_page_Url)
else:
    print(link)  # no pagination: the department URL is the only page

Result:

['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/64/0?page=2']
['http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=0', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=1', 'http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/66/0?page=2']
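If the single-page department has to be scraped later as well, one option is to always end up with a list of URLs, whether or not a pagination block exists. The sketch below is only an illustration: `build_page_urls` is a hypothetical helper, the HTML strings are hand-written stand-ins for the real pages, and it keeps the assumption from the code above that the `next` link holds the last page number.

```python
from bs4 import BeautifulSoup


def build_page_urls(link, html):
    """Return every page URL for one department, even when it has a single page."""
    soup = BeautifulSoup(html, "html.parser")
    next_link = soup.select_one("ul.pagination li.next a")
    if next_link is None:
        # No pagination block: the department URL itself is the only page.
        return [link]
    last_page = int(next_link["href"].split("=")[1])
    return [f"{link}?page={i}" for i in range(last_page + 1)]


# Hypothetical HTML, standing in for the downloaded pages.
paged = '<ul class="pagination"><li class="next"><a href="/x?page=2">next</a></li></ul>'
single = "<p>no pager here</p>"

base = "http://www.pour-les-personnes-agees.gouv.fr/annuaire-accueil-de-jour/{}/0"
print(build_page_urls(base.format(64), paged))
print(build_page_urls(base.format(65), single))
```

In the real loop, `html` would be `res.text`, and the returned lists could be collected into one list (for example with `all_urls.extend(...)`) so that a later scraping loop sees all three departments.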

2 Comments

Thanks a lot for your explanation and for making my code more concise. I learned a lot. I have one question: after this part I need to scrape data, and my code works for departments 64 and 66 but not for 65. I know it is because 65 has just one page, but I need to scrape that data too. My code is below:
for test in dept_page_Url:
    r = requests.get(test)
    soup = BeautifulSoup(r.content, "html.parser")
    for maison in soup.find_all("div", {"id":"cnsa_results-list"}):
        for general in maison.find_all("div", {"class":"row"}):
            for description in general.find_all("div", {"class":"cnsa_results-tags2 col col-xs-10 col-sm-10"}):
                description_name=description.text
                print(description_name)
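The nested loops in the comment above could also be written as a single CSS selector. This is a rough sketch, assuming the markup matches the id and class names quoted there; `extract_descriptions` and the sample HTML (including the entry names) are made up for illustration. It works the same way for department 65, provided that department's URL ends up in the list being iterated.

```python
from bs4 import BeautifulSoup


def extract_descriptions(html):
    """Collect the text of each description block inside the results list."""
    soup = BeautifulSoup(html, "html.parser")
    nodes = soup.select("div#cnsa_results-list div.row div.cnsa_results-tags2")
    return [node.get_text(strip=True) for node in nodes]


# Hypothetical sample, shaped after the class names used in the comment.
sample = """
<div id="cnsa_results-list">
  <div class="row">
    <div class="cnsa_results-tags2 col col-xs-10 col-sm-10"> Accueil de jour A </div>
  </div>
  <div class="row">
    <div class="cnsa_results-tags2 col col-xs-10 col-sm-10">Accueil de jour B</div>
  </div>
</div>
"""

print(extract_descriptions(sample))
```

Unlike `{"class": "cnsa_results-tags2 col col-xs-10 col-sm-10"}`, which matches the exact class string, the selector only requires the `cnsa_results-tags2` class, so it is a little more tolerant of changes in the other classes.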
