I am new to Python and would like to learn web scraping with it. My first project is scraping the yellow pages in Germany.
When I run my code, I get the following IndexError after scraping 12 pages:
Traceback (most recent call last):
  File "C:/Users/Zorro/PycharmProjects/scraping/venv/Lib/site-packages/pip-19.0.3-py3.6.egg/pip/_vendor/pytoml/test.py", line 25, in <module>
    city = city_container[0].text.strip()
IndexError: list index out of range

Process finished with exit code 1
I would like to know how I can skip this error so that Python does not stop scraping.
I tried using try and except blocks, but did not succeed.
from bs4 import BeautifulSoup as soup
import requests

page_title = "/Seite-"
page_number = 1

for i in range(25):
    my_url = "https://www.gelbeseiten.de/Branchen/Italienisches%20Restaurant/Berlin"
    page_html = requests.get(my_url + page_title + str(page_number))
    page_soup = soup(page_html.text, "html.parser")
    containers = page_soup.findAll("div", {"class": "table"})
    for container in containers:
        name_container = container.findAll("div", {"class": "h2"})
        name = name_container[0].text.strip()
        street_container = container.findAll("span", {"itemprop": "streetAddress"})
        street = street_container[0].text.strip()
        city_container = container.findAll("span", {"itemprop": "addressLocality"})
        city = city_container[0].text.strip()
        plz_container = container.findAll("span", {"itemprop": "postalCode"})
        plz_name = plz_container[0].text.strip()
        tele_container = container.findAll("li", {"class": "phone"})
        tele = tele_container[0].text.strip()
        print(name, "\n" + street, "\n" + plz_name + " " + city, "\n" + tele)
        print()
    page_number += 1
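
For reference, here is a minimal sketch of the kind of skip I am trying to get working. It assumes the IndexError happens because a listing lacks one of the tags, so that `findAll` returns an empty list. `first_text` is a hypothetical helper of my own, not part of BeautifulSoup, and the `SimpleNamespace` objects only stand in for real tags so the snippet runs without a network request:

```python
from types import SimpleNamespace


def first_text(elements, default=""):
    """Return the stripped text of the first element, or `default` if the list is empty."""
    try:
        return elements[0].text.strip()
    except IndexError:
        # Nothing matched for this listing: fall back instead of crashing.
        return default


# Stand-ins for the lists that findAll would return (assumption for illustration).
found = [SimpleNamespace(text="  Berlin \n")]
missing = []

print(first_text(found))
print(first_text(missing, "n/a"))
```

In the loop this would presumably be used as `city = first_text(container.findAll("span", {"itemprop": "addressLocality"}))`, and likewise for the other fields, so that a listing with a missing field prints an empty value instead of stopping the script.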