
I would like to scrape the following website using Python and export the scraped data to a CSV file:

http://www.swisswine.ch/en/producer?search=&&

This website has 154 pages of results for this search. I need to request every page and scrape its data, but my script doesn't move on to the next pages; it only scrapes data from one page.

Here I set the condition i < 153, but the script only gave me the data of the 154th page (10 records). I need data from the 1st to the 154th page.

How can I scrape the data from all pages in a single run of the script, and how can I export that data as a CSV file?

My script is as follows:

import csv
import requests
from bs4 import BeautifulSoup
i = 0
while i < 153:       
     url = ("http://www.swisswine.ch/en/producer?search=&&&page=" + str(i))
     r = requests.get(url)
     i=+1
     r.content

soup = BeautifulSoup(r.content)
print (soup.prettify())


g_data = soup.find_all("ul", {"class": "contact-information"})
for item in g_data:
      print(item.text)
  • The lines that scrape the data, from soup = ... down, should be inside the loop. Otherwise you finish the loop first and are only getting the data of the last page. Commented Jul 24, 2016 at 14:54
  • @vishnu It is good to use BeautifulSoup, but if you want the whole job to be managed well, you should go for doc.scrapy.org/en/latest/intro/tutorial.html Commented Jul 24, 2016 at 14:58

1 Answer


You should put your HTML parsing code under the loop as well. And you are not incrementing the i variable correctly (thanks @MattDMo):

import csv
import requests
from bs4 import BeautifulSoup

i = 0
while i < 154:  # pages 0 through 153 cover all 154 result pages
    url = "http://www.swisswine.ch/en/producer?search=&&&page=" + str(i)
    r = requests.get(url)
    i += 1

    soup = BeautifulSoup(r.content)
    print(soup.prettify())

    g_data = soup.find_all("ul", {"class": "contact-information"})
    for item in g_data:
        print(item.text)
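The question also asks how to write the scraped data out as CSV. A minimal sketch of that step, assuming each ul.contact-information element becomes one single-column row (the producers.csv filename and the header name are assumptions, not anything the site dictates):

```python
import csv
from bs4 import BeautifulSoup

def parse_page(html):
    """Return one single-column row per <ul class="contact-information"> element."""
    soup = BeautifulSoup(html, "html.parser")
    return [[item.get_text(" ", strip=True)]   # join the list items with spaces
            for item in soup.find_all("ul", {"class": "contact-information"})]

def write_csv(rows, path):
    """Write the collected rows to a CSV file with a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["contact_information"])  # header name is an assumption
        writer.writerows(rows)
```

In the loop above you would accumulate rows += parse_page(r.content) for every page, then call write_csv(rows, "producers.csv") once after the loop finishes.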

I would also improve the following:

  • use requests.Session() to maintain a web-scraping session, which will also bring a performance boost:

    if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase

  • be explicit about an underlying parser for BeautifulSoup:

    soup = BeautifulSoup(r.content, "html.parser")  # or "lxml", or "html5lib"
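A short sketch of the Session suggestion — all 154 requests go through one session object, so the underlying TCP connection to the host can be reused (the zero-based page parameter is carried over from the code above):

```python
import requests
from bs4 import BeautifulSoup

BASE = "http://www.swisswine.ch/en/producer?search=&&&page="

with requests.Session() as session:   # connection pool lives for the whole loop
    for i in range(154):              # pages 0 through 153
        r = session.get(BASE + str(i))
        soup = BeautifulSoup(r.content, "html.parser")
        for item in soup.find_all("ul", {"class": "contact-information"}):
            print(item.get_text(" ", strip=True))
```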
    

2 Comments

You missed one small detail - in the while loop, i is incremented as i =+1. It should be i += 1.
@MattDMo ah, I felt something wrong about that but lacking morning coffee. Good catch! Thanks.
