Scrapy
Let's explore Scrapy, a great Python framework for crawling websites.
Overview
Scrapy is a framework that we can use for crawling websites and extracting (i.e., scraping) data. It can also extract data through a website’s API or serve as a general-purpose web crawler. To install Scrapy, all we need is pip:
pip install scrapy
According to Scrapy’s documentation, we will also need lxml and OpenSSL installed.
To install lxml, we will use the command:
pip install lxml
To install OpenSSL (through its Python bindings, the pyOpenSSL package), we will use the command:
pip install pyOpenSSL
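Before moving on, it can help to confirm from Python that the installs worked. The sketch below is one way to do that, assuming the packages were installed into the same interpreter we run it with: it uses importlib.util.find_spec to check whether scrapy, lxml and the OpenSSL bindings (the pyOpenSSL package, which is imported as OpenSSL) can be found, and prints ssl.OPENSSL_VERSION, the OpenSSL build the interpreter itself is linked against. The helper name check_dependencies is hypothetical, not part of Scrapy.

```python
import importlib.util
import ssl


def check_dependencies(modules=("scrapy", "lxml", "OpenSSL")):
    """Hypothetical helper: map each top-level module name to True if it can be found."""
    # find_spec returns None when a top-level module cannot be located,
    # so this checks availability without importing the package.
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'installed' if found else 'missing'}")
    # The standard ssl module reports the OpenSSL version Python was built against.
    print("Python's OpenSSL:", ssl.OPENSSL_VERSION)
```

If any entry prints as missing, rerunning the matching pip command for that interpreter is the first thing to try.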
We are going to use Scrapy to do the same thing we used BeautifulSoup for: scraping the title and link of each article on my blog’s front page. To get started, open a terminal and change to the directory where we want to store our project. Then run the following command:
scrapy startproject blog_scraper
This will create a directory named blog_scraper in the current directory, which will contain the following items:
- Another nested blog_scraper folder