I am trying to extract the phone titles (and eventually other data) from multiple web pages using Scrapy, with the work split across separate callback methods. The "parse" method is supposed to pull all of the page links, which it does correctly if I let it yield its results to a CSV. However, when I add a second callback, "parse_pages", it never seems to run, and I cannot get a CSV output of just the titles for each page.
Here is the spider:

import scrapy
from scrapy.http import Request

url = 'https://www.gsmarena.com/'


class PhonelinksSpider(scrapy.Spider):
    name = 'phonelinks'
    allowed_domains = ['www.gsmarena.com/results.php3?']
    start_urls = ['https://www.gsmarena.com/results.php3?']

    def parse(self, response):
        links = response.xpath('//div[@class="makers"]/ul/li/a/@href').extract()
        for link in links:
            location = url + link
            yield response.follow(url=location, callback=self.parse_pages)

    def parse_pages(self, response):
        phones = response.xpath('//h1[contains(@class,"specs-phone-name-title")]/text()').extract_first().strip()
        for title in phones:
            phone_list = {'phone': title}
            yield phone_list
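
Two things in the spider above look like plausible causes, and the sketch below illustrates both using only the standard library. First, allowed_domains is meant to hold bare domain names, and a value that includes a path and query string cannot match any hostname, so the follow-up requests are likely dropped as offsite before parse_pages is called (the start URL is still fetched because start requests bypass that filter). Second, extract_first().strip() returns a single string, so looping over it yields one character at a time. The is_allowed helper is hypothetical and only approximates my understanding of Scrapy's offsite check; the page URL and the phone title are made-up values for illustration.

```python
import re
from urllib.parse import urlparse


def is_allowed(url, allowed_domains):
    """Approximate an offsite check (hypothetical helper, not Scrapy's code):
    the URL's hostname must equal an allowed domain or be a subdomain of one."""
    host = urlparse(url).hostname or ""
    pattern = r"^(.*\.)?(%s)$" % "|".join(re.escape(d) for d in allowed_domains)
    return re.match(pattern, host) is not None


# Hypothetical link of the kind parse() would follow.
page = "https://www.gsmarena.com/samsung-phones-9.php"

# The value from the question contains a path, so no hostname can match it.
print(is_allowed(page, ["www.gsmarena.com/results.php3?"]))

# A bare domain matches the host and its subdomains.
print(is_allowed(page, ["gsmarena.com"]))

# extract_first().strip() gives one string; a for loop over it walks characters.
title = "  Samsung Galaxy S9 ".strip()  # made-up title for illustration
per_char = [{"phone": ch} for ch in title]  # what the loop in parse_pages yields
single = {"phone": title}  # what was probably intended
print(len(per_char), single)
```

If this diagnosis holds, setting allowed_domains = ['gsmarena.com'] and yielding {'phone': phones} directly (without the for loop) should be enough to get one title per page in the CSV. Running the spider with the default log level should also show whether requests are being filtered as offsite.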