I am trying to extract phone titles (and eventually other data) from multiple web pages using Scrapy, split across separate callback functions. The "parse" function is supposed to collect all of the page links, and it does so correctly if I let it yield its results to a CSV. However, when I add a second callback, "parse_pages", it seems that this code is never run, and I cannot get a CSV output containing just the title from each page.

Note: I recognize that the indentation of the functions below is wrong.

import scrapy
from scrapy.http import Request

url = 'https://www.gsmarena.com/'

class PhonelinksSpider(scrapy.Spider):
    name = 'phonelinks'
    allowed_domains = ['www.gsmarena.com/results.php3?']
    start_urls = ['https://www.gsmarena.com/results.php3?']

    def parse(self, response):
        links = response.xpath('//div[@class="makers"]/ul/li/a/@href').extract()
        for link in links:
            location = url+link
            yield response.follow(url = location,callback = self.parse_pages)



    def parse_pages(self, response):
       phones = response.xpath('//h1[contains(@class,"specs-phone-name-title")]/text()').extract_first().strip()
       for title in phones:
           phone_list = {'phone':title}
           yield phone_list
1 Answer

The problem is in this line:

phones = response.xpath('//h1[contains(@class,"specs-phone-name-title")]/text()').extract_first().strip()

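extract_first() returns a single string (or None if nothing matches), not a list. That is why the for loop on the next line does not do what you expect: iterating over a string yields its individual characters, not phone titles. Also note that calling .strip() on None raises an AttributeError, so a page without a matching h1 will make the callback fail. Each page has one title, so yield it directly.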
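To make the behaviour concrete, here is a minimal sketch in plain Python with no Scrapy dependency. The sample string and the helper name `titles_from_extract_first` are made up for illustration; the helper stands in for what a callback could do with the value returned by extract_first().

```python
from typing import Iterator, Optional


def titles_from_extract_first(raw: Optional[str]) -> Iterator[dict]:
    """Yield at most one item from an extract_first()-style value.

    `raw` is either a string or None, mirroring what extract_first() returns.
    """
    if raw is None:
        # Nothing matched the XPath: skip instead of calling .strip() on None.
        return
    title = raw.strip()
    if title:
        yield {"phone": title}


# Hypothetical value, standing in for the text of the <h1> element.
raw_title = "  Samsung Galaxy S10  "

# Looping over the string itself walks through its characters.
chars = [ch for ch in raw_title.strip()][:3]

# Treating the value as one title gives a single item.
items = list(titles_from_extract_first(raw_title))
```

The same guard works inside a spider callback: check for None before stripping, then yield one dict per page. Without the guard, the callback from the question reduces to: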

def parse_pages(self, response):
   title = response.xpath('//h1[contains(@class,"specs-phone-name-title")]/text()').extract_first().strip()
   yield {'phone':title}
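If parse_pages still never runs after this change, one more thing may be worth checking. This is an assumption based on how Scrapy's offsite filtering normally works, not something confirmed by the question's logs: allowed_domains is expected to hold bare domain names such as 'www.gsmarena.com'. An entry that includes a path and query string, like the one in the question, will not match the host of any followed link, so those requests can be dropped before the callback is called. The sketch below uses only the standard library to derive a bare domain from the start URL and to build absolute links. The helper names and the sample href are hypothetical.

```python
from urllib.parse import urljoin, urlparse

START_URL = "https://www.gsmarena.com/results.php3?"


def domain_only(url: str) -> str:
    """Return just the host part of a URL, without scheme, path, or query."""
    return urlparse(url).netloc


def absolute(base: str, href: str) -> str:
    """Resolve a relative href against the page it was found on."""
    return urljoin(base, href)


# Value suitable for a spider's allowed_domains attribute.
allowed_domains = [domain_only(START_URL)]

# Hypothetical relative link, as might be extracted from the listing page.
page_url = absolute(START_URL, "samsung-phones-9.php")
```

Inside a spider, response.follow resolves relative links on its own, so the manual `url+link` concatenation in parse could likely be replaced by passing `link` straight to response.follow.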