I am trying to scrape the following site with Scrapy and am testing selectors in the Scrapy shell.

This is the basic spider:

import scrapy

class ZoosSpider(scrapy.Spider):
    name = 'zoos'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['www.tripadvisor.co.uk']
    start_urls = ['https://www.tripadvisor.co.uk/Attractions-g186216-Activities-c48-a_allAttractions.true-United_Kingdom.html']

    def parse(self, response):
        tmpSEC = response.xpath("//section[@data-automation='AppPresentation_SingleFlexCardSection']")
        for elem in tmpSEC:
            pass

This XPath returns all the relevant sections (len(tmpSEC) gives 30, which looks right to me):

tmpSEC = response.xpath("//section[@data-automation='AppPresentation_SingleFlexCardSection']")

Now I want to extract the first href attribute inside the first section. I tried this XPath, but it only returns "/":

>>> tmpSEC[0].xpath("//a/@href").get()  
'/'

I also tried:

>>> tmpSEC[0].xpath("(//a)[1]/@href").get()  
'/'

Only the CSS selector returns the expected link:

>>> tmpSEC[0].css("a::attr(href)").get() 
'/Attraction_Review-g186332-d216481-Reviews-Blackpool_Zoo-Blackpool_Lancashire_England.html'

Why does this work with a CSS selector but not with an XPath selector?

1 Answer

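It works with XPath once the expression starts with a dot (.). An expression that begins with // is absolute: it searches from the document root even when it is called on tmpSEC[0], so //a/@href returns the first link on the whole page, which is "/". Prefixing it with a dot (.//a) makes it relative to the selected section: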

import scrapy


class ZoosSpider(scrapy.Spider):
    name = 'zoos'
    
    start_urls = ['https://www.tripadvisor.co.uk/Attractions-g186216-Activities-c48-a_allAttractions.true-United_Kingdom.html/']

    def parse(self, response):
        tmpSEC = response.xpath(
            "//section[@data-automation='AppPresentation_SingleFlexCardSection']")
        # Only the first section is used here; loop over tmpSEC to collect every link.
        yield {
            'link': tmpSEC[0].xpath(".//a/@href").get()
        }

Output:

{'link': '/Attraction_Review-g186332-d216481-Reviews-Blackpool_Zoo-Blackpool_Lancashire_England.html'}