
I would like to get the link located in the href attribute of an <a> element. The URL is: https://www.drivy.com/location-voiture/antwerpen/bmw-serie-1-477429?address=Gare+d%27Anvers-Central&city_display_name=&country_scope=BE&distance=200&end_date=2019-05-20&end_time=18%3A30&latitude=51.2162&longitude=4.4209&start_date=2019-05-20&start_time=06%3A00

I'm searching for the href of this element:

<a class="car_owner_section" href="/users/2643273" rel="nofollow"></a>

When I run response.css('a.car_owner_section::attr(href)').get() in the Scrapy shell, I get nothing, even though the element exists when I inspect view(response).

Does anybody have a clue about this issue?

  • Don't trust view(response): as soon as you open the page in a browser, JavaScript can alter the DOM. Check the actual page source (Ctrl+U in Firefox, or write response.body to a file; see the snippet after these comments). See also stackoverflow.com/q/8550114/939364 Commented May 10, 2019 at 9:48
  • Thanks for the advice. I already know that some data is rendered with JS, etc., but here I just can't figure out where the link has gone. It's not in any JSON file I've found. Commented May 10, 2019 at 10:32
  • It may be in the page but in a different place in the HTML. Check the source as suggested. Commented May 10, 2019 at 10:38
  • Already done; I've checked the full HTML code. I may have missed it, but I don't think so. The link is rendered by JS, that's certain. Commented May 10, 2019 at 10:51
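
For reference, a minimal way to follow the first comment's advice and inspect exactly what Scrapy received, before any JavaScript runs (run inside the Scrapy shell, where response is already defined):

# Dump the raw, pre-JavaScript HTML to a file for inspection in an editor
with open('page.html', 'wb') as f:
    f.write(response.body)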

1 Answer


The site loads its content via JavaScript; using Splash works perfectly.

Here is the code:

import scrapy
from scrapy_splash import SplashRequest


class ScrapyOverflow1(scrapy.Spider):
    name = "overflow1"

    def start_requests(self):
        url = 'https://www.drivy.com/location-voiture/antwerpen/bmw-serie-1-477429?address=Gare+d%27Anvers-Central&city_display_name=&country_scope=BE&distance=200&end_date=2019-05-20&end_time=18%3A30&latitude=51.2162&longitude=4.4209&start_date=2019-05-20&start_time=06%3A00'

        # Render the page in Splash, waiting 5 seconds for the
        # JavaScript to finish building the DOM before parsing.
        yield SplashRequest(url=url, callback=self.parse, args={'wait': 5})

    def parse(self, response):
        # The link is present in the rendered HTML, so a plain XPath works here.
        links = response.xpath('//a[@class="car_owner_section"]/@href').extract()
        print(links)

To use Splash, install splash and scrapy-splash, then run sudo docker run -p 8050:8050 scrapinghub/splash before starting the spider. There is a great article on installing and running Splash ("article on scrapy splash"), which also covers adding the Splash middlewares to settings.py. Expected result: the href is printed, as in the code above.
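
For reference, a minimal settings.py sketch based on the scrapy-splash README (the SPLASH_URL assumes the Docker command above is running locally on port 8050):

# settings.py -- wiring Scrapy to the local Splash instance
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'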


3 Comments

Thanks a lot for the help! I'm using Scrapy with Anaconda. Will this installation be compatible with it?
Awesome, glad I could help. Yes, it is compatible. Let me know if you encounter an issue.
While Splash is an option, it adds a significant performance hit to any crawling session. In the long term it's better to figure out where the content comes from and extract it directly; see the sketch below.
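
To illustrate that last comment, a hedged sketch: if the owner link turned out to come from an XHR endpoint returning JSON, a plain scrapy.Request would avoid Splash entirely. The endpoint path and field name below are purely hypothetical; the real request has to be found in the browser's Network tab while loading the page.

import json

import scrapy


class DirectApiSpider(scrapy.Spider):
    name = 'direct_api'

    def start_requests(self):
        # Hypothetical endpoint, for illustration only; locate the real
        # XHR request in the browser's Network tab.
        yield scrapy.Request('https://www.drivy.com/api/cars/477429.json',
                             callback=self.parse_api)

    def parse_api(self, response):
        data = json.loads(response.text)
        # Field name is an assumption; inspect the real payload first.
        print(data.get('owner_url'))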
