
I would like to get the link located in the href attribute of an <a> element. The URL is: https://www.drivy.com/location-voiture/antwerpen/bmw-serie-1-477429?address=Gare+d%27Anvers-Central&city_display_name=&country_scope=BE&distance=200&end_date=2019-05-20&end_time=18%3A30&latitude=51.2162&longitude=4.4209&start_date=2019-05-20&start_time=06%3A00

I'm searching for the href of this element:

<a class="car_owner_section" href="/users/2643273" rel="nofollow"></a>

When I run response.css('a.car_owner_section::attr(href)').get() in the Scrapy shell, I get nothing, even though the element exists when I inspect view(response).

Does anybody have a clue about this issue?

  • Don't trust view(response): as soon as you open the page in a browser, JavaScript can alter the DOM. Check the actual page source (Ctrl+U in Firefox, or write response.body to a file; see the snippet after these comments). See also stackoverflow.com/q/8550114/939364 Commented May 10, 2019 at 9:48
  • Thanks for the advice. I already know that some data is rendered with JS, etc., but here I just can't figure out where the link has gone. It's not in any JSON file I've found. Commented May 10, 2019 at 10:32
  • It may be in the page but in a different place in the HTML. Check the source as suggested. Commented May 10, 2019 at 10:38
  • Already done; I've checked the full HTML code. I may have missed it, but I don't think so. The link is rendered by JS, that's certain. Commented May 10, 2019 at 10:51
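
For reference, a minimal way to follow the first comment's advice and inspect exactly what Scrapy received, before any JavaScript runs (run inside the Scrapy shell, where response is already defined):

# Dump the raw, pre-JavaScript HTML to a file for inspection in an editor
with open('page.html', 'wb') as f:
    f.write(response.body)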

1 Answer


The site loads its content via JavaScript; using Splash works perfectly.

Here is the code:

import scrapy
from scrapy_splash import SplashRequest


class ScrapyOverflow1(scrapy.Spider):
    name = "overflow1"

    def start_requests(self):
        url = 'https://www.drivy.com/location-voiture/antwerpen/bmw-serie-1-477429?address=Gare+d%27Anvers-Central&city_display_name=&country_scope=BE&distance=200&end_date=2019-05-20&end_time=18%3A30&latitude=51.2162&longitude=4.4209&start_date=2019-05-20&start_time=06%3A00'

        # Render the page in Splash, waiting 5 seconds for the
        # JavaScript to finish building the DOM before parsing.
        yield SplashRequest(url=url, callback=self.parse, args={'wait': 5})

    def parse(self, response):
        # The link is present in the rendered HTML, so a plain XPath works here.
        links = response.xpath('//a[@class="car_owner_section"]/@href').extract()
        print(links)

To use Splash, install splash and scrapy-splash, then run sudo docker run -p 8050:8050 scrapinghub/splash before starting the spider. There is a great article on installing and running Splash ("article on scrapy splash"), which also covers adding the Splash middlewares to settings.py. Expected result: the href is printed, as in the code above.
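
For reference, a minimal settings.py sketch based on the scrapy-splash README (the SPLASH_URL assumes the Docker command above is running locally on port 8050):

# settings.py -- wiring Scrapy to the local Splash instance
SPLASH_URL = 'http://localhost:8050'

DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'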


3 Comments

Thanks a lot for the help! I'm using Scrapy with Anaconda. Will this installation be compatible with it?
Awesome, glad I could help. Yes, it is compatible. Let me know if you encounter an issue.
While Splash is an option, it adds a significant performance hit to any crawling session. In the long term it's better to figure out where the content comes from and extract it directly; see the sketch below.
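
To illustrate that last comment, a hedged sketch: if the owner link turned out to come from an XHR endpoint returning JSON, a plain scrapy.Request would avoid Splash entirely. The endpoint path and field name below are purely hypothetical; the real request has to be found in the browser's Network tab while loading the page.

import json

import scrapy


class DirectApiSpider(scrapy.Spider):
    name = 'direct_api'

    def start_requests(self):
        # Hypothetical endpoint, for illustration only; locate the real
        # XHR request in the browser's Network tab.
        yield scrapy.Request('https://www.drivy.com/api/cars/477429.json',
                             callback=self.parse_api)

    def parse_api(self, response):
        data = json.loads(response.text)
        # Field name is an assumption; inspect the real payload first.
        print(data.get('owner_url'))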
