Cannot find class using CSS selector in Scrapy

Question

I am testing whether I can scrape a website using Scrapy. I get a response from the site, but I can't access the elements or data I want. I believe my selector is right, and I don't think there is an error in the commands, although I am a beginner in Scrapy. I want to get the tags with class results-race-name. I ran it through the Scrapy shell, using the following commands:

In [1]: fetch('https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/')

2022-01-07 15:08:58 [scrapy.core.engine] INFO: Spider opened
2022-01-07 15:09:01 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://greyhoundbet.racingpost.com/robots.txt> (referer: None)
2022-01-07 15:09:03 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/> (referer: None)

In [2]: view(response)
Out[2]: True

In [3]: response.css('.results-race-name').extract()
Out[3]: []

Note that view(response) only shows the page up to the loading logo.

SuperUser · Accepted Answer · 2022-01-07 21:54:03Z

It's not a CSS problem. The data is created dynamically, so it is not in the HTML that Scrapy downloads. You can get it from the JSON request instead: open devtools in the browser, click on the Network tab, find the JSON request, and take what you need from it.

In [1]: req = scrapy.Request('https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings')

In [2]: fetch(req)
[scrapy.core.engine] INFO: Spider opened
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings> (referer: None)

In [3]: json_data = response.json()

In [4]: for data in json_data['meetings']['tracks']['1']['races']:
   ...:     print(data['track'])
   ...:
Newcastle
Swindon
Kinsley

In [5]: for data in json_data['meetings']['tracks']['2']['races']:
   ...:     print(data['track'])
   ...:
Monmore
Crayford
Hove
Harlow
Henlow
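To make the difference between the two URLs concrete, here is a minimal sketch using only the standard library. It shows that everything after `#` in the original URL is a fragment (which the browser keeps to itself and never sends to the server), and it rebuilds the JSON URL fetched above from a date. `results_url` is a hypothetical helper name, and whether the endpoint accepts other dates in the same way is an assumption I have not verified.

```python
from urllib.parse import urlencode, urlsplit

# The page URL from the question: everything after '#' is a fragment.
page_url = "https://greyhoundbet.racingpost.com/#results-list/r_date=2021-01-01/"
parts = urlsplit(page_url)
print(parts.path)      # '/' is the only path requested from the server
print(parts.fragment)  # read by the page's JavaScript, not sent in the request

# Endpoint taken from the request fetched above.
BASE = "https://greyhoundbet.racingpost.com/results/blocks.sd"


def results_url(r_date: str) -> str:
    """Hypothetical helper: build the JSON URL for a given date string."""
    # urlencode percent-encodes the comma, giving 'header%2Cmeetings'.
    query = urlencode({"r_date": r_date, "blocks": "header,meetings"})
    return f"{BASE}?{query}"


print(results_url("2021-01-01"))
```

For `2021-01-01` this produces the same URL as the one passed to `scrapy.Request` above.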

EDIT:

spider.py

import scrapy


class ExampleSpider(scrapy.Spider):
    name = "exampleSpider"
    start_urls = ['https://greyhoundbet.racingpost.com/results/blocks.sd?r_date=2021-01-01&blocks=header%2Cmeetings']

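    def parse(self, response):
        # The endpoint returns JSON, so parse it directly instead of using selectors
        json_data = response.json()

        # Races listed under tracks group '1'
        for data in json_data['meetings']['tracks']['1']['races']:
            yield {'race': data['track']}

        # Races listed under tracks group '2'
        for data in json_data['meetings']['tracks']['2']['races']:
            yield {'race': data['track']}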
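The two loops in the spider only cover the keys '1' and '2'. If you would rather not hardcode them, something like the sketch below could walk every key under 'tracks'. It assumes 'tracks' is a dict whose values each hold a 'races' list, which matches the lookups used above but is not something I have checked for every date. `iter_track_names` is a hypothetical helper, and `sample` is a made-up, trimmed stand-in for the real response.

```python
def iter_track_names(json_data):
    """Yield the 'track' value of every race under every key of 'tracks'."""
    # .get() with defaults keeps this from raising if a level is missing.
    tracks = json_data.get("meetings", {}).get("tracks", {})
    for group in tracks.values():
        for race in group.get("races", []):
            yield race["track"]


# Made-up sample shaped like the lookups in the spider (not real response data).
sample = {
    "meetings": {
        "tracks": {
            "1": {"races": [{"track": "Newcastle"}, {"track": "Swindon"}]},
            "2": {"races": [{"track": "Monmore"}]},
        }
    }
}

print(list(iter_track_names(sample)))
```

Inside `parse` this could be used as `for name in iter_track_names(response.json()): yield {'race': name}`.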

Example for spider

main.py:

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    spider = 'exampleSpider'
    settings = get_project_settings()
    settings['USER_AGENT'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'
    process = CrawlerProcess(settings)
    process.crawl(spider)
    process.start()

How to run scrapy from a script

edited Jan 7, 2022 at 21:54

answered Jan 7, 2022 at 15:20

SuperUser

6 Comments

farhan jatt Over a year ago

Thanks, I am new to Scrapy. It helped a lot.

farhan jatt Over a year ago

Can I use this: response.css('.results-race-name').extract()?

farhan jatt Over a year ago

Please explain why and how you modified the URL.

SuperUser Over a year ago

In this case you can't use the CSS selector, simply because that content is generated with JavaScript and Scrapy doesn't execute JavaScript. What I did was look in the Network tab of the browser's devtools and watch for the JSON request the page uses for its data. That is the URL I fetched. Next time, if you want to be sure, turn off JavaScript in your browser and see whether the site still loads the information you need.

farhan jatt Over a year ago

Thanks. Can you edit the answer to show how to do it in a spider.py file? I am facing some errors.
