
I don't understand why the code below doesn't work. I know there are related answers, but they didn't help me.

$ scrapy shell "http://edition.cnn.com"

There is an h2 tag with the text "CNN Money" inside it. Why doesn't the following work?

>>> response.xpath('//h2[contains(string(), "CNN Money")]')
[]

I also tried text():

>>> response.xpath('//h2[contains(text(), "CNN Money")]')
[] 

1 Answer


It's not about the XPath expression you use. The problem is that the page content is supplied dynamically, e.g. by JavaScript. Check for yourself: search for CNN Money in the page source code, and you won't find any hits. Scrapy only downloads the raw HTML and doesn't execute JavaScript, so you need to render the page and parse the output. I suggest you use Splash together with the scrapy-splash library for that purpose.
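To run that check programmatically and see what rendering changes, here is a minimal stdlib-only sketch. It assumes Splash is already running and reachable at http://localhost:8050, and it uses Splash's HTTP render.html endpoint with its url, timeout and wait arguments. The helper names splash_render_url, fetch and compare are hypothetical. The sketch downloads the page twice: once as-is (roughly what Scrapy sees, since neither executes JavaScript) and once through Splash. It then reports whether the text appears in each version.

```python
import urllib.request
from urllib.parse import urlencode

# Assumption: Splash is reachable at this address (e.g. a Docker container on port 8050).
SPLASH_URL = "http://localhost:8050"


def splash_render_url(target: str, timeout: float = 60, wait: float = 2.0) -> str:
    """Build a Splash render.html URL that returns the page's HTML after JavaScript ran.

    timeout must stay within Splash's --max-timeout (90 s unless configured otherwise);
    wait is how long Splash waits after page load before returning the HTML.
    """
    query = urlencode({"url": target, "timeout": timeout, "wait": wait})
    return f"{SPLASH_URL}/render.html?{query}"


def fetch(url: str, timeout: float = 120) -> str:
    """Download a URL and decode it as text; no JavaScript is executed here."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def compare(target: str, needle: str) -> dict:
    """Report whether needle occurs in the raw source vs. the Splash-rendered HTML."""
    return {
        "raw": needle in fetch(target),
        "rendered": needle in fetch(splash_render_url(target)),
    }
```

With Splash running, call compare("http://edition.cnn.com", "CNN Money"). If "raw" is False but "rendered" is True, the text only exists after JavaScript runs, and no XPath applied to Scrapy's plain response can match it. The URL returned by splash_render_url can also be passed to scrapy shell, so you can try XPath expressions against the rendered HTML directly.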

EDIT:

Run Splash using this command:

docker run -d -p 8050:8050 --restart=always scrapinghub/splash --max-timeout 3600

The --max-timeout option raises the maximum timeout Splash accepts for a request. (See the Splash documentation for other options for running Splash in production.) You also need to raise the timeout value in the args parameter of SplashRequest, e.g.:

yield scrapy_splash.SplashRequest(url, self.parse, endpoint='render.json', args={'html': 1, 'timeout': 3600})
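SplashRequest is only routed through Splash when scrapy-splash's middlewares are enabled in the project settings. Below is a minimal settings.py sketch, following the setup described in the scrapy-splash README. Check the middleware priorities against the version you install. SPLASH_URL is an assumption: point it at wherever your Splash container is actually reachable. As the comments on this answer note, that may be the IP address Docker reports rather than localhost.

```python
# settings.py (fragment): enable scrapy-splash in a Scrapy project.

# Assumption: address of the Splash instance started with docker run ... -p 8050:8050.
SPLASH_URL = "http://localhost:8050"

# Downloader middlewares that send SplashRequests to Splash and handle cookies.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Avoids storing duplicate Splash arguments (e.g. Lua scripts) on every request.
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Duplicate filter and HTTP cache storage that take Splash arguments into account.
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
HTTPCACHE_STORAGE = "scrapy_splash.SplashAwareFSCacheStorage"
```

The fragment only assigns settings, so it makes no network calls. Scrapy reads these values when the crawler starts.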

3 Comments

I managed to get Splash working, thanks! I had to use the IP address shown when I started Docker instead of the IP address in the tutorial. I need to learn more about Splash now, because the rendered HTML Splash gives me looks like a total mess and I don't understand how to find the items I want to scrape.
@Andras It might look like a mess because Splash doesn't fetch everything by default, but you should be able to select all the elements the same way you see them when inspecting the page with browser developer tools (Firefox, Chrome).
Thanks! Now I understand! This video was also helpful: Scraping JavaScript pages with Scrapy and Splash
