Identifying JS React CSS Selector using Scrapy

Asked 6 years, 3 months ago

Modified 6 years, 3 months ago

Viewed 220 times

I am trying to use Scrapy on a Reddit page, using a CSS selector to identify an element. What I am seeing is different from a traditional CSS selector: the class of the h3 element is a random mix of numbers and letters. I suspect it has something to do with React. Ultimately, I am trying to pull out the text of a post title.

Here is what the Firefox WebDev tools inspector returns as the CSS Path for a title:

html body div#2x-container div._1VP69d9lk-Wk9zokOaylL div div#SHORTCUT_FOCUSABLE_DIV._1gsAk1ihQliBnDybgyjghy div.SubredditVars-r-gameofthrones div._1nxEQl5D2Bx2jxDILRHemb div div.f3kbjo-0.jsptSt div.f3kbjo-1.gHIJbE div._1vyLCp-v-tE5QvZovwrASa div.u1x7p5-0.iYtPfj div.rpBJOHq2PR60pnwJlUyP0 div div div#t3_cmy9rm._1oQyIsiPHYt6nx7VOmd1sz._1RYN-7H8gYctjOQeL8p2Q7.scrollerItem._3Qkp11fjcAw9I9wtLo8frE._1qftyZQ2bhqP62lbPjoGAh.Post.t3_cmy9rm div._1poyrkZ7g36PawDueRza-J._11R7M_VOgKO1RJyRSRErT3 div._2FCtq-QzlfuN-SwVMUZMM3._3wiKjmhpIpoTE2r5KCm2o6.t3_cmy9rm div.y8HYJ-y_lTUHkQIc1mdCq._2INHSNB8V5eaWp4P0rY_mE a.SQnoC3ObvgnGjWt90zD9Z._2INHSNB8V5eaWp4P0rY_mE div._2SdHzo12ISmrC8H86TgSCp._3wqmjmv3tb_k-PROt7qFZe h3._eYtD2XCVieq6emjKBH3m

The CSS selector of the element I am trying to identify is:

h3._eYtD2XCVieq6emjKBH3m

For comparison, if I wanted the body text element on Stack Overflow, the CSS selector I would use is:

div.post-text

Can anyone shed light on what Reddit is doing with its CSS class names and why they are different? And what is the proper way to select the element?

Thanks.

P.S. For reference, this is what I want to do, but I'd like a more in-depth, under-the-hood explanation of what Reddit is doing. Also, if it's not React, please enlighten me as to what it is.

asked Aug 8, 2019 at 7:28

user6510891

Reddit's frontend is written in React. If you have the React Developer Tools extension installed, you will see its blue icon next to the URL bar.

– Manos Kounelakis

2019-08-08 07:30:08 +00:00
Commented Aug 8, 2019 at 7:30
Don't rely on generated CSS class names on React websites; those classNames can change whenever the website is updated. Use old.reddit.com to scrape data: it doesn't use React, so you can just read the HTML to get the data. Right now I think you are using Splash or something similar to render React. New Reddit's React code is most likely using JSS, so random classNames are generated dynamically.

– Vaibhav Vishal

2019-08-08 08:41:24 +00:00
Commented Aug 8, 2019 at 8:41
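
To illustrate the point about generated class names, here is a minimal sketch that ignores the hashed class (`_eYtD2XCVieq6emjKBH3m`) and instead keys on the readable `Post` class that appears in the CSS path from the question. It uses only the Python standard library so it can run anywhere; the sample HTML is a hand-written, simplified fragment modelled on that path, not real Reddit output, and the names `PostTitleParser` and `extract_titles` are hypothetical. Whether `Post` itself stays stable across Reddit updates is an assumption.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PostTitleParser(HTMLParser):
    """Collect the text of every <h3> nested inside an element with class "Post"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._stack = []       # open elements as (tag, is_post_container)
        self._post_depth = 0   # how many open ancestors carry the "Post" class
        self._in_h3 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        is_post = "Post" in classes
        self._stack.append((tag, is_post))
        if is_post:
            self._post_depth += 1
        if tag == "h3" and self._post_depth > 0:
            self._in_h3 = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_h3:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        # Ignore void elements and stray closing tags with no matching open tag.
        if tag in VOID_TAGS or tag not in [t for t, _ in self._stack]:
            return
        # Pop until the matching open tag, tolerating unclosed children.
        while self._stack:
            open_tag, is_post = self._stack.pop()
            if is_post:
                self._post_depth -= 1
            if open_tag == "h3" and self._in_h3:
                title = "".join(self._buffer).strip()
                if title:
                    self.titles.append(title)
                self._in_h3 = False
            if open_tag == tag:
                break


def extract_titles(html):
    parser = PostTitleParser()
    parser.feed(html)
    return parser.titles


# Simplified, hand-written fragment based on the CSS path in the question.
SAMPLE_HTML = """
<div id="t3_cmy9rm" class="_1oQyIsiPHYt6nx7VOmd1sz scrollerItem Post t3_cmy9rm">
  <a class="SQnoC3ObvgnGjWt90zD9Z" href="#">
    <div><h3 class="_eYtD2XCVieq6emjKBH3m">Example post title</h3></div>
  </a>
</div>
<h3 class="_eYtD2XCVieq6emjKBH3m">Sidebar heading</h3>
"""

if __name__ == "__main__":
    print(extract_titles(SAMPLE_HTML))
```

In a Scrapy callback the same idea would be written as `response.css("div.Post h3::text").getall()`, which only helps if the response actually contains the rendered markup rather than an empty React shell.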
On any Reddit link, replace www with old to view the old Reddit. Instead of https://www.reddit.com/r/gameofthrones/, scrape https://old.reddit.com/r/gameofthrones/. By the way, I am pretty sure Reddit provides an API or something similar that you can use from Python to get data in a structured format instead of relying on scraping.

– Vaibhav Vishal

2019-08-08 08:44:55 +00:00
Commented Aug 8, 2019 at 8:44
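
The www-to-old substitution described above can be sketched as a small helper. The function name `to_old_reddit` is hypothetical, and treating a bare `reddit.com` host the same way as `www.reddit.com` is an assumption that goes slightly beyond the comment.

```python
from urllib.parse import urlsplit, urlunsplit


def to_old_reddit(url: str) -> str:
    """Point a Reddit URL at old.reddit.com; leave any other URL untouched."""
    parts = urlsplit(url)
    # Assumption: the bare "reddit.com" host is rewritten as well as "www.reddit.com".
    if parts.netloc.lower() in ("www.reddit.com", "reddit.com"):
        parts = parts._replace(netloc="old.reddit.com")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(to_old_reddit("https://www.reddit.com/r/gameofthrones/"))
```

The result could then be used as an entry in a spider's `start_urls`, so the spider requests the server-rendered page instead of the React one.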
See docs.scrapy.org/en/latest/topics/dynamic-content.html

– Gallaecio

2019-08-08 09:04:58 +00:00
Commented Aug 8, 2019 at 9:04
