I am trying to use Scrapy on a Reddit page, using a CSS selector to identify an element. What I am seeing is different from a traditional CSS selector: the class of the h3 element is a seemingly random mix of numbers and letters. I suspect it has something to do with JavaScript and React. What I am ultimately trying to do is pull out the text of a title.
Here is what the Firefox WebDev tools inspector returns as the CSS Path for a title:
html body div#2x-container div._1VP69d9lk-Wk9zokOaylL div div#SHORTCUT_FOCUSABLE_DIV._1gsAk1ihQliBnDybgyjghy div.SubredditVars-r-gameofthrones div._1nxEQl5D2Bx2jxDILRHemb div div.f3kbjo-0.jsptSt div.f3kbjo-1.gHIJbE div._1vyLCp-v-tE5QvZovwrASa div.u1x7p5-0.iYtPfj div.rpBJOHq2PR60pnwJlUyP0 div div div#t3_cmy9rm._1oQyIsiPHYt6nx7VOmd1sz._1RYN-7H8gYctjOQeL8p2Q7.scrollerItem._3Qkp11fjcAw9I9wtLo8frE._1qftyZQ2bhqP62lbPjoGAh.Post.t3_cmy9rm div._1poyrkZ7g36PawDueRza-J._11R7M_VOgKO1RJyRSRErT3 div._2FCtq-QzlfuN-SwVMUZMM3._3wiKjmhpIpoTE2r5KCm2o6.t3_cmy9rm div.y8HYJ-y_lTUHkQIc1mdCq._2INHSNB8V5eaWp4P0rY_mE a.SQnoC3ObvgnGjWt90zD9Z._2INHSNB8V5eaWp4P0rY_mE div._2SdHzo12ISmrC8H86TgSCp._3wqmjmv3tb_k-PROt7qFZe h3._eYtD2XCVieq6emjKBH3m
The CSS selector of the element I am trying to identify is:
h3._eYtD2XCVieq6emjKBH3m
For reference, if I wanted the body text element on Stack Overflow, the CSS selector I would use is:
div.post-text
Can anyone shed any light on what Reddit is doing with its CSS class names and why they are different? And what is the proper form for returning the element?
Thanks.
P.S. For reference, this is what I want to do, but I'd like a more in-depth, under-the-hood explanation of what Reddit is doing. Also, if it's not React, please enlighten me as to what it is.
Replace www with old to view the old Reddit. Instead of https://www.reddit.com/r/gameofthrones/ scrape https://old.reddit.com/r/gameofthrones/. By the way, I am pretty sure Reddit provides an API or something like that which you can use in Python to get data in a structured format instead of relying on scraping.
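To make the "structured format" idea concrete, here is a minimal sketch of pulling titles out of a Reddit listing once you have it as JSON. It assumes the listing shape Reddit returns for a subreddit page (a `Listing` object whose `data.children` entries are posts of kind `t3`, each with a `title` field), which you can reportedly get by appending `.json` to a listing URL. The sample payload and the `extract_titles` helper are hypothetical, and the titles are made up for illustration; nothing here touches the network.

```python
import json

# Hypothetical, trimmed-down sample shaped like a Reddit listing response.
# Real responses carry many more fields per post; the titles are invented.
SAMPLE_LISTING = """
{
  "kind": "Listing",
  "data": {
    "children": [
      {"kind": "t3", "data": {"id": "cmy9rm", "title": "First example title"}},
      {"kind": "t3", "data": {"id": "abc123", "title": "Second example title"}}
    ]
  }
}
"""


def extract_titles(listing_json):
    """Return the title of every post (kind "t3") in a listing JSON string."""
    listing = json.loads(listing_json)
    children = listing.get("data", {}).get("children", [])
    titles = []
    for child in children:
        post = child.get("data", {})
        # Skip anything that is not a post or has no title field.
        if child.get("kind") == "t3" and "title" in post:
            titles.append(post["title"])
    return titles


if __name__ == "__main__":
    for title in extract_titles(SAMPLE_LISTING):
        print(title)
```

Inside a Scrapy callback the same helper could probably be fed `response.text`, which avoids depending on the generated class names altogether.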