I'm trying to scrape data from RateMyProfessor. Because it's a React app and the teacher information is rendered dynamically, requests.get() doesn't return the data I'm trying to parse. However, I found that the data is in a script tag that is present in the response from requests.get(). How can I retrieve the information from a tag like this?

<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>

There's more in the relay store, but this is the part I'm trying to parse. Note that the page contains multiple script tags.

I'm currently using Selenium to render the whole page, but it takes a really long time, so is there a way to access this window relay store so that I won't need to render the site each time?

For anyone curious, this is what I wrote to get the window relay store:

import requests

page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
print(page.content)

1 Answer

If you inspect the page, you will notice that the script tag is inside the body. Extract the script from the body, as shown in the code below.

import re

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.ratemyprofessors.com/search/teachers?query=Michael&sid=U2Nob29sLTM5OQ==")
# "html.parser" ships with Python, so no extra parser needs to be installed
soup = BeautifulSoup(page.text, "html.parser")
# Extract the part you want here: the first <script> tag inside <body>
script = soup.find("body").find("script")
# Use a regex to pre-process the string: keep only the bracketed [...] portions
for items in re.findall(r"(\[.*\])", script.string):
    print(items)

Output: (screenshot of the printed matches)
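Since the question mentions multiple script tags and shows the store as a JSON object, here is a minimal sketch that searches the raw HTML for the `window.__RELAY_STORE__ = ` assignment and decodes the value with the standard `json` module. It assumes the assigned value is valid JSON, as in the snippet from the question. The function name `extract_relay_store` and the `sample_html` string are made up for illustration.

```python
import json
import re

# The marker comes from the question; the rest of this sketch is illustrative.
MARKER = re.compile(r"window\.__RELAY_STORE__\s*=\s*")


def extract_relay_store(html):
    """Return the value assigned to window.__RELAY_STORE__, or None if absent."""
    match = MARKER.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and ignores
    # whatever follows it (a semicolon, more JavaScript, the closing tag).
    store, _end = json.JSONDecoder().raw_decode(html, match.end())
    return store


# Sample markup modeled on the snippet in the question, with an extra script tag.
sample_html = (
    "<body><script>var other = 1;</script>"
    '<script> window.__RELAY_STORE__ = {"legacyId":774048,"avgRating":2.6,"numRatings":12} </script>'
    "</body>"
)

store = extract_relay_store(sample_html)
print(store["avgRating"])  # prints 2.6
```

With a live page you would pass `page.text` from `requests.get(...)` instead of `sample_html`. If the real store turns out not to be strict JSON, `raw_decode` will raise `json.JSONDecodeError` and a different parsing approach would be needed.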
