I'm trying to scrape a random site and get all the text with a certain class off of a page.

from bs4 import BeautifulSoup
import requests
sources = ['https://cnn.com']

for source in sources:
    page = requests.get(source)

    soup = BeautifulSoup(page.content, 'html.parser')
    results = soup.find_all("div", class_='cd_content')
    for result in results:
        title = result.find('span', class_="cd__headline-text vid-left-enabled")
        print(title)

From what I found online, this should work but for some reason, it can't find anything and results is empty. Any help is greatly appreciated.

  • There are no occurrences of either of those classes in the page as downloaded. The content is apparently being built with JavaScript. You would have to use Selenium to get a JavaScript interpreter involved. Commented May 10, 2021 at 21:46
  • Ah, I see, how would I do that? Commented May 10, 2021 at 21:48
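
The check described in the first comment can be reproduced by listing the class names that actually occur in the downloaded HTML. Below is a minimal sketch, not the commenter's own method: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and ClassCollector, classes_in_html, and the sample static_html string are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class ClassCollector(HTMLParser):
    """Collects every CSS class name that appears in a document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def classes_in_html(html):
    """Return the set of class names used anywhere in the given HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical stand-in for the text that requests.get(source).text would return.
static_html = '<div class="pg-wrapper"><span class="logo big">CNN</span></div>'
found = classes_in_html(static_html)
print("cd__headline-text" in found)  # False: the class is absent from this HTML
print("logo" in found)  # True
```

If a class you can see in the browser's element inspector is missing from this set for the real page text, the markup is probably being added after the initial download.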

1 Answer

If you inspect the network requests in your browser's developer tools, you can see that the page content is loaded dynamically through a GET request to:

https://www.cnn.com/data/ocs/section/index.html:homepage1-zone-1/views/zones/common/zone-manager.izl

The response is JSON, and the HTML is available under its html key:

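import requests
from bs4 import BeautifulSoup

# Endpoint found in the browser's network tab; it returns JSON, not a full HTML page.
URL = "https://www.cnn.com/data/ocs/section/index.html:homepage1-zone-1/views/zones/common/zone-manager.izl"

response = requests.get(URL, timeout=10)
response.raise_for_status()  # stop early on an HTTP error status

# The rendered markup is stored as a string under the "html" key.
html = response.json()["html"]
soup = BeautifulSoup(html, "html.parser")

# A multi-class string matches only when the class attribute is exactly this value.
for tag in soup.find_all(class_="cd__headline-text vid-left-enabled"):
    print(tag.text)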

Output (truncated):

This is the first Covid-19 vaccine in the US authorized for use in younger teens and adolescents
When the US could see Covid cases and deaths plummet 
'Truly, madly, deeply false': Keilar fact-checks Ron Johnson's vaccine claim
These are the states with the highest and lowest vaccination rates

6 Comments

How would I find this link for other sites?
Every site is different, usually VASTLY different. There are no rules that apply everywhere. You have to go look at the text of the page by hand to find this out.
I assume it would be in the dev tools. I understand that I would have to search by hand, but what exactly am I searching for?
@Kookies In the dev tools, you can search for text. Search for a piece of text that appears on the website and see whether it's loaded dynamically.
@Kookies Check out this answer for a more in-depth look at the dev tools.
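
Once a candidate request turns up in the dev tools, the same "search for the text" idea from the comments above can be applied to its JSON response in code. The following is only a sketch under assumptions: find_text_paths is a hypothetical helper, and the sample body is invented to resemble the shape of the response used in the answer, not copied from the real endpoint.

```python
import json


def find_text_paths(node, needle, path=()):
    """Yield the key path of every string in a decoded JSON value that contains needle."""
    if isinstance(node, str):
        if needle in node:
            yield path
    elif isinstance(node, dict):
        for key, value in node.items():
            yield from find_text_paths(value, needle, path + (key,))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_text_paths(value, needle, path + (index,))


# Hypothetical response body, shaped loosely like the one in the answer.
body = '{"html": "<span class=\\"cd__headline-text\\">Example headline</span>", "meta": {"zone": 1}}'
payload = json.loads(body)
print(list(find_text_paths(payload, "Example headline")))  # [('html',)]
```

The path that comes back tells you which key to index into before handing the string to BeautifulSoup, which is how the "html" key in the answer would be located.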