no printing after python script, with no errors

Question

There are no errors; the script just doesn't print any results. It is supposed to print the headlines from the URL. The script runs but returns nothing when parsing for balancedheadlines. I can swap the tag I look for to p and get data back, but I believe I am not passing the tags correctly to retrieve just the headlines.

import requests
from bs4 import BeautifulSoup

url = 'http://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')


for ap in soup.find_all('h2', attrs = {"class" : "balancedheader"}):

    if ap.a:
        print(ap.a.text.replace(".n/", " "))
    else:
        print(ap.strip)

There is no h2 tag with the balancedheader class present in the DOM.
– heemayl, Commented Feb 24, 2019 at 18:24
soup.find_all('h2', attrs = {"class" : "balancedheader"}) == []. Because it's an empty list, your loop, which is supposed to run once for every item in the list, does nothing.
– William Bright, Commented Feb 24, 2019 at 18:25

balderman · Accepted Answer · 2019-02-24 18:40:52Z

1

NY Times web site has no 'h2' element with class named 'balancedheader'.

The XPath '//h2[@class='balancedheader']' returns an empty set.
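To see why the original loop prints nothing, here is a minimal sketch that runs the same kind of query against a small inline HTML string instead of the live site. The sample markup and the class name story-heading are assumptions made up for illustration; they are not taken from the NY Times page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<h2 class="story-heading"><a href="/a">First headline</a></h2>
<h2 class="story-heading">Second headline</h2>
"""


def headlines(html, class_name):
    """Return the text of every h2 carrying class_name (possibly an empty list)."""
    soup = BeautifulSoup(html, "html.parser")
    return [h2.get_text(strip=True) for h2 in soup.find_all("h2", class_=class_name)]


missing = headlines(SAMPLE_HTML, "balancedheader")  # class not in the markup
found = headlines(SAMPLE_HTML, "story-heading")     # class that is in the markup

if not missing:
    # An empty result raises no error; a for loop over it simply never runs.
    print("No h2 with class 'balancedheader' was found.")
for text in found:
    print(text)
```

Checking for an empty result before looping turns the silent no-op into a visible message, which makes a wrong selector much easier to spot.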

Omer Tekbiyik · Accepted Answer · 2019-02-24 18:50:51Z

You said the script runs but returns nothing while parsing for balancedheadlines, but your code searches for balancedheader. There is no balancedheader class on that site. You can list all the h2 tags like this:

# 'soup' is the BeautifulSoup object built in the question's script
h2_tags = soup.find_all('h2')
for h2 in h2_tags:
    print(h2)

The code above prints every h2 tag on the page.
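If the page has many h2 tags, it may be quicker to tally the class names they carry than to read each tag. The following is a rough sketch under assumed sample markup; the class names in it are hypothetical and only there to make the example runnable.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical markup; a real page would come from requests.get(url).text.
SAMPLE_HTML = """
<h2 class="story-heading balanced">One</h2>
<h2 class="story-heading">Two</h2>
<h2>Three</h2>
"""


def h2_class_counts(html):
    """Count how often each class name appears on h2 tags."""
    soup = BeautifulSoup(html, "html.parser")
    counts = Counter()
    for h2 in soup.find_all("h2"):
        # 'class' is multi-valued, so bs4 returns a list; default to [] if absent.
        counts.update(h2.get("class", []))
    return counts


counts = h2_class_counts(SAMPLE_HTML)
print(counts.most_common())
```

A class name that is missing from the tally cannot match in find_all, so this is a cheap way to confirm the spelling and capitalization before writing the real selector.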

I think you are trying to get the title in balancedHeadline. It is rendered by JavaScript, so you need to use Selenium:

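from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style API; replace the placeholder with your chromedriver path
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=Service(r'your driver path'), options=options)
driver.get('https://www.nytimes.com/2019/02/24/world/europe/pope-vatican-sexual-abuse.html')

# Collect every span whose class attribute is exactly 'balancedHeadline'
x = driver.find_elements(By.CSS_SELECTOR, "span[class='balancedHeadline']")

for title in x:
    print(title.text)
driver.close()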

OUTPUT:

Pope Francis Ends Landmark Meeting by Calling for ‘All-Out Battle’ to Fight Sexual Abuse
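Before reaching for a browser driver, it can help to confirm that the class really is absent from the HTML that requests receives. Below is a minimal sketch of such a check using only the standard library. The helper name class_in_html and both sample strings are hypothetical, and the regular expression is a rough heuristic, not a full HTML parser.

```python
import re


def class_in_html(html, class_name):
    """Return True if class_name is a whole token in any quoted class attribute."""
    for match in re.finditer(r'class\s*=\s*["\']([^"\']*)["\']', html):
        # Class attributes are space-separated and matching is case-sensitive.
        if class_name in match.group(1).split():
            return True
    return False


# Hypothetical examples: what a plain HTTP fetch might return versus
# what the browser holds after scripts have run.
RAW = '<div id="app"></div><script src="bundle.js"></script>'
RENDERED = '<div id="app"><span class="balancedHeadline">Title</span></div>'

print(class_in_html(RAW, "balancedHeadline"))       # False
print(class_in_html(RENDERED, "balancedHeadline"))  # True
print(class_in_html(RENDERED, "balancedheadline"))  # False
```

If the check fails on r.text but the element shows up in the browser's inspector, the content is probably added by scripts, and a tool that executes JavaScript is a reasonable next step.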

Cryptic Arrow · Accepted Answer · 2019-02-24 18:22:15Z

0

I have fixed your issue. You are not indenting correctly, and when you import something, make sure the 'i' in "import" is not capitalized. Fixed version:

import requests
from bs4 import BeautifulSoup

url = 'http://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')


for ap in soup.find_all('h2', attrs = {"class" : "balancedheader"}):
    if ap.a:
        print(ap.a.text.replace(".n/", " "))
    else:
        print(ap.strip)

2 Comments

Kevin Smith Over a year ago

The capital was just a typo from transferring the code to Stack Overflow. Sadly, this does not correct the problem.

Cryptic Arrow Over a year ago

What about the indents?
