0

No errors, it is just not printing the results. It is supposed to print out headlines from the URL. The script runs but returns nothing while parsing for balancedheadlines. I can swap the tag I look for to p and get data back, but I believe I am not passing the tags through correctly to retrieve just the headlines.

import requests
from bs4 import BeautifulSoup

url = 'http://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')


for ap in soup.find_all('h2', attrs = {"class" : "balancedheader"}):

    if ap.a:
        print(ap.a.text.replace(".n/", " "))
    else:
        print(ap.strip)
  • There is no h2 tag with the balancedheader class present in the DOM. Commented Feb 24, 2019 at 18:24
  • soup.find_all('h2', attrs = {"class" : "balancedheader"}) == []. Because it is an empty list, your loop, which is supposed to run once for every item in the list, does nothing. Commented Feb 24, 2019 at 18:25

3 Answers

1

The NY Times website has no 'h2' element with a class named 'balancedheader'.

The XPath "//h2[@class='balancedheader']" returns an empty set.
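
To see why the script prints nothing rather than raising an error, here is a minimal sketch. The inline HTML and the 'story-heading' class are made-up stand-ins (not the real NY Times markup), and 'html.parser' is used only so the example does not depend on lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in markup: h2 tags exist, but none has class "balancedheader".
html = """
<h2 class="story-heading"><a href="/a">First headline</a></h2>
<h2 class="story-heading">Second headline</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# No h2 carries the requested class, so the result is empty.
matches = soup.find_all("h2", attrs={"class": "balancedheader"})
print(len(matches))

# The loop body never runs for an empty result, so nothing is printed.
printed = []
for ap in matches:
    printed.append(ap.get_text(strip=True))

# Dropping the class filter shows that the h2 tags themselves are there.
all_h2 = [h2.get_text(strip=True) for h2 in soup.find_all("h2")]
print(all_h2)
```

An empty result from find_all is not an error, which is why the script finishes silently.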


0

You said the script runs but returns nothing while parsing for balancedheadlines, yet your code searches for balancedheader. There is no balancedheader class on that site. You can list all the h2 tags like this:

h2_tags = soup.find_all('h2')
for allh2 in h2_tags:
    print(allh2)

The code above prints all the h2 tags on the page.
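
If the goal is to find out which class names the h2 tags actually use, one possible approach is to count them. This is only a sketch: the inline HTML and its class names are invented for illustration, so substitute the soup built from the real response:

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical stand-in markup; replace with the soup built from r.text.
html = """
<h2 class="story-heading">One</h2>
<h2 class="story-heading featured">Two</h2>
<h2>Three</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# BeautifulSoup returns the class attribute as a list; tags without one yield [].
class_counts = Counter(
    cls
    for h2 in soup.find_all("h2")
    for cls in h2.get("class", [])
)

for cls, count in class_counts.most_common():
    print(cls, count)
```

Any class you plan to pass to find_all should show up in that tally. If it does not, the filter will match nothing.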

I think you are trying to get the title in balancedHeadline, and it is rendered by JavaScript, so you need to use Selenium:

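from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Selenium 4 style API: the driver path goes through Service, options through options=
options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=Service(r'your driver path'), options=options)
driver.get('https://www.nytimes.com/2019/02/24/world/europe/pope-vatican-sexual-abuse.html')

# Find every span whose class attribute is exactly 'balancedHeadline'
x = driver.find_elements(By.CSS_SELECTOR, "span[class='balancedHeadline']")

for title in x:
    print(title.text)
driver.quit()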

OUTPUT:

Pope Francis Ends Landmark Meeting by Calling for ‘All-Out Battle’ to Fight Sexual Abuse
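
If you would rather keep the BeautifulSoup parsing from the question, a possible variation is to let Selenium only render the page and then hand the rendered HTML (for example driver.page_source) to BeautifulSoup. In this sketch rendered_html is a made-up stand-in for that string, so it runs without a browser:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the rendered HTML, e.g. driver.page_source.
rendered_html = """
<h1><span class="balancedHeadline">Example headline</span></h1>
<p>Body text that should not be picked up.</p>
"""

soup = BeautifulSoup(rendered_html, "html.parser")

# Select spans by class and collect their text.
titles = [span.get_text(strip=True) for span in soup.select("span.balancedHeadline")]

for title in titles:
    print(title)
```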


0

I have fixed your issue: you were not indenting correctly, and when you import something, make sure the 'i' in "import" is not capitalized. Fixed version:

import requests
from bs4 import BeautifulSoup

url = 'http://www.nytimes.com'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'lxml')


for ap in soup.find_all('h2', attrs = {"class" : "balancedheader"}):
    if ap.a:
        print(ap.a.text.replace(".n/", " "))
    else:
        print(ap.strip)

2 Comments

The capital letter was just a typo from transferring the code to Stack. Sadly, this does not correct the problem.
What about the indents?
