Clicking the 'Load More' button with Selenium not working properly in Python 3.7

Question

The page I am scraping is dynamic and has a 'Load More' button, so I used Selenium. I have four problems:

1. The click works only once: the 'Load More' button is clicked the first time and never again.
2. It scrapes only the articles that appear before the first 'Load More' button, not the ones loaded after it.
3. It scrapes every article twice.
4. I only want the date, but the text it returns contains the author and the place as well as the date.

import time
import requests
from bs4 import BeautifulSoup
from bs4.element import Tag
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
base = "https://indianexpress.com"
browser = webdriver.Safari(executable_path='/usr/bin/safaridriver')
wait = WebDriverWait(browser, 10)
browser.get('https://indianexpress.com/?s=cybersecurity')

while True:
    try:
        time.sleep(6)
        show_more = wait.until(EC.element_to_be_clickable((By.LINK_TEXT, 'Load More')))
        show_more.click()
    except Exception as e:
        print(e)
        break

soup = BeautifulSoup(browser.page_source,'lxml')
search_results = soup.find('div', {'id':'ie-infinite-scroll'})

links = search_results.find_all('a')
for link in links:
    link_url = link['href']
    response = requests.get(link_url)
    sauce = BeautifulSoup(response.text, 'html.parser')
    dateTag = sauce.find('div', {'class':'m-story-meta__credit'})
    titleTag = sauce.find('h1', {'class':'m-story-header__title'})
    contentTag = ' '.join([item.get_text(strip=True) for item in sauce.select("[class^='o-story-content__main a-wysiwyg'] p")])

    date = None
    title = None
    content = None

    if isinstance(dateTag, Tag):
        date = dateTag.get_text().strip()
    if isinstance(titleTag, Tag):
        title = titleTag.get_text().strip()

    print(f'{date}\n {title}\n {contentTag}\n')
    time.sleep(3)

The code runs without errors, but it needs refinement. What should I change to solve the problems described above?

Thanks.

Batuhan Gürses · Accepted Answer · 2019-06-26 13:28:32Z


This happens because you are not waiting for the new content to load. While the new content is still loading, you are already trying to click the 'Load More' button again.

Error message:

Message: Element <a class="m-featured-link m-featured-link--centered ie-load-more" href="#"> is not clickable at point (467,417) because another element <div class="o-listing__load-more m-loading"> obscures it
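A minimal sketch of a custom wait condition that waits for the new content itself, not only for the button. WebDriverWait.until() accepts any callable that takes the driver, so the hypothetical class MoreResultsLoaded below returns True only once no element with the m-loading class remains and the number of links inside div#ie-infinite-scroll has grown. The selectors come from the question and from the error message above; that the m-loading class is removed when loading finishes is an assumption.

```python
class MoreResultsLoaded:
    """Wait condition for WebDriverWait.until(), which calls it repeatedly
    with the driver until it returns a truthy value or the timeout expires.

    Hypothetical helper: the CSS selectors are assumptions based on the
    question's 'ie-infinite-scroll' id and the overlay in the error message.
    """

    # "css selector" is the string value of Selenium's By.CSS_SELECTOR
    BY_CSS = "css selector"
    RESULT_LINKS = "#ie-infinite-scroll a"
    LOADING_OVERLAY = "div.o-listing__load-more.m-loading"

    def __init__(self, previous_count):
        # Number of result links present before 'Load More' was clicked
        self.previous_count = previous_count

    def __call__(self, driver):
        # Not done while the loading overlay is still in the DOM
        if driver.find_elements(self.BY_CSS, self.LOADING_OVERLAY):
            return False
        # Done once more result links exist than before the click
        count = len(driver.find_elements(self.BY_CSS, self.RESULT_LINKS))
        return count > self.previous_count
```

In the question's loop, one would count the links before clicking (for example len(browser.find_elements(By.CSS_SELECTOR, '#ie-infinite-scroll a'))), call show_more.click(), and then call wait.until(MoreResultsLoaded(count)). If nothing new arrives within the timeout, WebDriverWait raises TimeoutException, which the existing except block turns into the end of the loop.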

My solution:

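while True:
    try:
        # Wait until the 'Load More' link is clickable, then click it
        load_more = wait.until(EC.element_to_be_clickable((By.XPATH, "//a[contains(@class, 'ie-load-more')]")))
        load_more.click()
        # Wait for loading to finish: while loading, the container's class is
        # 'o-listing__load-more m-loading', so this exact-match XPath only
        # matches it again once 'm-loading' has been removed
        wait.until(EC.visibility_of_element_located((By.XPATH, "//div[@class='o-listing__load-more']")))
    except Exception as e:
        # Stop on any failure, e.g. a timeout when no 'Load More' link remains
        print(e)
        break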
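The answer above addresses the first two problems. The question's third and fourth problems (every article printed twice, and the date field also containing the author and place) are not covered in this thread. A minimal sketch of two hypothetical helpers, unique_article_urls and extract_date (names invented here), follows. It assumes that each search result contains more than one <a> tag pointing at the same article, which would explain the doubled output, and that the credit text contains a date written like "June 26, 2019"; neither assumption is confirmed by the thread, so check them against the real page. It also puts the question's otherwise unused base variable to work for resolving relative links.

```python
import re
from datetime import datetime
from urllib.parse import urljoin

BASE = "https://indianexpress.com"  # same value as 'base' in the question


def unique_article_urls(hrefs, base=BASE):
    """Return absolute URLs with duplicates removed, keeping first-seen order.

    Assumption: several <a> tags per result (e.g. image and headline) point
    at the same article, which would make each article appear twice.
    """
    urls = []
    seen = set()
    for href in hrefs:
        # Skip empty links and in-page anchors such as href="#"
        if not href or href.startswith("#"):
            continue
        url = urljoin(base, href)  # resolve relative links against the site
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Matches text like "June 26, 2019"; the credit-line format is an assumption
DATE_PATTERN = re.compile(r"\b([A-Z][a-z]+) (\d{1,2}), (\d{4})\b")


def extract_date(credit_text):
    """Return only the date from the credit text, dropping author and place."""
    for match in DATE_PATTERN.finditer(credit_text or ""):
        try:
            return datetime.strptime(match.group(0), "%B %d, %Y").date()
        except ValueError:
            continue  # a capitalised word followed by numbers that is not a month
    return None
```

In the question's loop these would replace "for link in links" with iterating over unique_article_urls(link.get('href') for link in links), and the date line with date = extract_date(dateTag.get_text()) when dateTag is a Tag. Using a set for lookups while appending to a list keeps the original page order, so articles are still visited top to bottom.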



Comment by Piyush Ghasiya:

Thanks. It worked. I accepted and upvoted this answer.
