Scraping paginated data loaded with Javascript

Question

I am trying to use Selenium and BeautifulSoup to scrape videos from a website. The videos are loaded (via JavaScript, I guess) when the 'videos' tab is clicked. Once they are loaded, there is also pagination, and the videos for each page are loaded when a page link is clicked (again via JavaScript, I guess).

Here is how it looks: [screenshot not preserved]

When I inspect the element, here is what I get: [screenshot not preserved]

My issue is that I can't get the videos across all pages; I only get the first page. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup as soup

import random
import time

chrome_options = webdriver.ChromeOptions()
# Block browser notification pop-ups (2 = block)
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument('--headless')
# Random delay between 5 and 10 seconds
seconds = 5 + (random.random() * 5)
# Newer Selenium versions only accept `options=` (not `chrome_options=`)
driver = webdriver.Chrome(options=chrome_options)
driver.implicitly_wait(30)

driver.get("https://")
time.sleep(seconds)

# Click the 'videos' tab through JavaScript so the video list gets loaded
# (find_element_by_id was removed in Selenium 4; use find_element(By.ID, ...))
element = driver.find_element(By.ID, "tab-videos")
driver.execute_script("arguments[0].click();", element)
time.sleep(seconds)

# Grab the HTML as it is currently rendered in the browser
html = driver.page_source
page_soup = soup(html, "html.parser")

main_videos = []
for videos in page_soup.find_all("div", {"id": "tabVideos"}):
    # Match both classes regardless of their order in the class attribute
    main_videos.extend(videos.select("div.thumb-block.tbm-init-ok"))
print(main_videos)
driver.quit()

What am I missing here?

@radioactive http://x*****s I guess, what are you trying to do?
– undetected Selenium, Commented Jul 22, 2020 at 16:25
@radioactive I don't mind; just make it non-clickable, and you can delete the comment afterwards.
– Andrej Kesely, Commented Jul 22, 2020 at 16:25
You need to click every page button and wait for the content to load to get all your info; that should work.
– Jhoubert Rincon, Commented Jul 22, 2020 at 16:29
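The suggestion in the last comment (click each page link and wait for the new content) could look roughly like the sketch below. It assumes `driver` is a Selenium 4 WebDriver already showing the first page of videos, and that the pagination has a "next" link matching `a.next-page` (the class used in the accepted answer, not verified here); `collect_all_pages` is a hypothetical helper name. It polls `page_source` by hand so the function needs no Selenium import; `WebDriverWait(driver, timeout).until(...)` with the same condition would do the same job. A change in `page_source` is only a crude signal that the next page arrived; waiting for a specific element would be more robust.

```python
import time


def collect_all_pages(driver, next_selector="a.next-page", timeout=15.0, max_pages=100):
    """Return the HTML of every page reached by clicking the 'next' link.

    Assumes `driver` is a Selenium 4 WebDriver already showing page 1.
    `next_selector` is a hypothetical CSS selector for the pagination link.
    """
    pages = [driver.page_source]
    for _ in range(max_pages - 1):
        # "css selector" is the value of selenium's By.CSS_SELECTOR
        links = driver.find_elements("css selector", next_selector)
        if not links:
            break  # no 'next' link: assume this was the last page
        old_source = driver.page_source
        # Click through JavaScript, as the question does for the tab
        driver.execute_script("arguments[0].click();", links[0])
        # Poll until the rendered HTML changes, i.e. the next page has loaded
        deadline = time.monotonic() + timeout
        while driver.page_source == old_source:
            if time.monotonic() > deadline:
                raise TimeoutError("next page did not load in time")
            time.sleep(0.25)
        pages.append(driver.page_source)
    return pages
```

Each returned HTML string can then be parsed with BeautifulSoup exactly as in the question.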

Andrej Kesely · Accepted Answer · 2020-07-22 16:39:02Z


The content is loaded from the URL 'https://www.x***s.com/amateur-channels/ajibola_elizabeth/videos/best/{page}', where {page} is a zero-based page index (the first page is 0).

This script will print all video URLs:

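import requests
from bs4 import BeautifulSoup


# Paginated video list; {page} starts at 0
url = 'https://www.x***s.com/amateur-channels/ajibola_elizabeth/videos/best/{page}'

page = 0
while True:
    response = requests.get(url.format(page=page), timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')

    # Each video block has an id starting with "video_"; its title link holds the href
    for video in soup.select('div[id^="video_"] .title a'):
        # Rebuild the video URL from the last two path segments of the href
        u = video['href'].rsplit('/', maxsplit=2)
        print('https://www.x***s.com/video' + u[-2] + '/' + u[-1])

    # Stop when the page has no "next page" link
    next_page = soup.select_one('a.next-page')
    if not next_page:
        break

    page += 1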
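The loop above can be made a little more defensive: stop when there is no next link, but also cap the number of pages and pause between requests. A minimal sketch, where `crawl_pages`, `fetch` and `parse` are hypothetical names and the page cap and delay are arbitrary defaults; fetching and parsing are passed in as functions so the pagination logic stays separate from the site-specific details.

```python
import time
from typing import Callable, Iterator, List, Tuple


def crawl_pages(
    fetch: Callable[[int], str],
    parse: Callable[[str], Tuple[List[str], bool]],
    max_pages: int = 500,
    delay: float = 1.0,
) -> Iterator[str]:
    """Yield items from pages 0, 1, 2, ... until a page reports no next link.

    `fetch(page)` returns the HTML for a zero-based page index;
    `parse(html)` returns (items_on_page, has_next_page).
    """
    for page in range(max_pages):
        items, has_next = parse(fetch(page))
        yield from items
        if not has_next:
            return
        time.sleep(delay)  # pause between requests
```

To plug in the answer's logic, `fetch` could be `lambda p: requests.get(url.format(page=p), timeout=30).content` and `parse` could return the hrefs from `soup.select('div[id^="video_"] .title a')` together with `soup.select_one('a.next-page') is not None`.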


shekwo Over a year ago

Works! How did you find this URL (videos/best/{page})?

Andrej Kesely Over a year ago

@radioactive I opened the Firefox developer tools, went to the Network tab, and clicked a page-number link; the request URL then showed up there. (Chrome has similar developer tools.)

shekwo Over a year ago

How about getting the comments on a particular video? The comments are also paginated. Here is an example: x***s.com/video53845169/join_my_onlyfans.com_maami-igbagbo_to_see_more_of_me..click_my_profile_to_get_the_link

Andrej Kesely Over a year ago

@radioactive The right approach is to post a new question here on SO describing what you have and what you have tried. I will look at it :)

shekwo Over a year ago

I will do that now.
