I am trying to use Selenium and BeautifulSoup to scrape videos from a website. The videos are loaded when the 'videos' tab is clicked (via JavaScript, I assume). Once they load, there is also pagination, and the videos on each page are loaded on click (again via JavaScript, I assume).

Here is how the page looks, and what I get when I inspect the element (two screenshots in the original post, not preserved here).

My issue is that I can't get the videos across all pages; I only get the first page. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as soup

import random
import time

chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument('--headless')
seconds = 5 + (random.random() * 5)
driver = webdriver.Chrome(options=chrome_options)
driver.implicitly_wait(30)

driver.get("https://")
time.sleep(seconds)
time.sleep(seconds)

# Click the "videos" tab via JavaScript, then give the content time to load.
element = driver.find_element(By.ID, "tab-videos")
driver.execute_script("arguments[0].click();", element)
time.sleep(seconds)
time.sleep(seconds)
html = driver.page_source
page_soup = soup(html, "html.parser")

containers = page_soup.findAll("div", {"id": "tabVideos"})
for videos in containers:
    main_videos = videos.find_all("div", {"class":"thumb-block tbm-init-ok"})
print(main_videos)
driver.quit()

What am I missing here?

  • Can you share the URL? Commented Jul 22, 2020 at 16:16
  • @AndrejKesely It's an adult site. Do you mind? Commented Jul 22, 2020 at 16:21
  • @radioactive http://x*****s I guess. What are you trying to do? Commented Jul 22, 2020 at 16:25
  • @radioactive I don't mind. Just make it non-clickable; you can delete the comment afterwards. Commented Jul 22, 2020 at 16:25
  • You must click every page button and wait for the content to load in order to get all your info; that should work. Commented Jul 22, 2020 at 16:29

1 Answer

The content is loaded from the URL 'https://www.x***s.com/amateur-channels/ajibola_elizabeth/videos/best/{page}', where page starts at 0.

This script will print all video URLs:

import requests
from bs4 import BeautifulSoup


url = 'https://www.x***s.com/amateur-channels/ajibola_elizabeth/videos/best/{page}'

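page = 0
while True:
    # Fetch and parse the current page of the listing.
    soup = BeautifulSoup(requests.get(url.format(page=page)).content, 'html.parser')

    # Select the title link inside each div whose id starts with "video_".
    for video in soup.select('div[id^="video_"] .title a'):
        # Keep the last two path segments of the href to build the video URL.
        u = video['href'].rsplit('/', maxsplit=2)
        print('https://www.x***s.com/video' + u[-2] + '/' + u[-1])

    # Stop when the page has no "next page" link.
    next_page = soup.select_one('a.next-page')
    if not next_page:
        break

    page += 1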
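
If you want to reuse this loop for other paginated listings, one way to factor it is sketched below. The names `iter_pages`, `fetch`, `has_next` and `max_pages` are hypothetical and not part of the script above; `max_pages` is an added safety cap (my assumption) so that a site which always shows a next-page link cannot make the loop run forever. The demo uses canned pages rather than real requests.

```python
from typing import Callable, Iterator, Optional


def iter_pages(
    fetch: Callable[[int], Optional[str]],
    has_next: Callable[[str], bool],
    max_pages: int = 1000,
) -> Iterator[str]:
    """Yield the HTML of pages 0, 1, 2, ... until no next page is advertised.

    fetch(page) returns the page's HTML, or None if it could not be loaded.
    has_next(html) reports whether the page links to a following page.
    max_pages is a safety cap (an assumption, not in the original script).
    """
    for page in range(max_pages):
        html = fetch(page)
        if html is None:
            break
        yield html
        if not has_next(html):
            break


# Demo with canned pages instead of real HTTP requests.
fake_pages = {
    0: '<a class="next-page">Next</a>',
    1: '<a class="next-page">Next</a>',
    2: "<p>last page</p>",
}
visited = list(iter_pages(fake_pages.get, lambda html: "next-page" in html))
print(len(visited))  # prints 3
```

With the script above, `fetch` could wrap `requests.get(url.format(page=page)).text`, and `has_next` could check whether `BeautifulSoup(html, 'html.parser').select_one('a.next-page')` returns a link.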

8 Comments

Works! How did you get this URL (videos/best/{page})?
@radioactive I opened Firefox developer tools -> Network tab and clicked the page-number link. The URL was then shown. (Chrome has similar developer tools.)
How about getting the comments on a particular video? The comments are also paginated. Here is an example: x***s.com/video53845169/join_my_onlyfans.com_maami-igbagbo_to_see_more_of_me..click_my_profile_to_get_the_link
@radioactive The correct approach is to post a new question here on SO and include what you have and what you have tried. I will look at it :)
I will do that now.