
I want to extract information (such as the title and view count) about a specific YouTube video using Python, just as I have done when scraping other websites. But for some reason, my code either returns nothing or returns tags only for the recommended videos in the sidebar instead of the main video at the URL.

I tried the same code that I used for scraping other websites, shown below. Apparently it doesn't work on YouTube. What should I do if I want to get video information from a YouTube URL?

import requests
from bs4 import BeautifulSoup

base_url = 'https://www.youtube.com/watch?'
search_string = 'v=I41aLSzLI50'
url = base_url + search_string
supers = requests.get(url).content
data = BeautifulSoup(supers, 'html.parser')
videos = data.find_all('a', class_='content-link spf-link yt-uix-sessionlink spf-link')
for video in videos:
    print(video.find('span', class_='title').get_text())
  • 1
    First, check whether the page uses JavaScript to add its content; BeautifulSoup can't run JavaScript. You could also print the content returned by requests to see what you actually get. It may differ from what you see in a web browser: the server can send a CAPTCHA, a warning message, etc. Commented Aug 25, 2019 at 21:12
  • 1
    Try using the youtube_dl module. Commented Aug 25, 2019 at 21:19
  • 1
    Is there a reason why you don't use the YouTube API? developers.google.com/youtube/v3 Commented Aug 25, 2019 at 21:36
  • No specific reason, I'm just a beginner who only knows BeautifulSoup. I guess the reason I couldn't see the HTML content of the main video is that the page uses JavaScript. Let me try youtube_dl and the YouTube API as you suggested. Big thanks! Commented Aug 25, 2019 at 22:49
  • But another question: why couldn't I see any of it from my code just because it's generated by JavaScript? Commented Aug 25, 2019 at 22:57

1 Answer


I looked at a page on YouTube, and it seems that the information you are looking for is not in the original HTML source (at least not where you are expecting it). Scripts create that content when your browser renders the page, so requests, which only downloads the raw source, never sees it. Based on my experience, you have a few options.

  1. Use one of the APIs the commenters suggested. I am not very familiar with these, but they might save you some time and effort. Web scraping can be fragile because page formats change, and your scripts then need to be updated.

  2. If you insist on web scraping, you can use an automated browser. I used to use Selenium on a regular basis and it should work for your purposes. This will allow you to work with content generated by scripts.

  3. I looked at the page source, and the information you are looking for appears to be contained within some script tags, but parsing this will be a pain.
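As a rough illustration of option 1, here is a minimal sketch that asks the YouTube Data API v3 `videos` endpoint for a video's title and view count. It assumes you have created your own API key (`YOUR_API_KEY` below is a placeholder), and the helper names `build_request_url`, `parse_video_info`, and `fetch_video_info` are made up for this example. It uses only the standard library, so nothing extra needs to be installed.

```python
import json
import urllib.parse
import urllib.request

# Endpoint of the YouTube Data API v3 "videos" resource.
API_URL = "https://www.googleapis.com/youtube/v3/videos"


def build_request_url(video_id, api_key):
    """Build a videos.list request URL for a single video ID."""
    query = urllib.parse.urlencode({
        "part": "snippet,statistics",  # snippet holds the title, statistics the counts
        "id": video_id,
        "key": api_key,
    })
    return f"{API_URL}?{query}"


def parse_video_info(payload):
    """Extract the title and view count from a videos.list response dict."""
    items = payload.get("items", [])
    if not items:
        return None  # no video matched the ID
    item = items[0]
    views = item.get("statistics", {}).get("viewCount")
    return {
        "title": item["snippet"]["title"],
        # The API reports counts as strings; the count may be absent.
        "views": int(views) if views is not None else None,
    }


def fetch_video_info(video_id, api_key):
    """Call the API over HTTPS and return the parsed result."""
    with urllib.request.urlopen(build_request_url(video_id, api_key)) as resp:
        return parse_video_info(json.load(resp))


# Example call (needs network access and a valid key):
# print(fetch_video_info("I41aLSzLI50", "YOUR_API_KEY"))
```

The video ID is the value of the `v=` parameter in the watch URL. Because the API returns structured JSON, this approach does not break when the page layout changes, which is the main weakness of scraping the HTML.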
