1

I'm working on a project where I need to store the date a YouTube video was published.
The problem is that I can't find this data in the page's HTML source.

Here's my code attempt:

import requests
from bs4 import BeautifulSoup as BS

url = "https://www.youtube.com/watch?v=XQgXKtPSzUI&t=915s"
response = requests.get(url)
soup = BS(response.content, "html.parser")
response.close()

dia = soup.find_all('span',{'class':'date'})
print(dia)

Output:

[]

I know the arguments I'm passing to .find_all() are wrong.
I say this because I was able to store other information from the video, such as the title and the views, using the same code.
I've tried different arguments with .find_all() but couldn't figure out how to find the date.

3
  • Did you try the YouTube API? Commented Sep 7, 2017 at 16:45
  • I didn't. How does that work? I'm quite new to Python too... Commented Sep 7, 2017 at 16:47
  • 1
    There's no sample HTML in the question. You may want to dig through the page source and make sure your find_all call actually matches the HTML; that's really the only answer here. Otherwise we're writing code for you that's very specific to one scenario. Commented Sep 7, 2017 at 16:47

3 Answers

3

If you use Python with pafy, the object you'll get has the published date easily accessible.

Install pafy with: pip install pafy

import pafy
vid = pafy.new("www.youtube.com/watch?v=2342342whatever")
published_date = vid.published
print(published_date)   # print() is a function in Python 3

Check out the pafy docs for more info: https://pythonhosted.org/Pafy/ I'm linking the docs because it's a really neat module: it fetches the data without external request modules and also exposes a bunch of other useful properties of the video, such as the download link for the best available format.
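Since the goal is to store the date, it may help to turn the string into a datetime object first. The sketch below assumes that vid.published is a string shaped like "2012-10-02 17:09:58"; that format, the PUBLISHED_FORMAT constant, the parse_published helper and the sample value are all assumptions for illustration, so check what your pafy version actually returns.

```python
from datetime import datetime

# Assumed format of pafy's `published` string, e.g. "2012-10-02 17:09:58".
PUBLISHED_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_published(published: str) -> datetime:
    """Convert a 'YYYY-MM-DD HH:MM:SS' string into a datetime object."""
    return datetime.strptime(published, PUBLISHED_FORMAT)


# Hypothetical sample value standing in for vid.published.
sample = "2012-10-02 17:09:58"
parsed = parse_published(sample)

# An ISO date string is convenient to store in a file or database column.
print(parsed.date().isoformat())  # 2012-10-02
```

If the string turns out to have a different shape, strptime raises a ValueError, which makes a format mismatch easy to spot.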

0

It seems that YouTube uses JavaScript to add the date, so that information is not in the static HTML as a span you can select. You should either try scraping with Selenium, which renders the page, or extract the date from the JavaScript, since it appears directly in the page source.
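As a rough illustration of reading a value out of the raw page source without rendering it, here is a minimal sketch that looks for a metadata tag instead of a visible span. It assumes the page contains a tag like <meta itemprop="datePublished" content="...">; that is an assumption about YouTube's markup and may not hold for every page or over time. The PublishedDateParser class, the find_published_date helper and the sample HTML are hypothetical, and only the standard library is used so the sketch runs without extra installs.

```python
from html.parser import HTMLParser


class PublishedDateParser(HTMLParser):
    """Collects the content of the first <meta itemprop="datePublished"> tag."""

    def __init__(self):
        super().__init__()
        self.published = None

    def handle_starttag(self, tag, attrs):
        # Ignore everything except the first matching <meta> tag.
        if tag != "meta" or self.published is not None:
            return
        attributes = dict(attrs)
        if attributes.get("itemprop") == "datePublished":
            self.published = attributes.get("content")


def find_published_date(html: str):
    """Return the datePublished value found in the HTML, or None if absent."""
    parser = PublishedDateParser()
    parser.feed(html)
    return parser.published


# Hypothetical fragment standing in for the real page source (assumed markup).
sample_html = (
    '<html><head><meta itemprop="datePublished" content="2017-09-07">'
    "</head><body></body></html>"
)
print(find_published_date(sample_html))  # 2017-09-07
```

To try it on a live page, pass response.text from the question's code to find_published_date. With BeautifulSoup, the equivalent lookup would be soup.find("meta", itemprop="datePublished"). If either returns None, the tag is not in the source and the Selenium route is the fallback.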

0

Try passing the attributes with the attrs keyword, as shown below:

dia = soup.find_all('span', attrs={'class':'date'})
