How to scrape video URL from Webpage using python?

Question

I want to download videos from a website.

Here is my code. Every time I run it, it returns an empty result. Here is the live code: https://colab.research.google.com/drive/19NDLYHI2n9rG6KeBCiv9vKXdwb5JL9Nb?usp=sharing

from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")

#print(soup.prettify())

result = soup.find_all('video', class_="video-player")

print(result)

uingtea · Accepted Answer · 2022-03-31 21:49:59Z

1

Using a regular expression:

import requests
import re

response = requests.get("....../video/20000kCCF0")
videoId = '20000kCCF0'
# Find every double-quoted https:// URL that contains the video ID and ends in "mp4"
videos = re.findall(r'https://[^"]+' + videoId + '[^"]+mp4', response.text)
print(videos)
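The same idea as a reusable function, sketched under the answer's own assumption that the page embeds the video links as double-quoted https:// URLs ending in mp4. `find_video_urls` is a hypothetical name; it adds `re.escape` so the ID is matched literally and drops duplicate matches while keeping their order. Because it works on any HTML string, you can try it without network access (the sample below is made up).

```python
import re


def find_video_urls(html: str, video_id: str) -> list[str]:
    """Return unique https:// URLs that contain video_id and end in 'mp4'.

    Mirrors the answer's pattern: a URL runs until the next double quote.
    """
    # re.escape makes sure any special characters in the ID are matched literally
    pattern = r'https://[^"]+' + re.escape(video_id) + r'[^"]+mp4'
    # dict.fromkeys removes duplicates but keeps first-seen order
    return list(dict.fromkeys(re.findall(pattern, html)))


if __name__ == "__main__":
    # Made-up sample HTML, only to show the behavior
    sample = (
        '<script>{"a":"https://cdn.example.com/v/20000kCCF0/low.mp4",'
        '"b":"https://cdn.example.com/v/20000kCCF0/low.mp4",'
        '"c":"https://cdn.example.com/other/abc.mp4"}</script>'
    )
    print(find_video_urls(sample, "20000kCCF0"))
```

With the real page you would pass `response.text` instead of the sample string.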

1 Comment

Malsesto Over a year ago

Wow, I completely forgot about regex. It's the best idea if we just want the download URL. Maybe add /download to the pattern after the video ID to filter the results a bit more.
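A sketch of that suggestion, assuming (not verified against the site) that the direct download links contain `/download` right after the video ID. Only the pattern changes; `find_download_urls` is a hypothetical name and the sample string is made up.

```python
import re


def find_download_urls(html, video_id):
    # Assumption: download links look like https://.../<video_id>/download...mp4
    pattern = r'https://[^"]+' + re.escape(video_id) + r'/download[^"]*mp4'
    return re.findall(pattern, html)


if __name__ == "__main__":
    sample = (
        '"https://cdn.example.com/20000kCCF0/preview.mp4" '
        '"https://cdn.example.com/20000kCCF0/download/video.mp4"'
    )
    print(find_download_urls(sample, "20000kCCF0"))
```

The preview link is skipped because `/download` does not follow the ID in it.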

Malsesto · Accepted Answer · 2022-04-01 06:59:40Z

0

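You always get an empty result because soup.find_all() doesn't find anything: the HTML that requests downloads most likely contains no <video class="video-player"> element at all, because the player is probably built by JavaScript in the browser, and requests does not run JavaScript. Check the url.content you receive by hand, then decide what to look for with find_all().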
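One way to do that check, as a minimal sketch with a hypothetical helper `inspect_html`: look at the raw HTML string you actually received and report whether it contains any `<video` tags, whether the class name `video-player` from the question appears anywhere, and whether a `window._state` assignment (the marker the EDIT relies on) is present. It uses plain string searches, so it is only a rough diagnostic; the sample page is made up.

```python
def inspect_html(html):
    """Roughly summarize what the raw HTML (before any JavaScript runs) contains."""
    lowered = html.lower()
    return {
        # Number of opening <video tags in the downloaded HTML
        "video_tags": lowered.count("<video"),
        # Whether the class used in the question appears anywhere (even inside scripts)
        "mentions_video_player": "video-player" in html,
        # Whether the page state is embedded as a JavaScript assignment
        "has_state": "window._state" in html,
    }


if __name__ == "__main__":
    # Made-up page: no <video> tag, the data only lives in a script
    page = (
        "<html><body><div id='app'></div>"
        "<script>window._state = {}\n</script>"
        "</body></html>"
    )
    print(inspect_html(page))
    # With the real page (needs network): inspect_html(url.text)
```

If `video_tags` is 0 but `has_state` is True, the video data has to be pulled out of the script rather than out of a `<video>` element.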

EDIT: After digging around a bit, I found out how to get content_url_orig:

from bs4 import BeautifulSoup
import requests
import json

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")

# The second <script> tag in the HTML holds the page state
result = str(soup.find_all('script')[1])
# Separate the JSON from the rest of the script string (found by digging through the page source)
result = result.split('window._state = ')[1].split("</script>")[0].split('\n')[0]

result = json.loads(result)

# Navigate the JSON to get the video URL
entity = list(result['entities'].items())[0][1]
download_url = entity['content_url_orig']

print(download_url)
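A slightly more defensive variant of the string splitting, as a sketch: instead of relying on the JSON ending at the first newline, `json.JSONDecoder().raw_decode` parses one JSON value from the start of a string and returns where it stopped, so a trailing `;` or more JavaScript after it does not matter. It assumes, as the answer does, that the page contains `window._state = ` followed by a JSON object; `extract_state` and the sample HTML are hypothetical.

```python
import json


def extract_state(html, marker="window._state = "):
    """Return the JSON object assigned to window._state in the page source."""
    start = html.find(marker)
    if start == -1:
        raise ValueError("marker not found; the page layout may have changed")
    rest = html[start + len(marker):].lstrip()
    # raw_decode parses one JSON value at the start of `rest` and ignores
    # whatever follows it (e.g. ';' or more JavaScript)
    state, _end = json.JSONDecoder().raw_decode(rest)
    return state


if __name__ == "__main__":
    page = (
        '<script>window._state = {"entities": {"20000kCCF0": '
        '{"content_url_orig": "https://cdn.example.com/20000kCCF0.mp4"}}};'
        "\nconsole.log(1);</script>"
    )
    state = extract_state(page)
    # Same navigation as the answer: take the first entity
    entity = next(iter(state["entities"].values()))
    print(entity["content_url_orig"])
```

With the real page you would call `extract_state(url.text)`; this also removes the dependency on the state living in the second `<script>` tag.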

Funny side note: if I read the JSON correctly, it contains all the videos the creator uploaded, each with its download URL :)
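If that reading is right, every entry in the answer's `result['entities']` that is a video may carry its own `content_url_orig`. A sketch that collects them, assuming each entity is a dict and skipping entries without that key (the JSON structure beyond what the answer shows is not verified). `all_download_urls` and the sample state are hypothetical.

```python
def all_download_urls(state):
    """Map each entity key to its content_url_orig, skipping entries without one."""
    urls = {}
    for key, entity in state.get("entities", {}).items():
        # Not every entity is necessarily a video; only keep ones with a URL
        if isinstance(entity, dict) and "content_url_orig" in entity:
            urls[key] = entity["content_url_orig"]
    return urls


if __name__ == "__main__":
    # Made-up state in the shape the answer navigates
    state = {
        "entities": {
            "20000kCCF0": {"content_url_orig": "https://cdn.example.com/a.mp4"},
            "creator1": {"name": "someone"},
            "20000kXYZ1": {"content_url_orig": "https://cdn.example.com/b.mp4"},
        }
    }
    for key, url in all_download_urls(state).items():
        print(key, url)
```

With the real page, pass the parsed `result` dict from the answer.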

edited Apr 1, 2022 at 6:59

answered Mar 30, 2022 at 20:44

5 Comments

Rakesh Kumar Over a year ago

You are right, a blank result means it isn't finding anything. But why isn't it finding the data when the data is there?

Malsesto Over a year ago

I dug around a bit in the website's source code, and I believe you have to look for audio and for content-url to get the URL of the actual video to download.

Malsesto Over a year ago

Look at this pastebin; the URL you're interested in is on line 248.

Rakesh Kumar Over a year ago

Yes, line 248 is what I am looking for. But how do I print this URL with web scraping?

Malsesto Over a year ago

@RakeshKumar I found a working solution and edited my answer. I'd appreciate it if you could accept it :)) And by the way: digging through the source code helped a lot.
