I want to download videos from a website.

Here is my code. Every time I run it, the output is blank. Here is the live code: https://colab.research.google.com/drive/19NDLYHI2n9rG6KeBCiv9vKXdwb5JL9Nb?usp=sharing

from bs4 import BeautifulSoup
import requests

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")

#print(soup.prettify())

result = soup.find_all('video', class_="video-player")

print(result)

2 Answers

Using regex:

import requests
import re

response = requests.get("....../video/20000kCCF0")
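videoId = '20000kCCF0'
# match any quoted https URL that contains the video ID and ends in mp4;
# re.escape keeps the ID literal even if it contains regex metacharacters
videos = re.findall(r'https://[^"]+' + re.escape(videoId) + r'[^"]+mp4', response.text)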
print(videos)

1 Comment

Wow, I completely forgot about regex at this point. It is the best idea if we just want the download URL. Maybe add /download to the pattern after the video ID to filter the results a bit more.
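A minimal sketch of that filtering idea, wrapped in a function so it can be tried without hitting the site. The function name, the `require_download` flag and the sample URLs are made up for illustration; whether the real URLs contain a `/download` segment right after the video ID is an assumption taken from the comment above.

```python
import re

def find_mp4_urls(html, video_id, require_download=False):
    """Return quoted https URLs in html that contain video_id and end in mp4."""
    # Assumption: the download URLs have "/download" right after the video ID.
    suffix = "/download" if require_download else ""
    pattern = r'https://[^"]+' + re.escape(video_id + suffix) + r'[^"]+mp4'
    return re.findall(pattern, html)

# Made-up sample standing in for response.text
sample = (
    '{"a": "https://cdn.example.com/video/20000kCCF0/download/1/h264_high_720.mp4", '
    '"b": "https://cdn.example.com/video/20000kCCF0/thumb/1/preview.mp4"}'
)

all_urls = find_mp4_urls(sample, "20000kCCF0")
download_urls = find_mp4_urls(sample, "20000kCCF0", require_download=True)
print(all_urls)
print(download_urls)
```

On the real page you would pass `response.text` instead of `sample`.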

You always get a blank result because soup.find_all() doesn't find anything. Check the url.content you receive by hand, then decide what to look for with find_all().
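One way to do that check without reading the whole page by eye is to test the raw HTML for a few markers. This is only a sketch: the function name and the sample page are made up, and the idea that the player is built later by JavaScript (so no `<video>` tag exists in the HTML that requests receives) is an assumption, not something confirmed for this site.

```python
def describe_page(html):
    """Report which markers appear in the raw HTML that requests received."""
    lowered = html.lower()
    return {
        "has_video_tag": "<video" in lowered,
        "has_state_json": "window._state" in html,
        "script_tags": lowered.count("<script"),
    }

# Made-up sample: a page with an empty root element and data inside a script
sample = (
    "<html><body><div id='root'></div>"
    "<script>window._state = {\"entities\": {}}</script>"
    "</body></html>"
)

report = describe_page(sample)
print(report)
```

If `has_video_tag` comes back false for `url.text`, then `find_all('video', ...)` has nothing to match, and the data has to be pulled from somewhere else in the page.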

EDIT: After digging around a bit, I found out how to get the content_url_orig:

from bs4 import BeautifulSoup
import requests
import json

url = requests.get("https://www.mxtakatak.com/xt0.3a7ed6f84ded3c0f678638602b48bb1b840bea7edb3700d62cebcf7a400d4279/video/20000kCCF0")

page = url.content

soup = BeautifulSoup(page, "html.parser")



result = str(soup.find_all('script')[1])  # the second script tag in the HTML holds the page state
result = result.split('window._state = ')[1].split("</script>']")[0].split('\n')[0]
# cut the JSON out of the script string; I found these split points by digging around in the file

result = json.loads(result)


#navigating in the json to get the video-url
entity = list(result['entities'].items())[0][1]
download_url = entity['content_url_orig']

print(download_url)
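The chained split() calls depend on exactly where the line break falls in the script. A slightly sturdier sketch, assuming the page still assigns a JSON object to `window._state`, lets the json module find the end of the object itself. The function name and the sample HTML are made up for illustration.

```python
import json

def extract_state(html, marker="window._state = "):
    """Parse the JSON value that follows marker in html.

    Raises ValueError if the marker is missing or the JSON is malformed.
    """
    start = html.index(marker) + len(marker)
    # raw_decode parses one JSON value and ignores whatever follows it,
    # so a trailing ';' or '</script>' does not need to be cut off first.
    # It does not skip leading whitespace, hence the lstrip().
    state, _end = json.JSONDecoder().raw_decode(html[start:].lstrip())
    return state

# Made-up sample standing in for the real page source
sample_html = (
    '<script>window._state = {"entities": {"v1": '
    '{"content_url_orig": "https://cdn.example.com/v1.mp4"}}};\n'
    'window.other = 1;</script>'
)

state = extract_state(sample_html)
print(state["entities"]["v1"]["content_url_orig"])
```

With the real page you would call `extract_state(url.text)` and navigate the result the same way as above.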

Funny side note: if I read the JSON correctly, it also contains the download URLs of all the videos the creator uploaded :)
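If that reading is right, something like the following could collect them all. It is a sketch under assumptions: that `entities` maps an ID to a dict, and that video entries carry a `content_url_orig` key while other entries do not. The helper name and the sample data are invented.

```python
def collect_download_urls(state):
    """Map entity ID to content_url_orig for every entity that has one."""
    urls = {}
    for entity_id, entity in state.get("entities", {}).items():
        # Skip entries that are not dicts or have no download URL
        if isinstance(entity, dict) and "content_url_orig" in entity:
            urls[entity_id] = entity["content_url_orig"]
    return urls

# Made-up sample shaped like the parsed JSON
sample_state = {
    "entities": {
        "20000kCCF0": {"content_url_orig": "https://cdn.example.com/a.mp4"},
        "20000other": {"content_url_orig": "https://cdn.example.com/b.mp4"},
        "profile": {"name": "creator"},
    }
}

found = collect_download_urls(sample_state)
for entity_id, video_url in found.items():
    print(entity_id, video_url)
```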

5 Comments

You are right, a blank result means it is not finding anything. But why doesn't it find the data when the data is there?
I dug around a bit in the source code of the website, and I believe you have to look for audio and for content-url to get the URL of the actual video to download.
Look at this pastebin; you are interested in the URL on line 248.
Yes, line 248 is what I am looking for. But how do I print this URL with web scraping?
@RakeshKumar I found a working solution and edited my answer. I would appreciate it if you could accept it :)) Ah, and by the way: digging through the source code helped a lot.
