Web scraping with Beautiful Soup in Python

Question

I want to crawl the YouTube homepage and pull out all the video links. This is my code:

from bs4 import BeautifulSoup
import requests

s='https://www.youtube.com/'
html=requests.get(s)
html=html.text

s=BeautifulSoup(html,features="html.parser")

for e in s.find_all('a',{'id':'video-title'}):
    link=e.get('href')
    text=e.string
    print(text)
    print(link)
    print()

Nothing happens when I run the code above. It seems the id is not being found. What am I doing wrong?

teoman · Accepted Answer · 2018-07-30 12:21:02Z

1

This happens because requests does not get the same HTML that your browser has. requests only downloads the initial response and does not run the page's JavaScript.

import requests
from bs4 import BeautifulSoup


s =  requests.get("https://youtube.com").text

soup = BeautifulSoup(s,'lxml')

print(soup)

Save this code's output to a file named test.html and open it in a browser. You will see that it is not the same page the browser shows for the live site; it looks broken.

See these questions below.

HTML in browser doesn't correspond to scraped data in python

Python requests not giving me the same HTML as my browser is

Basically, I recommend using Selenium WebDriver, since it behaves like a real browser.
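
A minimal sketch of the Selenium route, assuming Chrome and a matching driver are installed and that the video-title id from the question is still present in the rendered page. The helper names fetch_rendered_html and extract_video_links are hypothetical, chosen only for this illustration:

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=10):
    """Load url in a real browser and return the DOM after JavaScript has run."""
    # Imported lazily so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is the available browser
    try:
        driver.get(url)
        # Assumption: the id used in the question still marks video titles.
        # Raises a TimeoutException if no such element appears in time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.ID, "video-title"))
        )
        return driver.page_source
    finally:
        driver.quit()


def extract_video_links(html):
    """Return (title, href) pairs for every <a id="video-title"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        (a.get_text(strip=True), a.get("href"))
        for a in soup.find_all("a", id="video-title")
    ]
```

Calling extract_video_links(fetch_rendered_html("https://www.youtube.com/")) would then parse the rendered page with the same Beautiful Soup logic as in the question. Whether it finds anything depends on the page's current markup.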

Steven M · Accepted Answer · 2018-07-30 12:36:23Z

1

Yes, this is a strange scrape, but if you scrape at the 'div id="content"' level, you can get the data you are after. I was able to get the title of each video, but YouTube appears to apply some rate limiting or throttling, so I do not think you will be able to get ALL of the titles and links. At any rate, this is what I got working for the titles:

import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/'
response = requests.get(url)
page = response.text
soup = BeautifulSoup(page, 'html.parser')
links = soup.find_all('div', id='content')

for each in links:
    print(each.text)

1 Comment

Nikhil Rathore Over a year ago

Don't we have to give the id as a dictionary, like this: {'id':'content'}?
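
On the question in this comment: Beautiful Soup's find_all accepts an attribute filter either as a keyword argument or as a dictionary, so both spellings select the same elements. A small sketch on a made-up HTML string (the markup below is illustrative, not taken from YouTube):

```python
from bs4 import BeautifulSoup

# Illustrative markup only.
html = '<div id="content">first</div><div id="other">second</div>'
soup = BeautifulSoup(html, "html.parser")

by_keyword = soup.find_all("div", id="content")          # keyword form
by_dict = soup.find_all("div", {"id": "content"})        # positional dict form
by_attrs = soup.find_all("div", attrs={"id": "content"}) # explicit attrs= form

print(by_keyword == by_dict == by_attrs)
```

The dictionary form is mainly needed for attribute names that cannot be written as Python keyword arguments, such as data-foo.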

utks009 · Accepted Answer · 2018-07-30 13:26:06Z

1

Maybe this could help with scraping all videos from the YouTube home page:

    from urllib.parse import urljoin

    from bs4 import BeautifulSoup
    import requests

    r = 'https://www.youtube.com/'
    html = requests.get(r)

    all_videos = []

    soup = BeautifulSoup(html.text, 'html.parser')
    for i in soup.find_all('a'):
        if i.has_attr('href'):
            text = i.attrs.get('href')
            if text.startswith('/watch?'):
                # urljoin avoids the double slash that r + text would produce
                urls = urljoin(r, text)
                all_videos.append(urls)
    print('Total Videos', len(all_videos))
    print('LIST OF VIDEOS', all_videos)
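
A note on building the absolute URLs: when a base URL ends in / and the href starts with /, plain string concatenation yields a double slash, while urllib.parse.urljoin from the standard library resolves the path correctly. The sketch below uses made-up hrefs and a hypothetical helper name, absolute_video_urls, to show the difference:

```python
from urllib.parse import urljoin

BASE = "https://www.youtube.com/"


def absolute_video_urls(hrefs, base=BASE):
    """Turn relative /watch? hrefs into absolute URLs, skipping everything else."""
    return [urljoin(base, h) for h in hrefs if h.startswith("/watch?")]


# Hypothetical hrefs, standing in for values read from <a> tags.
sample_hrefs = ["/watch?v=abc123", "/feed/trending", "/watch?v=def456"]

print(BASE + sample_hrefs[0])             # concatenation keeps both slashes
print(absolute_video_urls(sample_hrefs))  # urljoin resolves against the host
```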

Andrej Kesely · Accepted Answer · 2018-07-30 13:33:39Z

This code snippet selects all links on the youtube.com homepage that contain /watch? in their href attribute (links to videos):

from bs4 import BeautifulSoup
import requests

soup = BeautifulSoup(requests.get('https://www.youtube.com/').text, 'lxml')

for a in soup.select('a[href*="/watch?"]'):
    print('https://www.youtube.com{}'.format(a['href']))

Prints:

https://www.youtube.com/watch?v=pBhkG2Zwf-c
https://www.youtube.com/watch?v=pBhkG2Zwf-c
https://www.youtube.com/watch?v=gnn7GwqXek4
https://www.youtube.com/watch?v=gnn7GwqXek4
https://www.youtube.com/watch?v=AMKDVfucPfA
https://www.youtube.com/watch?v=AMKDVfucPfA
https://www.youtube.com/watch?v=daQcqPHx9uw
https://www.youtube.com/watch?v=daQcqPHx9uw
https://www.youtube.com/watch?v=V_MXGdSBbAI
https://www.youtube.com/watch?v=V_MXGdSBbAI
https://www.youtube.com/watch?v=KEW9U7s_zks
https://www.youtube.com/watch?v=KEW9U7s_zks
https://www.youtube.com/watch?v=EM7ZR5z3kCo
https://www.youtube.com/watch?v=EM7ZR5z3kCo
https://www.youtube.com/watch?v=6NPHk-Yd4VU
https://www.youtube.com/watch?v=6NPHk-Yd4VU
https://www.youtube.com/watch?v=dHiAls8loz4
https://www.youtube.com/watch?v=dHiAls8loz4
https://www.youtube.com/watch?v=2_mDOWLhkVU
https://www.youtube.com/watch?v=2_mDOWLhkVU

...and so on
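
Each URL in the output above appears twice, presumably because the page links to the same video from more than one element (that cause is an assumption; the output itself does not say). If only distinct links are wanted, an order-preserving de-duplication can be sketched like this, with unique_in_order as a hypothetical helper name:

```python
def unique_in_order(items):
    """Drop repeated items while keeping the first occurrence of each."""
    # dict preserves insertion order, so the keys come back in first-seen order.
    return list(dict.fromkeys(items))


# First four lines of the printed output above.
printed = [
    "https://www.youtube.com/watch?v=pBhkG2Zwf-c",
    "https://www.youtube.com/watch?v=pBhkG2Zwf-c",
    "https://www.youtube.com/watch?v=gnn7GwqXek4",
    "https://www.youtube.com/watch?v=gnn7GwqXek4",
]

for url in unique_in_order(printed):
    print(url)
```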
