I am new to web scraping, and I am trying to scrape all the video links from each page of a specific site and write them to a CSV file. For starters, I am trying to scrape the URLs from this site:
and go through all 19 pages. The problem I'm encountering is that the same 20 video links are written 19 times (once for each of the 19 pages), instead of roughly 19 distinct sets of URLs.
import requests
from bs4 import BeautifulSoup
from csv import writer

def make_soup(url):
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup

def scrape_url():
    for video in soup.find_all('a', class_='img-anchor'):
        link = video['href'].replace('//', '')
        csv_writer.writerow([link])

with open("videoLinks.csv", 'w') as csv_file:
    csv_writer = writer(csv_file)
    header = ['URLS']
    csv_writer.writerow(header)

    url = 'https://search.bilibili.com/all?keyword=%E3%82%A2%E3%83%8B%E3%82%B2%E3%83%A9%EF%BC%81%E3%83%87%E3%82%A3%E3%83%89%E3%82%A5%E3%83%BC%E3%83%BC%E3%83%B3'
    soup = make_soup(url)

    lastButton = soup.find_all(class_='page-item last')
    lastPage = lastButton[0].text
    lastPage = int(lastPage)
    #print(lastPage)

    page = 1
    pageExtension = ''
    scrape_url()

    while page < lastPage:
        page = page + 1
        if page == 1:
            pageExtension = ''
        else:
            pageExtension = '&page=' + str(page)
        #print(url+pageExtension)
        fullUrl = url + pageExtension
        make_soup(fullUrl)
        scrape_url()
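For context, the symptom suggests that the loop calls `make_soup(fullUrl)` but discards its return value, so `scrape_url()` keeps reading the global `soup` from page 1. A minimal sketch of the kind of fix I have been experimenting with, passing each page's parsed soup in explicitly instead of relying on a global (the two HTML snippets and `av1`/`av2` hrefs are made-up stand-ins for the real search-result markup; the `img-anchor` class and `//`-stripping are from my code above):

```python
from bs4 import BeautifulSoup

def scrape_links(soup):
    # Collect hrefs from anchors with class img-anchor, stripping the '//' prefix
    return [a['href'].replace('//', '')
            for a in soup.find_all('a', class_='img-anchor')]

# Stand-in "pages" (in the real script these would come from requests.get per page)
pages = [
    '<a class="img-anchor" href="//www.bilibili.com/video/av1"></a>',
    '<a class="img-anchor" href="//www.bilibili.com/video/av2"></a>',
]

all_links = []
for html in pages:
    soup = BeautifulSoup(html, 'html.parser')  # re-parse each page
    all_links.extend(scrape_links(soup))       # scrape the page just parsed
# all_links now holds one distinct link per page rather than repeats
```

Equivalently, keeping my original structure, I suspect the last loop should read `soup = make_soup(fullUrl)` so the new page actually replaces the old one before `scrape_url()` runs — but I am not sure this is the only problem.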
Any help is much appreciated. I structured the code this way so that I can later generalize it across the BiliBili site.
A screenshot linked below shows the first link repeating a total of 19 times:
