I have been working on web scraping in Python and I want to create a dataframe from an .ods file linked on a web page. So far I have downloaded the .ods file to disk, using BeautifulSoup to find the link and requests to fetch it, and then read the saved file into a dataframe. The file contains header rows that have to be removed. This approach works, and my code is below.
from pandas_ods_reader import read_ods
import bs4
import requests
import pandas as pd

url = "https://www.gov.uk/government/statistics/transport-use-during-the-coronavirus-covid-19-pandemic"
html = requests.get(url)
soup = bs4.BeautifulSoup(html.text, "html.parser")

# Find the link(s) ending in .ods and save the file to disk
for link in soup.find_all('a', href=True):
    href = link['href']
    if href.endswith('.ods'):
        file_data = requests.get(href).content
        with open('data.ods', "wb") as file:
            file.write(file_data)

# Read the first sheet without headers and trim the rows above and below the table
df = read_ods('data.ods', 1, headers=False)[6:-44]
df = df.reset_index(drop=True)
# Use the first remaining row as the column names, then drop that row
df.columns = df.iloc[0]
df = df.drop(0)
df
Now I want to know whether this can be done directly, without saving the .ods file to disk. If there is a way to create a dataframe directly from the .ods file linked on the web page, that would serve my purpose. Please suggest suitable code if this is achievable.
soup.select_one('.thumbnail')['href']
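One possible way to skip the file on disk is to keep the downloaded bytes in memory and hand them to pandas as a file-like object. The following is only a sketch under some assumptions: it uses `pd.read_excel` with `engine="odf"` (which needs the `odfpy` package installed) instead of `pandas_ods_reader`, and the helper names `find_ods_url` and `read_ods_from_url` are made up for illustration. It has not been run against the live page.

```python
import io
from urllib.parse import urljoin

import bs4
import pandas as pd
import requests


def find_ods_url(html, base_url):
    """Return the absolute URL of the first link ending in .ods, or None."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    for link in soup.find_all("a", href=True):
        if link["href"].endswith(".ods"):
            # urljoin also handles relative links such as "/files/data.ods"
            return urljoin(base_url, link["href"])
    return None


def read_ods_from_url(ods_url, sheet=0):
    """Download an .ods file into memory and parse it without writing to disk."""
    response = requests.get(ods_url, timeout=30)
    response.raise_for_status()
    # BytesIO wraps the downloaded bytes in a file-like object pandas can read.
    # Assumption: odfpy is installed, which engine="odf" requires.
    return pd.read_excel(
        io.BytesIO(response.content),
        sheet_name=sheet,
        engine="odf",
        header=None,
    )
```

With these helpers, `read_ods_from_url(find_ods_url(requests.get(url).text, url))` should return the raw first sheet. The header rows would still have to be trimmed afterwards, for example with the same `[6:-44]` slice and column renaming used in the code above.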