I am trying to convert multiple HTML tables into pandas DataFrames. For this task I've defined a function that should return all of these HTML tables as DataFrames; however, the function returns an empty list [] instead.
Here's what I've tried so far:
Getting all the needed links as a list
import requests
from bs4 import BeautifulSoup
import lxml
import html5lib
import pandas as pd
import string
### defining a list for all the needed links ###
first_url='https://www.salario.com.br/tabela-salarial/?cargos='
second_url='#listaSalarial'
allTheLetters = string.ascii_uppercase
links = []
for letter in allTheLetters:
    links.append(first_url + letter + second_url)
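
For reference, the first generated link should look like this:

links[0]
# 'https://www.salario.com.br/tabela-salarial/?cargos=A#listaSalarial'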
Defining a function
### defining function to parse html objects ###
def getUrlTables(links):
    for link in links:
        # requesting link, parsing and finding tag:table #
        page = requests.get(link)
        soup = BeautifulSoup(page.content, 'html.parser')
        tab_div = soup.find_all('table', {'class': 'listas'})
        # writing html file into directory; the with block closes it automatically #
        with open('listas_salariales.html', 'w') as file:
            file.write(str(tab_div))
        # reading html file as a pandas dataframe #
        tables = pd.read_html('listas_salariales.html')
    return tables
Testing output
getUrlTables(links)
[]
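
To narrow down where it goes wrong, one sanity check I'm considering is running the scraping step on a single page, to see whether the class selector matches anything at all (if it doesn't, the tables may be rendered by JavaScript or the request may need extra headers; both are guesses on my part):

import requests
from bs4 import BeautifulSoup

page = requests.get('https://www.salario.com.br/tabela-salarial/?cargos=A#listaSalarial')
soup = BeautifulSoup(page.content, 'html.parser')
# if this prints 0, find_all never matches and str(tab_div) is just '[]'
print(len(soup.find_all('table', {'class': 'listas'})))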
Am I missing something in getUrlTables()?
Is there an easier way to accomplish this task?
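
On the second question: I've been wondering whether pd.read_html can skip the requests/BeautifulSoup and temporary-file round-trip entirely, since my understanding is that it accepts a URL directly and can filter tables via its attrs parameter. A minimal sketch of what I have in mind (untested, and it assumes the site serves these tables as static HTML and accepts pandas' default user agent):

import pandas as pd

def getUrlTables(links):
    tables = []
    for link in links:
        # read_html fetches the page itself and returns a list of
        # DataFrames, one per <table class="listas"> it finds
        tables.extend(pd.read_html(link, attrs={'class': 'listas'}))
    return tables

Would something along these lines be the more idiomatic approach?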