First, to be clear: my goal is to scrape data from ~100 URLs monthly using the code below. The data from each URL should be exported to the same XLSX file, but to a different sheet with a predetermined name. Example from the code below: workbook name = "data.xlsx", sheet name = "FEUR". Also, all of the links have exactly the same layout and XPaths, so simply inserting a new link works perfectly.
The only working solution I have found so far is to copy-paste the code from the ####### line down, changing the URL in driver.get() and the sheet_name="XX" in df.to_excel() each time.
Instead, I am looking for a more efficient way to add links that makes the code less heavy. Is this possible using Selenium?
See the code below:
from bs4 import BeautifulSoup
import os
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import time
import pandas as pd
from openpyxl import load_workbook
opts = Options()
opts.add_argument("--headless")
chrome_driver = os.getcwd() + "/chromedriver"
driver = webdriver.Chrome(options=opts, executable_path=chrome_driver)
driver.implicitly_wait(10)
############
#FEUR
driver.get("https://www.morningstar.dk/dk/funds/snapshot/snapshot.aspx?id=F00000ZG2F&tab=3")
driver.switch_to.frame(1)
driver.find_element_by_xpath("//button[contains(@class,'show-table')]//span").click()
table = driver.find_elements_by_xpath("//div[contains(@class,'sal-mip-factor-profile__value-table')]/table//tr/th")
header = []
for tab in table:
    header.append(tab.text)
#print(header)
tablebody = driver.find_elements_by_xpath("//div[contains(@class,'sal-mip-factor-profile__value-table')]/table//tbody/tr")
data = []
data.append(header)
for tab in tablebody:
    row = []
    content = tab.find_elements_by_tag_name("td")
    for con in content:
        row.append(con.text)
    data.append(row)
df = pd.DataFrame(data)
path = r'/Users/karlemilthulstrup/Downloads/data.xlsx'
book = load_workbook(path)
writer = pd.ExcelWriter(path, engine='openpyxl')
writer.book = book
df.to_excel(writer, sheet_name="FEUR")
writer.save()
writer.close()
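To illustrate what I am after, here is a rough sketch of the structure I imagine, not a working solution. The names FUNDS, scrape_all, fetch_table, write_sheets and fake_fetch are hypothetical, and the "XX" entry is only a placeholder for the remaining links. I assume fetch_table would wrap the Selenium steps above (from driver.get() down to the row loop) and return the list of rows. Unlike my code above, this sketch rewrites the whole workbook on each run instead of loading the existing one.

```python
from typing import Callable, Dict, List

Table = List[List[str]]

# Sheet name -> URL. Only FEUR is taken from the code above; "XX" is a
# placeholder showing where the other ~100 links would go.
FUNDS: Dict[str, str] = {
    "FEUR": "https://www.morningstar.dk/dk/funds/snapshot/snapshot.aspx?id=F00000ZG2F&tab=3",
    "XX": "https://example.com/placeholder",
}


def scrape_all(funds: Dict[str, str],
               fetch_table: Callable[[str], Table]) -> Dict[str, Table]:
    """Call fetch_table once per URL and key each result by its sheet name."""
    tables: Dict[str, Table] = {}
    for sheet_name, url in funds.items():
        tables[sheet_name] = fetch_table(url)
    return tables


def write_sheets(tables: Dict[str, Table], path: str) -> None:
    """Write every table to its own sheet of a single workbook."""
    # Imported here so the loop above can be tried without pandas installed.
    import pandas as pd

    # Assumption: the workbook at `path` is created or overwritten on each
    # run. Requires openpyxl to be installed.
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        for sheet_name, rows in tables.items():
            pd.DataFrame(rows).to_excel(writer, sheet_name=sheet_name)


def fake_fetch(url: str) -> Table:
    """Stand-in for the Selenium scraping steps, so the loop runs offline."""
    return [["header"], [url]]


demo = scrape_all(FUNDS, fake_fetch)
```

With the real scraping function in place of fake_fetch, adding a link would then mean adding one line to FUNDS, followed by a single call such as write_sheets(scrape_all(FUNDS, fetch_table), path).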