I am having trouble manipulating some strings. I am scraping data from a website and facing two challenges:
1. I am scraping unnecessary data because the website I target uses the same class name for several headings. My goal is to isolate that data and discard it so I keep only the data I am interested in.
2. With the remaining data, I need to split the string so I can store parts of it in specific variables.
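For the first challenge, here is a minimal sketch of the filtering step. It assumes the unwanted headings are exactly the ones that do not contain both " VS. " and " | ". The helper name `keep_match_titles` is hypothetical, and the sample list copies the three headings shown in the raw data further down:

```python
# Sample headings copied from the raw data in this question (not fetched live).
scraped_titles = [
    "Player Results",
    "Tournament Results",
    "Salvatore Caruso VS. Brandon Nakashima | Indian Wells 2020",
]


def keep_match_titles(titles):
    """Hypothetical helper: keep only headings shaped like 'A VS. B | Event Year'.

    Assumption: the wanted heading is the only one containing both separators.
    """
    return [t for t in titles if " VS. " in t and " | " in t]


match_titles = keep_match_titles(scraped_titles)
print(match_titles)
```

Filtering on the content of the text, rather than on the class name, sidesteps the shared `section-title` class entirely.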
Initially I planned to use a simple split() call, store the resulting strings in a list, and then keep the parts I want. Unfortunately, every time I do this, I end up with three separate lists that I cannot manipulate or split further.
Here is the code:
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome('\\Users\\rapha\\Desktop\\10Milz\\4. Python\\Python final\\Scrape\\chromedriver.exe')
driver.get("https://www.atptour.com/en/scores/2020/7851/MS011/match-stats")
content = driver.page_source
soup = BeautifulSoup(content, "html.parser")

# Every <h3 class="section-title"> heading on the page is visited in turn
for infos in soup.find_all('h3', class_='section-title'):
    title = infos.get_text()
    title = ' '.join(title.split())  # collapse newlines and repeated spaces
    title_list = title.split(" | ")  # a new list is built on each iteration
    print(title_list)
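In the loop above, `title_list` is rebuilt on every pass, so each `print` shows a separate list, one per heading. Here is a minimal sketch of gathering every piece into one list instead. It works on plain strings that stand in for `infos.get_text()`, so it runs without Selenium. The function name `collect_parts` is hypothetical:

```python
def collect_parts(titles):
    """Hypothetical helper: normalise whitespace in each heading, split it on
    ' | ', and gather every piece into ONE list (created once, before the loop)."""
    all_parts = []
    for title in titles:
        title = " ".join(title.split())  # same whitespace cleanup as the original loop
        all_parts.extend(title.split(" | "))  # extend, not reassign
    return all_parts


parts = collect_parts([
    "Player Results",
    "Tournament Results",
    "Salvatore Caruso VS. Brandon Nakashima | Indian Wells 2020",
])
print(parts)
```

In the scraper itself, the same idea means creating the list before the `for` loop and calling `extend` (or `append`) inside it.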
Here is the raw data retrieved:
Player Results
Tournament Results
Salvatore Caruso VS. Brandon Nakashima | Indian Wells 2020
And here is what I would like to achieve:
Variable_1 = Salvatore Caruso
Variable_2 = Brandon Nakashima
Variable_3 = Indian Wells
Variable_4 = 2020
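For the second challenge, one possible sketch uses only `str.split` and `str.rsplit`. It assumes the heading contains exactly one " VS. " and one " | " separator, and that the year is the last space-separated word of the tournament part. The function name `parse_match_title` is hypothetical, and lowercase variable names are used in place of `Variable_1` … `Variable_4`:

```python
def parse_match_title(title):
    """Hypothetical helper: split 'Player A VS. Player B | Tournament Year'.

    Assumptions: exactly one ' VS. ' and one ' | ' separator, and the year
    is the last space-separated token after ' | '.
    """
    players, event = title.split(" | ", 1)
    player_1, player_2 = players.split(" VS. ", 1)
    tournament, year = event.rsplit(" ", 1)  # split only on the LAST space
    return player_1.strip(), player_2.strip(), tournament.strip(), year.strip()


variable_1, variable_2, variable_3, variable_4 = parse_match_title(
    "Salvatore Caruso VS. Brandon Nakashima | Indian Wells 2020"
)
print(variable_1, variable_2, variable_3, variable_4, sep=" / ")
```

Using `rsplit(" ", 1)` keeps multi-word tournament names such as "Indian Wells" intact. The year stays a string; wrap it in `int()` if a number is needed.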
Could you please let me know how to proceed?