Background
Hi all, I'm new to Python and web scraping. I'm on a Mac (Sierra) running Jupyter Notebook in Firefox (87.0). I'm trying to scrape several values from a webpage like this one: https://www.replaypoker.com/tournaments/4337873. One example of a value I'd like to scrape is the Tournament ID.
What I've Tried
I first tried BeautifulSoup, but many of this page's values are not written into the HTML that the server returns. They appear to be template variables (JavaScript?) that are only filled in when the page renders, so the BeautifulSoup code below just printed the variable name as a string instead of its value.
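import requests
from bs4 import BeautifulSoup
url = 'https://www.replaypoker.com/tournaments/4337873'
# Fetch the raw HTML (no JavaScript is executed at this point)
response = requests.get(url)
html_soup = BeautifulSoup(response.content, 'html.parser')
# Find the label, then read the text node that follows it
tournament_ID = html_soup.find('strong', text='Tournament ID:')
print(tournament_ID.next_sibling.strip())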
This returned #{{id}} when I wanted #4337873.
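My guess at what is going on, as a minimal sketch: the server seems to send a template with a mustache-style placeholder, and the browser's JavaScript substitutes the real value later. The HTML fragment, the `render` helper, and the data dictionary below are all hypothetical stand-ins, not taken from the site's actual code.

```python
import re

# Hypothetical fragment of the raw HTML that requests receives
raw_html = '<strong>Tournament ID:</strong> #{{id}}'

def render(template, data):
    """Replace each {{name}} placeholder with data[name],
    roughly what a client-side template engine does in the browser."""
    return re.sub(
        r'\{\{\s*(\w+)\s*\}\}',
        lambda match: str(data[match.group(1)]),
        template,
    )

# Hypothetical data the page's JavaScript would supply
rendered = render(raw_html, {'id': 4337873})

print(raw_html)   # what requests/BeautifulSoup sees
print(rendered)   # what the browser shows after rendering
```

If that assumption holds, requests only ever sees the first string, which would explain why BeautifulSoup returns the placeholder.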
Reading a bit online, I learned that Selenium may address this issue because it drives a real browser (optionally headless) that runs the page's JavaScript, so I decided to switch and use Selenium. The problem is that I don't know how to get the value of the variable once I find the right element.
from selenium import webdriver
import time
running_tournament_url = 'https://www.replaypoker.com/tournaments/4337873'
driver = webdriver.Firefox(executable_path='/Users/maxwilliams/WebDrivers/geckodriver')
driver.get(running_tournament_url)
assert 'MTT' in driver.title
#tournament_id = driver.find_element_by_css_selector('div.col-xs-6:nth-child(1) > div:nth-child(2) > strong:nth-child(1)')
tournament_id = driver.find_element_by_xpath('/html/body/div[2]/section/div/div[1]/div[1]/div/div[1]/div[2]/strong')
print(tournament_id.text)
seats = driver.find_element_by_class_name('tournaments-seats-per-table')
print(seats.text)
time.sleep(3)
driver.quit()
This code prints Tournament ID: but still not the tournament ID itself. I find this especially confusing because the seats lookup above prints Seats Per Table: 9, i.e. both the label and the value.
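One possible explanation, sketched with the standard library rather than Selenium: if the value is a text node that follows the strong element instead of sitting inside it, then the element's own text is only the label, while the parent's text contains both. The markup below is an assumption about the rendered page, not copied from it.

```python
import xml.etree.ElementTree as ET

# Hypothetical rendered markup: the value sits outside the <strong> tag
snippet = '<div><strong>Tournament ID:</strong> #4337873</div>'

div = ET.fromstring(snippet)
strong = div.find('strong')

# Text inside the <strong> element: only the label
label = strong.text

# Text node that follows </strong>: the value itself
value = strong.tail.strip()

# All text under the parent <div>: label and value together
full_text = ''.join(div.itertext())

print(label)
print(value)
print(full_text)
```

If the page is structured this way, the seats lookup would print both parts because the tournaments-seats-per-table class is on the container rather than on the label. In Selenium, the analogous move would presumably be to locate the parent (for example by appending /.. to the XPath) and read its text.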
Questions
- Was my decision to use Selenium necessary and correct? Or could this be better accomplished with another library?
- How can I scrape the tournament ID value (and others like it)?
