
I am trying to scrape the symbols from this page, https://www.barchart.com/stocks/indices/sp/sp400?page=all

When I look at the page source in Firefox (using Ctrl-U), none of the symbols turn up. Thinking Selenium might be able to obtain the dynamically rendered table, I ran the following code.

from bs4 import BeautifulSoup
from selenium import webdriver

sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"

driver = webdriver.Firefox()
driver.get(sp400_url)

html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(soup)

The print command doesn't show any of the symbols we see on the page. Is there a way to scrape the symbols from this page?

Edited to clarify: I am interested in just the symbols and not the prices. So the list should read: AAN, AAXN, ACC, ACHC, ...

  • Which symbols are you after? I'm looking at the page and it's mostly text... The closest I can think you mean is the value on that bar chart - I can get that: left: 84.0351%; corresponding to Open 1,945.28 ... Happy to look again if you clarify what you need :-) Commented Aug 25, 2020 at 14:23
  • Clarified my question above with my expected output. Commented Aug 25, 2020 at 14:27

3 Answers


You can feed the page source into pandas' .read_html() to get the table, then turn the Symbol column into a list. Note: I used chromedriver instead of Firefox.

import pandas as pd
from selenium import webdriver


sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(sp400_url)

html = driver.page_source

# read_html() returns a list of DataFrames; the components table is the last one
df = pd.read_html(html)[-1]

driver.close()

symbolsList = list(df['Symbol'])

Output:

print(symbolsList)
['AAN', 'AAXN', 'ACC', 'ACHC', 'ACIW', 'ACM', 'ADNT', 'ADS', 'AEO', 'AFG', 'AGCO', 'ALE', 'AM', 'AMCX', 'AMED', 'AMG', 'AN', 'ARW', 'ARWR', 'ASB', 'ASGN', 'ASH', 'ATGE', 'ATI', 'ATR', 'AVNS', 'AVNT', 'AVT', 'AYI', 'BC', 'BCO', 'BDC', 'BHF', 'BJ', 'BKH', 'BLD', 'BLKB', 'BOH', 'BRO', 'BRX', 'BXS', 'BYD', 'CABO', 'CACI', 'CAR', 'CASY', 'CATY', 'CBRL', 'CBSH', 'CBT', 'CC', 'CCMP', 'CDAY', 'CDK', 'CFR', 'CFX', 'CGNX', 'CHDN', 'CHE', 'CHH', 'CHX', 'CIEN', 'CIT', 'CLGX', 'CLH', 'CLI', 'CMC', 'CMD', 'CMP', 'CNK', 'CNO', 'CNX', 'COHR', 'COLM', 'CONE', 'COR', 'CPT', 'CR', 'CREE', 'CRI', 'CRL', 'CRS', 'CRUS', 'CSL', 'CTLT', 'CUZ', 'CVLT', 'CW', 'CXW', 'CZR', 'DAN', 'DAR', 'DCI', 'DECK', 'DEI', 'DKS', 'DLPH', 'DLX', 'DNKN', 'DOC', 'DY', 'EBS', 'EGP', 'EHC', 'EME', 'ENPH', 'ENR', 'ENS', 'EPC', 'EPR', 'EQT', 'ESNT', 'ETRN', 'ETSY', 'EV', 'EVR', 'EWBC', 'EXEL', 'EXP', 'FAF', 'FCFS', 'FCN', 'FDS', 'FFIN', 'FHI', 'FHN', 'FICO', 'FIVE', 'FL', 'FLO', 'FLR', 'FNB', 'FR', 'FSLR', 'FULT', 'GATX', 'GBCI', 'GEF', 'GEO', 'GGG', 'GHC', 'GMED', 'GNRC', 'GNTX', 'GNW', 'GO', 'GRUB', 'GT', 'HAE', 'HAIN', 'HCSG', 'HE', 'HELE', 'HIW', 'HNI', 'HOG', 'HOMB', 'HPP', 'HQY', 'HR', 'HRC', 'HUBB', 'HWC', 'HXL', 'IART', 'IBKR', 'IBOC', 'ICUI', 'IDA', 'IDCC', 'IIVI', 'INGR', 'INT', 'ITT', 'JACK', 'JBGS', 'JBL', 'JBLU', 'JCOM', 'JEF', 'JHG', 'JLL', 'JW.A', 'JWN', 'KAR', 'KBH', 'KBR', 'KEX', 'KMPR', 'KMT', 'KNX', 'KRC', 'LAMR', 'LANC', 'LEA', 'LECO', 'LFUS', 'LGND', 'LHCG', 'LII', 'LITE', 'LIVN', 'LOGM', 'LOPE', 'LPX', 'LSI', 'LSTR', 'MAC', 'MAN', 'MANH', 'MASI', 'MAT', 'MCY', 'MD', 'MDU', 'MIDD', 'MKSI', 'MLHR', 'MMS', 'MOH', 'MPW', 'MPWR', 'MRCY', 'MSA', 'MSM', 'MTX', 'MTZ', 'MUR', 'MUSA', 'NATI', 'NAVI', 'NCR', 'NDSN', 'NEU', 'NFG', 'NGVT', 'NJR', 'NKTR', 'NNN', 'NSP', 'NTCT', 'NUS', 'NUVA', 'NVT', 'NWE', 'NYCB', 'NYT', 'OC', 'OFC', 'OGE', 'OGS', 'OHI', 'OI', 'OLED', 'OLLI', 'OLN', 'ORI', 'OSK', 'OZK', 'PACW', 'PB', 'PBF', 'PBH', 'PCH', 'PCTY', 'PDCO', 'PEB', 'PEN', 'PENN', 'PII', 'PK', 'PNFP', 'PNM', 
'POOL', 'POST', 'PPC', 'PRAH', 'PRI', 'PRSP', 'PSB', 'PTC', 'PZZA', 'QDEL', 'QLYS', 'R', 'RAMP', 'RBC', 'RGA', 'RGEN', 'RGLD', 'RH', 'RIG', 'RLI', 'RNR', 'RPM', 'RS', 'RYN', 'SABR', 'SAFM', 'SAIC', 'SAM', 'SBH', 'SBNY', 'SBRA', 'SCI', 'SEDG', 'SEIC', 'SF', 'SFM', 'SGMS', 'SIGI', 'SIX', 'SKX', 'SLAB', 'SLGN', 'SLM', 'SMG', 'SMTC', 'SNV', 'SNX', 'SON', 'SR', 'SRC', 'SRCL', 'STL', 'STLD', 'STOR', 'STRA', 'SVC', 'SWX', 'SXT', 'SYNA', 'SYNH', 'TCBI', 'TCF', 'TCO', 'TDC', 'TDS', 'TECH', 'TER', 'TEX', 'TGNA', 'THC', 'THG', 'THO', 'THS', 'TKR', 'TMHC', 'TOL', 'TPH', 'TPX', 'TR', 'TREE', 'TREX', 'TRIP', 'TRMB', 'TRMK', 'TRN', 'TTC', 'TTEK', 'TXRH', 'UBSI', 'UE', 'UFS', 'UGI', 'UMBF', 'UMPQ', 'UNVR', 'URBN', 'UTHR', 'VAC', 'VC', 'VLY', 'VMI', 'VSAT', 'VSH', 'VVV', 'WAFD', 'WBS', 'WEN', 'WERN', 'WEX', 'WH', 'WOR', 'WPX', 'WRI', 'WSM', 'WSO', 'WTFC', 'WTRG', 'WW', 'WWD', 'WWE', 'WYND', 'X', 'XEC', 'XPO', 'Y', 'YELP', 'Symbol']
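Note that the last entry of the printed list is the literal string 'Symbol': the table's footer row repeats the column header, and read_html() keeps it as data. A quick filter removes it; a minimal sketch, using a tiny stand-in DataFrame in place of the scraped one:

```python
import pandas as pd

# Stand-in for the scraped table: the site's footer row repeats the
# header, so 'Symbol' shows up as a data value in the Symbol column.
df = pd.DataFrame({"Symbol": ["AAN", "AAXN", "ACC", "Symbol"]})

# Drop any row whose value is just the repeated header text.
symbolsList = [s for s in df["Symbol"] if s != "Symbol"]
print(symbolsList)  # the header echo is gone
```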

2 Comments

This works. I changed it to Firefox to see if the issue was due to Firefox. I am curious what I did wrong - why did it not show when I fed it to BeautifulSoup?
Yeah, not sure. It should have been in your soup object.

If the elements are not present in the initial page source, try an explicit wait:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"

driver = webdriver.Firefox()
driver.get(sp400_url)

# Wait up to 10 seconds for the symbol links to appear in the DOM
wait = WebDriverWait(driver, 10)
symbols = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, '//td[contains(@class, "symbol")]//a[starts-with(@href, "/stocks/quotes/")]')))
for symbol in symbols:
    print(symbol.text)
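For what it's worth, once the table has rendered, the original BeautifulSoup approach works on driver.page_source as well. A minimal sketch against a hypothetical static fragment mimicking the table's symbol cells (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment resembling the rendered table's symbol cells.
html = """
<table>
  <tr><td class="symbol"><a href="/stocks/quotes/AAN">AAN</a></td></tr>
  <tr><td class="symbol"><a href="/stocks/quotes/AAXN">AAXN</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
# CSS attribute selector ^= matches hrefs starting with the quote path,
# mirroring the starts-with() XPath used above.
symbols = [a.get_text() for a in soup.select('td.symbol a[href^="/stocks/quotes/"]')]
print(symbols)
```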

2 Comments

Thanks for this attempt but it gives me additional stuff in front: $IDX, See Quote, Full Chart, Full Chart, Full Chart, Full Chart, Full Chart, Full Chart.
@Spinor8, oh, yeah - updated with a more specific locator.

I am not sure why you want to scrape the complete page if you just need the symbols. You can simply get all such elements and put their text in a list.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path=r'..\drivers\geckodriver.exe')
driver.get("https://www.barchart.com/stocks/indices/sp/sp400?page=all")

# Waiting for table to load
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h4[contains(text(),'S&P 400  Components')]")))
symbols = driver.find_elements_by_xpath("//div[@class='bc-table-scrollable-inner']//a[@data-ng-bind='cell']")
symbolList = []
for symbol in symbols:
    symbolList.append(symbol.text)

print(len(symbolList))  # Length of list
print(symbolList)  # Content of list

Output:

(screenshot of the printed symbol list)

