
I am trying to scrape the symbols from this page, https://www.barchart.com/stocks/indices/sp/sp400?page=all

When I look at the page source in Firefox (using Ctrl-U), none of the symbols turn up. Thinking Selenium might be able to obtain the dynamically rendered table, I ran the following code.

from bs4 import BeautifulSoup
from selenium import webdriver

sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"

driver = webdriver.Firefox()
driver.get(sp400_url)

html = driver.page_source
soup = BeautifulSoup(html, "html.parser")
print(soup)

The print command doesn't show any of the symbols we see on the page. Is there a way to scrape the symbols from this page?

Edited to clarify: I am interested in just the symbols and not the prices. So the list should read: AAN, AAXN, ACC, ACHC, ...

  • Which symbols are you after? I'm looking at the page and it's mostly text... The closest I can think you mean is the value on that bar chart - I can get that: left: 84.0351%; corresponding to Open 1,945.28 ... Happy to look again if you clarify what you need :-) Commented Aug 25, 2020 at 14:23
  • Clarified my question above with my expected output. Commented Aug 25, 2020 at 14:27

3 Answers


You can feed the page source into pandas' .read_html() to get the table, then turn the Symbol column into a list. Note: I used chromedriver instead of Firefox.

import pandas as pd
from selenium import webdriver


sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(sp400_url)

html = driver.page_source

# read_html() returns a list of DataFrames; the components table is the last one
df = pd.read_html(html)[-1]

driver.close()

symbolsList = list(df['Symbol'])

Output:

print(symbolsList)
['AAN', 'AAXN', 'ACC', 'ACHC', 'ACIW', 'ACM', 'ADNT', 'ADS', 'AEO', 'AFG', 'AGCO', 'ALE', 'AM', 'AMCX', 'AMED', 'AMG', 'AN', 'ARW', 'ARWR', 'ASB', 'ASGN', 'ASH', 'ATGE', 'ATI', 'ATR', 'AVNS', 'AVNT', 'AVT', 'AYI', 'BC', 'BCO', 'BDC', 'BHF', 'BJ', 'BKH', 'BLD', 'BLKB', 'BOH', 'BRO', 'BRX', 'BXS', 'BYD', 'CABO', 'CACI', 'CAR', 'CASY', 'CATY', 'CBRL', 'CBSH', 'CBT', 'CC', 'CCMP', 'CDAY', 'CDK', 'CFR', 'CFX', 'CGNX', 'CHDN', 'CHE', 'CHH', 'CHX', 'CIEN', 'CIT', 'CLGX', 'CLH', 'CLI', 'CMC', 'CMD', 'CMP', 'CNK', 'CNO', 'CNX', 'COHR', 'COLM', 'CONE', 'COR', 'CPT', 'CR', 'CREE', 'CRI', 'CRL', 'CRS', 'CRUS', 'CSL', 'CTLT', 'CUZ', 'CVLT', 'CW', 'CXW', 'CZR', 'DAN', 'DAR', 'DCI', 'DECK', 'DEI', 'DKS', 'DLPH', 'DLX', 'DNKN', 'DOC', 'DY', 'EBS', 'EGP', 'EHC', 'EME', 'ENPH', 'ENR', 'ENS', 'EPC', 'EPR', 'EQT', 'ESNT', 'ETRN', 'ETSY', 'EV', 'EVR', 'EWBC', 'EXEL', 'EXP', 'FAF', 'FCFS', 'FCN', 'FDS', 'FFIN', 'FHI', 'FHN', 'FICO', 'FIVE', 'FL', 'FLO', 'FLR', 'FNB', 'FR', 'FSLR', 'FULT', 'GATX', 'GBCI', 'GEF', 'GEO', 'GGG', 'GHC', 'GMED', 'GNRC', 'GNTX', 'GNW', 'GO', 'GRUB', 'GT', 'HAE', 'HAIN', 'HCSG', 'HE', 'HELE', 'HIW', 'HNI', 'HOG', 'HOMB', 'HPP', 'HQY', 'HR', 'HRC', 'HUBB', 'HWC', 'HXL', 'IART', 'IBKR', 'IBOC', 'ICUI', 'IDA', 'IDCC', 'IIVI', 'INGR', 'INT', 'ITT', 'JACK', 'JBGS', 'JBL', 'JBLU', 'JCOM', 'JEF', 'JHG', 'JLL', 'JW.A', 'JWN', 'KAR', 'KBH', 'KBR', 'KEX', 'KMPR', 'KMT', 'KNX', 'KRC', 'LAMR', 'LANC', 'LEA', 'LECO', 'LFUS', 'LGND', 'LHCG', 'LII', 'LITE', 'LIVN', 'LOGM', 'LOPE', 'LPX', 'LSI', 'LSTR', 'MAC', 'MAN', 'MANH', 'MASI', 'MAT', 'MCY', 'MD', 'MDU', 'MIDD', 'MKSI', 'MLHR', 'MMS', 'MOH', 'MPW', 'MPWR', 'MRCY', 'MSA', 'MSM', 'MTX', 'MTZ', 'MUR', 'MUSA', 'NATI', 'NAVI', 'NCR', 'NDSN', 'NEU', 'NFG', 'NGVT', 'NJR', 'NKTR', 'NNN', 'NSP', 'NTCT', 'NUS', 'NUVA', 'NVT', 'NWE', 'NYCB', 'NYT', 'OC', 'OFC', 'OGE', 'OGS', 'OHI', 'OI', 'OLED', 'OLLI', 'OLN', 'ORI', 'OSK', 'OZK', 'PACW', 'PB', 'PBF', 'PBH', 'PCH', 'PCTY', 'PDCO', 'PEB', 'PEN', 'PENN', 'PII', 'PK', 'PNFP', 'PNM', 
'POOL', 'POST', 'PPC', 'PRAH', 'PRI', 'PRSP', 'PSB', 'PTC', 'PZZA', 'QDEL', 'QLYS', 'R', 'RAMP', 'RBC', 'RGA', 'RGEN', 'RGLD', 'RH', 'RIG', 'RLI', 'RNR', 'RPM', 'RS', 'RYN', 'SABR', 'SAFM', 'SAIC', 'SAM', 'SBH', 'SBNY', 'SBRA', 'SCI', 'SEDG', 'SEIC', 'SF', 'SFM', 'SGMS', 'SIGI', 'SIX', 'SKX', 'SLAB', 'SLGN', 'SLM', 'SMG', 'SMTC', 'SNV', 'SNX', 'SON', 'SR', 'SRC', 'SRCL', 'STL', 'STLD', 'STOR', 'STRA', 'SVC', 'SWX', 'SXT', 'SYNA', 'SYNH', 'TCBI', 'TCF', 'TCO', 'TDC', 'TDS', 'TECH', 'TER', 'TEX', 'TGNA', 'THC', 'THG', 'THO', 'THS', 'TKR', 'TMHC', 'TOL', 'TPH', 'TPX', 'TR', 'TREE', 'TREX', 'TRIP', 'TRMB', 'TRMK', 'TRN', 'TTC', 'TTEK', 'TXRH', 'UBSI', 'UE', 'UFS', 'UGI', 'UMBF', 'UMPQ', 'UNVR', 'URBN', 'UTHR', 'VAC', 'VC', 'VLY', 'VMI', 'VSAT', 'VSH', 'VVV', 'WAFD', 'WBS', 'WEN', 'WERN', 'WEX', 'WH', 'WOR', 'WPX', 'WRI', 'WSM', 'WSO', 'WTFC', 'WTRG', 'WW', 'WWD', 'WWE', 'WYND', 'X', 'XEC', 'XPO', 'Y', 'YELP', 'Symbol']
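Note that the last entry of the printed list is the literal string 'Symbol': the table's footer row repeats the column header, and read_html() keeps it as data. A quick filter removes it; a minimal sketch, using a tiny stand-in DataFrame in place of the scraped one:

```python
import pandas as pd

# Stand-in for the scraped table: the site's footer row repeats the
# header, so 'Symbol' shows up as a data value in the Symbol column.
df = pd.DataFrame({"Symbol": ["AAN", "AAXN", "ACC", "Symbol"]})

# Drop any row whose value is just the repeated header text.
symbolsList = [s for s in df["Symbol"] if s != "Symbol"]
print(symbolsList)  # the header echo is gone
```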

2 Comments

This works. I changed it to Firefox to see if the issue was due to Firefox. I am curious what I did wrong - why did it not show when I fed it to BeautifulSoup?
Yeah, not sure. It should have been in your soup object.

If the elements are not present in the initial page source, try an explicit wait:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

sp400_url = "https://www.barchart.com/stocks/indices/sp/sp400?page=all"

driver = webdriver.Firefox()
driver.get(sp400_url)

# Wait up to 10 seconds for the symbol links to appear in the DOM
wait = WebDriverWait(driver, 10)
symbols = wait.until(EC.presence_of_all_elements_located(
    (By.XPATH, '//td[contains(@class, "symbol")]//a[starts-with(@href, "/stocks/quotes/")]')))
for symbol in symbols:
    print(symbol.text)
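For what it's worth, once the table has rendered, the original BeautifulSoup approach works on driver.page_source as well. A minimal sketch against a hypothetical static fragment mimicking the table's symbol cells (the real page's markup may differ):

```python
from bs4 import BeautifulSoup

# Hypothetical fragment resembling the rendered table's symbol cells.
html = """
<table>
  <tr><td class="symbol"><a href="/stocks/quotes/AAN">AAN</a></td></tr>
  <tr><td class="symbol"><a href="/stocks/quotes/AAXN">AAXN</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
# CSS attribute selector ^= matches hrefs starting with the quote path,
# mirroring the starts-with() XPath used above.
symbols = [a.get_text() for a in soup.select('td.symbol a[href^="/stocks/quotes/"]')]
print(symbols)
```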

2 Comments

Thanks for this attempt but it gives me additional stuff in front: $IDX, See Quote, Full Chart, Full Chart, Full Chart, Full Chart, Full Chart, Full Chart.
@Spinor8, oh, yeah - updated with a more specific locator.

I am not sure why you want to scrape the complete page if you just need the symbols. You can simply get all such elements and put their text in a list.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox(executable_path=r'..\drivers\geckodriver.exe')
driver.get("https://www.barchart.com/stocks/indices/sp/sp400?page=all")

# Waiting for table to load
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//h4[contains(text(),'S&P 400  Components')]")))
symbols = driver.find_elements_by_xpath("//div[@class='bc-table-scrollable-inner']//a[@data-ng-bind='cell']")
symbolList = []
for symbol in symbols:
    symbolList.append(symbol.text)

print(len(symbolList))  # Length of list
print(symbolList)  # Content of list

Output:

(screenshot of the printed symbol list)

