I'm trying to get a list of stock symbols using the pandas read_html function (instead of scraping the page with Beautiful Soup).
The website I'm referencing is:
https://en.wikipedia.org/wiki/List_of_S%26P_500_companies
The desired output is:
['MMM', 'ABT', 'ABBV', 'ACN', 'ATVI' ... ]
My code is:
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
df = pd.read_html(url)[0]
#df.columns = df.iloc[0]
df.drop(df.index[0], inplace=True)
tickers = df['Symbol'].tolist()
After running this code, the DataFrame looks as follows:
df.head()
   Symbol  Security             SEC filings  GICS Sector             GICS Sub Industry               Headquarters Location     Date first added  CIK      Founded
1  ABT     Abbott Laboratories  reports      Health Care             Health Care Equipment           North Chicago, Illinois   1964-03-31        1800     1888
2  ABBV    AbbVie Inc.          reports      Health Care             Pharmaceuticals                 North Chicago, Illinois   2012-12-31        1551152  2013 (1888)
3  ABMD    ABIOMED Inc          reports      Health Care             Health Care Equipment           Danvers, Massachusetts    2018-05-31        815094   1981
4  ACN     Accenture plc        reports      Information Technology  IT Consulting & Other Services  Dublin, Ireland           2011-07-06        1467373  1989
5  ATVI    Activision Blizzard  reports      Communication Services  Interactive Home Entertainment  Santa Monica, California  2015-08-31        718877   2008
If I uncomment df.columns = df.iloc[0], pandas raises the following error:
KeyError: 'Symbol'
The line df.iloc[0] returns:
Symbol                   ABT
Security                 Abbott Laboratories
SEC filings              reports
GICS Sector              Health Care
GICS Sub Industry        Health Care Equipment
Headquarters Location    North Chicago, Illinois
Date first added         1964-03-31
CIK                      1800
Founded                  1888
This is not what I'm looking for. I expected the header row above it, the one that contains the 'Symbol' label.
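To take Wikipedia out of the picture, here is a small offline sketch of the same steps on a hand-built DataFrame. It assumes (and this is only my guess) that read_html has already used the header row as the column labels. The two-column frame and the '3M Company' name for MMM are stand-ins I filled in for illustration, not what the page actually returns.

```python
import pandas as pd

# Hand-built stand-in for pd.read_html(url)[0].
# Assumption: the header row is already in use as the column labels.
# 'MMM' / '3M Company' is an illustrative first row, not scraped data.
df = pd.DataFrame(
    {
        "Symbol": ["MMM", "ABT", "ABBV"],
        "Security": ["3M Company", "Abbott Laboratories", "AbbVie Inc."],
    }
)

# The column labels are already the header, so iloc[0] is the first data row.
print(list(df.columns))
print(df.iloc[0].tolist())

# What the commented-out line does: relabel the columns with the first data row.
relabelled = df.copy()
relabelled.columns = relabelled.iloc[0]
try:
    relabelled["Symbol"]
    lookup_failed = False
except KeyError:
    # 'Symbol' is no longer a column label after the relabelling.
    lookup_failed = True
print(lookup_failed)

# What the drop does: it removes the first data row, not a header row.
tickers_after_drop = df.drop(df.index[0])["Symbol"].tolist()
tickers_without_drop = df["Symbol"].tolist()
print(tickers_after_drop)
print(tickers_without_drop)
```

On this stand-in the relabelling step reproduces the KeyError, and the drop removes MMM from the list, which looks like what I see with the real table.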
Does anyone see what I'm doing incorrectly here? Thanks!