I've used Selenium to scrape a dynamic JavaScript table of Federal employee position and salary info from http://www.fedsdatacenter.com/federal-pay-rates/index.php?n=&l=&a=SECURITIES+AND+EXCHANGE+COMMISSION&o=&y=all. (Note: it's all public domain data, so no worries re: personal information.)
I'm trying to get it into a Pandas DF for analysis. My problem is that my Selenium input data is a list that prints as:
[u'DOE,JON'], [u'14'], [u'SK'], [u'$176,571.00'], [u'$2,000.00'], [u'SECURITIES AND EXCHANGE COMMISSION'], [u'WASHINGTON'], [u'GENERAL ATTORNEY'], [u'2012']], ...
What I want to get to is a DF that handles an arbitrary number of records as:
NAME      GRADE  SCALE  SALARY       BONUS      AGENCY  LOCATION    POSITION  YEAR
Doe, Jon  14     SK     $176,571.00  $2,000.00  SEC     DC          ATTY      2012
...
I've tried converting this list to a dictionary, using the zip() function with the column names as a tuple and the data as a list, etc., all to no avail, though it's been a good tour of Python's features. What should the next step be after getting the data, or should I be reading the data in a different way to begin with?
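For what it's worth, here is one approach I'm considering (a sketch only; the column names are my own guesses based on the desired layout above, not anything the site provides): since the scraper appends each `td` as its own one-item list, flatten those lists and regroup the values into rows of nine before handing them to `pd.DataFrame`.

```python
import pandas as pd

# Hypothetical sample mirroring the scraped shape: one single-item list per cell
scrapedData = [[u'DOE,JON'], [u'14'], [u'SK'], [u'$176,571.00'],
               [u'$2,000.00'], [u'SECURITIES AND EXCHANGE COMMISSION'],
               [u'WASHINGTON'], [u'GENERAL ATTORNEY'], [u'2012']]

cols = ['NAME', 'GRADE', 'SCALE', 'SALARY', 'BONUS',
        'AGENCY', 'LOCATION', 'POSITION', 'YEAR']

# Flatten the one-item lists, then regroup into rows of len(cols) cells each
flat = [cell[0] for cell in scrapedData]
rows = [flat[i:i + len(cols)] for i in range(0, len(flat), len(cols))]

df = pd.DataFrame(rows, columns=cols)
print(df)
```

This only works if the table is perfectly regular (exactly nine cells per record, no header rows mixed in), which is why I'm not sure it's the right next step.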
Currently, the scraper code is:
from selenium import webdriver
path_to_chromedriver = '/Users/xxx/Documents/webdriver/chromedriver' # change path as needed
browser = webdriver.Chrome(executable_path=path_to_chromedriver)
url = 'http://www.fedsdatacenter.com/federal-pay-rates/index.php'
browser.get(url)
inputAgency = browser.find_element_by_id('a')
inputYear = browser.find_element_by_id('y')
# Send data
inputAgency.send_keys('SECURITIES AND EXCHANGE COMMISSION')
inputYear.send_keys('All')
# Submit the search form
browser.find_element_by_css_selector('input[type="submit"]').click()
# Show all results per page (4th option in the table's entries-per-page dropdown)
browser.find_element_by_xpath('//*[@id="example_length"]/label/select/option[4]').click()
SMRtable = browser.find_element_by_id('example')
scrapedData = []
for td in SMRtable.find_elements_by_xpath('.//td'):
    scrapedData.append([td.get_attribute('innerHTML')])
    print td.get_attribute('innerHTML')
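As an alternative to the per-cell loop above, I've also sketched iterating per `tr` instead of per `td`, so each record stays together and the rows drop straight into `pd.DataFrame`. This is untested against the live site; `table_to_df` and the hardcoded column list are my own hypothetical names, it assumes the same old-style Selenium `find_elements_by_*` API my code already uses, and it uses `.text` rather than `innerHTML`.

```python
import pandas as pd

# Column names are my own guesses, not provided by the site
COLS = ['NAME', 'GRADE', 'SCALE', 'SALARY', 'BONUS',
        'AGENCY', 'LOCATION', 'POSITION', 'YEAR']

def table_to_df(row_elements):
    """Build a DataFrame from a list of Selenium <tr> elements.

    Each element is expected to expose find_elements_by_xpath('.//td')
    returning cell elements with a .text attribute (the old Selenium
    API used in the scraper above).
    """
    rows = []
    for tr in row_elements:
        cells = [td.text for td in tr.find_elements_by_xpath('.//td')]
        if cells:  # skip header rows, which contain <th> not <td>
            rows.append(cells)
    return pd.DataFrame(rows, columns=COLS)

# In the scraper it would be called as (untested):
# df = table_to_df(SMRtable.find_elements_by_xpath('.//tr'))
```

That way there's no need to guess where one record ends and the next begins.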