I'm currently using Selenium and BeautifulSoup to try to scrape financial statement data from Google Finance. For example:

http://www.google.com/finance?q=GOOG&fstype=ii

opens the Income Statement for Google. When I get Selenium to click the "Balance Sheet" and "Cash Flow" buttons at the top of the page, the charts and tables on the page change, but the URL doesn't, and when I pull the page source, it is still the original page with the Income Statement table. My code is posted below:

driver = webdriver.Firefox()
driver.get("http://www.google.com/finance?q=" + ticker[0] + "&fstype=ii")

url1 = driver.page_source
soup1 = BeautifulSoup(url1)

element = driver.find_element_by_xpath('//*[@id=":1"]/a/b/b')
element.click()

driver.implicitly_wait(3.0)
url2 = driver.page_source
soup2 = BeautifulSoup(url2)

element = driver.find_element_by_xpath('//*[@id=":2"]/a/b/b')
element.click()

driver.implicitly_wait(3.0)
url3 = driver.page_source
soup3 = BeautifulSoup(url3)

driver.quit()

Any help is appreciated. Thanks.

1 Answer

You don't need the BeautifulSoup HTML parser here. Selenium itself is powerful enough to navigate the page and locate elements by almost any criterion you can imagine.

The table data you need sits inside div elements with different ids, one per tab. Activate each tab, then read the data from the corresponding div.

Here's an example that prints out headers of the tables inside all of the tabs:

from selenium import webdriver

# Newer Selenium releases (4.3 and later) drop the find_element_by_* helpers;
# use find_element(By.ID, ...) and friends there.

def print_header(element):
    # Each tab's div holds its own table with id "fs-table"
    table = element.find_element_by_id('fs-table')
    for header in table.find_elements_by_tag_name('th'):
        print(header.text)


driver = webdriver.Firefox()
driver.get('http://www.google.com/finance?q=GOOG&fstype=ii')

# Income Statement is the tab shown on load
print_header(driver.find_element_by_id('incinterimdiv'))
print("----")

# activate Balance Sheet
element = driver.find_element_by_xpath('//*[@id=":1"]/a/b/b')
element.click()

print_header(driver.find_element_by_id('balinterimdiv'))
print("----")

# activate Cash Flow
element = driver.find_element_by_xpath('//*[@id=":2"]/a/b/b')
element.click()

print_header(driver.find_element_by_id('casinterimdiv'))

driver.quit()

Prints:

In Millions of USD (except for per share items)
3 months ending 2014-03-31
3 months ending 2013-12-31
3 months ending 2013-09-30
3 months ending 2013-06-30
3 months ending 2013-03-31
----
In Millions of USD (except for per share items)
As of 2014-03-31
As of 2013-12-31
As of 2013-09-30
As of 2013-06-30
As of 2013-03-31
----
In Millions of USD (except for per share items)
3 months ending 2014-03-31
12 months ending 2013-12-31
9 months ending 2013-09-30
6 months ending 2013-06-30
3 months ending 2013-03-31
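
One side note on waiting: `implicitly_wait` in the question does not pause the script; it only sets how long Selenium keeps retrying element lookups. If a tab's content ever shows up late, one option is to poll until the div is displayed. Below is a minimal sketch of such a polling helper in plain Python. The name `wait_until` and its parameters are made up for illustration and are not part of Selenium, which ships its own `WebDriverWait` for the same purpose.

```python
import time


def wait_until(predicate, timeout=3.0, poll=0.1):
    """Call predicate() until it returns a truthy value or the timeout elapses.

    Hypothetical helper, not a Selenium API.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Possible usage with the driver from the example above (assumes the
# Balance Sheet tab has just been clicked):
# wait_until(lambda: driver.find_element_by_id('balinterimdiv').is_displayed())
```

The predicate is always evaluated at least once, so a condition that is already true returns immediately without sleeping.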

2 Comments

So would I add another for loop in the print_header function, something like: for col in table.find_elements_by_tag_name('td'): and then save the results in a Python object?
@user2395969 You can find elements inside the table, inside each tr, and so on; it depends on your desired output. The point here is to use Selenium only. Hope that helps.
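
A minimal sketch of what the comments above describe, collecting the cell text of each row into a list of lists. It assumes the table is laid out as tr rows containing td cells and uses the same older `find_elements_by_tag_name` call as the answer; `table_to_rows` is a hypothetical helper name, not something from the answer or from Selenium.

```python
def table_to_rows(table):
    """Return the table body as a list of rows, each a list of cell texts.

    `table` is any element exposing find_elements_by_tag_name(), such as the
    'fs-table' element located in print_header above.
    """
    rows = []
    for tr in table.find_elements_by_tag_name('tr'):
        cells = [td.text for td in tr.find_elements_by_tag_name('td')]
        if cells:  # rows without td cells (e.g. header rows of th) are skipped
            rows.append(cells)
    return rows


# Possible usage with the driver from the answer:
# div = driver.find_element_by_id('balinterimdiv')
# data = table_to_rows(div.find_element_by_id('fs-table'))
```

A list of lists keeps the sketch neutral; whether a dict keyed by the first cell or some other structure fits better depends on the desired output.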
