I am trying to retrieve the comment section on regulations.gov pages. An example is the paragraph "Restrictions on Proprietary Trading... with free market driven valuations." on http://www.regulations.gov/#!documentDetail;D=OCC-2011-0014-0032.
I am using Python with BeautifulSoup and Selenium, and have the following code:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get("http://www.regulations.gov/#!documentDetail;D=OCC-2011-0014-0032")
source = driver.page_source.encode('ascii', 'replace')
soup = BeautifulSoup(source)
print soup
commentHolder = soup.find("div", {"class":"GGAAYMKDDNE"})
print commentHolder
When I execute "print soup" I get an output (albeit a messy one), but when I execute "print commentHolder" I get "None" as the output. I am not quite sure why this is happening and would appreciate any help. Thank you.
Note: I used Selenium webdriver to try to get around the JavaScript rendering. Is this a correct approach?
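To narrow this down, one check I could run is listing the class names that actually appear on div elements in the captured source, to see whether "GGAAYMKDDNE" is there at all when page_source is read. This is only a rough sketch, not part of my code above: the names DivClassCollector and div_classes and the sample HTML string are hypothetical, and it uses only the standard library HTML parser rather than BeautifulSoup.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class DivClassCollector(HTMLParser):
    """Collects every class name found on <div> start tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        for name, value in attrs:
            # A class attribute may hold several space-separated names.
            if name == "class" and value:
                self.classes.update(value.split())


def div_classes(html):
    """Return the set of class names used on div elements in html."""
    collector = DivClassCollector()
    collector.feed(html)
    return collector.classes


# Hypothetical stand-in for the page source, for illustration only.
sample = ('<div class="header main"><div class="GGAAYMKDDNE">text</div></div>'
          '<span class="x"></span>')
found = div_classes(sample)
print("GGAAYMKDDNE" in found)
```

With the real page I would pass driver.page_source instead of sample. If the class name is missing from the result, the div was presumably not in the DOM yet (or has a different class name) when the source was captured, which would explain why find returns None.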