Trouble Parsing Text using BeautifulSoup and Python

Question

I am trying to retrieve the comment section on regulations.gov pages. An example is the paragraph "Restrictions on Proprietary Trading... with free market driven valuations." on http://www.regulations.gov/#!documentDetail;D=OCC-2011-0014-0032.

I am using BeautifulSoup and Python and have the following code:

from bs4 import BeautifulSoup
from selenium import webdriver

driver = webdriver.PhantomJS()
driver.get("http://www.regulations.gov/#!documentDetail;D=OCC-2011-0014-0032")
source = driver.page_source.encode('ascii', 'replace')
soup = BeautifulSoup(source)
print soup
commentHolder = soup.find("div", {"class":"GGAAYMKDDNE"})
print commentHolder

When I execute "print soup" I get an output (albeit a messy one), but when I execute "print commentHolder" I get "None" as the output. I am not quite sure why this is happening and would appreciate any help. Thank you.

Note: I used Selenium WebDriver to try to get around the JavaScript. Is this the right approach?

alecxe · Accepted Answer · 2015-03-07 05:55:31Z

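The comment section is rendered by JavaScript after the initial page load, so page_source read immediately after driver.get() may not contain the element yet, and soup.find() returns None. You need to make PhantomJS explicitly wait for the element to be present before reading page_source. This worked for me: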

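from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get("http://www.regulations.gov/#!documentDetail;D=OCC-2011-0014-0032")

# Block for up to 10 seconds until the comment container is in the DOM
wait = WebDriverWait(driver, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.GGAAYMKDGNE")))

# Only now read the rendered source and hand it to BeautifulSoup
soup = BeautifulSoup(driver.page_source, "html.parser")
commentHolder = soup.find("div", {"class": "GGAAYMKDGNE"})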
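
To illustrate why the wait matters, here is a rough, standard-library-only sketch of what an explicit wait does conceptually: it polls a condition until it returns something truthy or a timeout expires (WebDriverWait polls every 0.5 seconds by default). This is not Selenium's implementation; `FakePage` and `wait_until` are hypothetical names made up for the illustration, and the fake page simply pretends its content appears 0.3 seconds after creation. It is written for Python 3.

```python
import time


class FakePage:
    """Hypothetical stand-in for a page whose content is filled in by a script after a delay."""

    def __init__(self, ready_after, html):
        self._ready_at = time.monotonic() + ready_after
        self._html = html

    @property
    def page_source(self):
        # Until the simulated script has finished, only an empty shell is available.
        if time.monotonic() < self._ready_at:
            return "<html><body></body></html>"
        return self._html


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it is truthy; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll)


page = FakePage(
    ready_after=0.3,
    html='<html><body><div class="comment">text</div></body></html>',
)

# Reading the source right away sees only the empty shell.
found_immediately = 'class="comment"' in page.page_source

# Polling gives the page time to finish rendering before the source is read.
wait_until(lambda: 'class="comment"' in page.page_source, timeout=2.0, poll=0.1)
found_after_wait = 'class="comment"' in page.page_source

print(found_immediately, found_after_wait)
```

With a real driver, the condition would be the element lookup itself, which is what EC.presence_of_element_located provides.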


1 Comment

kpz Over a year ago

Thank you so much, I was about to give up since I do not know JS well (hence me deleting the question at first). I'm very glad and thankful for your help.
