
Bottom line up front: I want to scrape the jobs from this website: https://www.gdit.com/careers/search/?q=bossier%20city, but I keep getting the base page before the JavaScript runs. If you inspect the page, you can see the jobs are listed in h3 tags, but no matter what I do, the jobs don't pull up.

  1. I tried the following Beautiful Soup code:

import requests
from bs4 import BeautifulSoup

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
html_text = requests.get(url).text  # plain GET; no JavaScript is executed
soup = BeautifulSoup(html_text, "html.parser")
print(soup)  # for testing purposes
for job in soup.find_all('h3'):
    print(job)
  2. I tried ScraperAPI, which I thought was supposed to render the JavaScript for you:

import requests

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
params = {'api_key': "MY-API-KEY", 'url': url}
response = requests.get('http://api.scraperapi.com/', params=params)
print(response.text)  # no h3 tags of any kind
  3. I tried requests-html:

from requests_html import HTMLSession

session = HTMLSession()
r = session.get("https://www.gdit.com/careers/search/?q=bossier%20city")
r.html.render()  # render() returns None; the rendered HTML lives on r.html
print(r.html.html)
  4. I tried Selenium first and then parsing the result with Beautiful Soup:

import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common import exceptions

url = "https://www.gdit.com/careers/search/?q=bossier%20city"
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("detach", True)
options.add_experimental_option('useAutomationExtension', False)
try:
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Users\Notebook\Documents\chromedriver.exe')
    driver.get(url)
    page_source = driver.page_source  # captured immediately after get()
    soup = BeautifulSoup(page_source, "html.parser")
    time.sleep(2)
    print(soup)
except exceptions.WebDriverException:
    print("You need to download a new version of the Chromedriver.")

Nothing works. Do I have to mimic a user typing "Bossier City" into the search box first and then retrieve the response? Anyway, any help would be appreciated.

2 Comments

  • Can you please edit in the results of the code? Commented Oct 17, 2021 at 15:10
  • I can't. Stack Overflow limits posts to 30,000 characters and the full DOM is 477K lines of code. It pulls in all the HTML, JavaScript, and CSS for the whole site. Commented Oct 17, 2021 at 15:16

2 Answers


I would suggest switching from BeautifulSoup (a purely Python-based parser that only sees the static HTML you give it) to Selenium (a dynamic loader that drives real web browsers such as Chrome and Firefox).


Selenium is mainly used for automated testing of websites, but it can also be used to scrape advanced, dynamic sites.

It provides many features, from reading DOM values to adding, removing, or editing DOM elements. You can also wait for an element to come into existence, i.e. for it to appear or finish rendering.

driver.page_source only returns the base HTML that has loaded so far, not the content the page's JavaScript injects afterwards. If you just print(driver.page_source) you will see what data is actually available; adding a time.sleep(10) before reading it gives the scripts time to finish.
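Rather than a fixed sleep, Selenium's explicit waits can block until the job headings actually exist. A minimal sketch of that approach, assuming chromedriver is resolvable on your PATH and that the listings really do render as h3 tags (the 30-second timeout is an arbitrary choice):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.gdit.com/careers/search/?q=bossier%20city")

# Block (up to 30 seconds) until at least one h3 is present in the DOM,
# i.e. until the JavaScript has rendered the job listings.
WebDriverWait(driver, 30).until(
    EC.presence_of_all_elements_located((By.TAG_NAME, "h3"))
)

for job in driver.find_elements(By.TAG_NAME, "h3"):
    print(job.text)

driver.quit()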


7 Comments

  • I did try Selenium and it didn't work. Please see #4 in my post.
  • @BrandonJacobson Don't use BeautifulSoup with Selenium; use one or the other, not both.
  • @BrandonJacobson driver.page_source only loads the base HTML and not the dynamically rendered JavaScript content. If you just print(driver.page_source) you will see what data is available.
  • Perhaps driver.page_source isn't fully loaded yet; when using Selenium, wait until the document-ready event has happened.
  • You're right. I put in a 10-second time.sleep(10) and it worked. Thanks!
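The "wait for the document-ready event" idea from the comments can be approximated by polling document.readyState through execute_script. A small sketch with a hypothetical helper (note that readyState reaching "complete" does not guarantee that content injected later by AJAX has arrived, which is why the explicit wait above is more reliable):

import time

def wait_for_document_ready(driver, timeout=10):
    # Hypothetical helper: poll the browser until the document reports
    # it has finished loading, or give up after `timeout` seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if driver.execute_script("return document.readyState") == "complete":
            return True
        time.sleep(0.5)
    return False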

I think your problem is simple: as you said, this page loads its elements dynamically using JS.

Selenium simply waits for the HTML to load; it does not wait for any scripts to finish running.

In order to wait for a specific element, all you have to do is add that functionality to your code (Selenium supports this out of the box). Here's a great post explaining it. That post covers waiting for a specific element to become interactable, which is one step further than what (I'm guessing) you require.
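A hedged sketch of that pattern; the "h3 a" selector is a guess at where the job links live (it is not taken from the question or the linked post), and the 20-second timeout is arbitrary:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://www.gdit.com/careers/search/?q=bossier%20city")

# Wait until the first job link is not merely present but interactable
# (visible and enabled), then read every job heading.
WebDriverWait(driver, 20).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "h3 a"))
)
for heading in driver.find_elements(By.TAG_NAME, "h3"):
    print(heading.text)
driver.quit()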

