Scraping dynamic webpage using Python

Question

I am trying to scrape the following dynamically generated webpage (https://www.governmentjobs.com/careers/capecoral?page=1). I've tried requests, scrapy, and scrapy-splash, but I only get the page source code and none of the job listings.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.governmentjobs.com/careers/capecoral?page=1")
soup = BeautifulSoup(r.content, "html.parser")  # name the parser explicitly
n_jobs = soup.select("#number-found-items")[0].text.strip()
print(n_jobs)

It always returns 0 jobs found.

You may have the most common problem: the page may use JavaScript to add or update elements, but BeautifulSoup/lxml and requests/urllib can't run JavaScript. You may need Selenium to control a real web browser, which can run JavaScript. Alternatively, use DevTools in Firefox/Chrome (Network tab) manually to see whether JavaScript reads data from some URL, and then try that URL with requests. JavaScript usually gets JSON, which is easily converted to a Python dictionary (without BeautifulSoup). You can also check whether the page has a (free) API for programmers.
– furas, Commented Mar 1, 2022 at 13:22
@furas Actually, I had tried this method without success, but I tried it one last time and it worked. Thanks :).
– Huzaifa Farooq, Commented Mar 2, 2022 at 3:13
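
As a rough sketch of the DevTools approach suggested in the comment above: once the Network tab shows the URL that the page's JavaScript calls, that URL can be requested directly and the JSON decoded. The `DATA_URL` value, the `X-Requested-With` header, and the `jobs`/`title` keys below are placeholders and assumptions, not the site's real endpoint or schema. The sketch uses the standard library's urllib to stay dependency-free; requests would serve equally well.

```python
import json
import urllib.request

# Hypothetical placeholder: replace with the request URL seen in the Network tab.
DATA_URL = "https://example.com/api/jobs?page=1"


def fetch_text(url: str) -> str:
    """Download the body of `url`, sending headers similar to a browser XHR call."""
    request = urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            # Assumption: some sites only answer requests that carry this header.
            "X-Requested-With": "XMLHttpRequest",
        },
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


def job_titles(payload: str) -> list[str]:
    """Decode a JSON payload and list its titles (the "jobs"/"title" keys are assumed)."""
    data = json.loads(payload)
    return [job["title"] for job in data.get("jobs", [])]
```

With a real endpoint in place of the placeholder, `job_titles(fetch_text(DATA_URL))` would return the titles. If the endpoint returns an HTML fragment rather than JSON, the text from `fetch_text` can be handed to BeautifulSoup instead.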

Md. Fazlul Hoque · Accepted Answer · 2022-03-01 10:50:47Z

Because the page content is generated dynamically, you can use selenium with bs4 to get the desired data. Here is an example; just run the code.

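import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.governmentjobs.com/careers/capecoral?page=1"

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # give the page's JavaScript time to load the listings

# Parse the HTML as rendered by the browser
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

for title in soup.select('.list-item h3 > a'):
    print(title.text)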

Output:

Assistant City Attorney / City Attorney's Office
Business Applications Analyst II / Information Technology Services #6425
Contract Athletic Official / Athletics / Parks & Recreation #6237
Contract Background Investigation Specialist / Investigations / Police Dept.  #6514
Contract Beverage Cart/Waiter/Waitress / Parks and Recreation / Coral Oaks #6479
Contract Counselor / Youth Center / Parks & Recreation #6317
Contract Counselor/Instructor / Parks & Recreation / Special Populations #6339
Contract Custodial Worker / Lake Kennedy / Parks & Recreation #6525
Contract Custodial Worker /Parks & Recreation / Yacht Club #6312
Contract Golf Course Outside Operations / Parks & Recreation / Coral Oaks  #6535

Ali Zaib · Accepted Answer · 2022-03-01 10:20:11Z

1

You are trying to scrape data from a website that uses JavaScript. For that, you need a tool such as Selenium, which loads the page in a real browser so that it is fully rendered with its data before you read the page contents.


Huzaifa Farooq · Accepted Answer · 2022-03-02 03:15:07Z

-1

In the DevTools Network tab, I just had to copy the request as cURL and then convert it into Python code using https://curlconverter.com/.
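
To illustrate what such a conversion involves, here is a deliberately minimal sketch that pulls the URL and headers out of a copied curl command. It is not how curlconverter.com works internally; the function name `curl_to_request` is hypothetical, only `-H`/`--header` is handled, and the URL is assumed to come before any other positional value, as is typical of a browser's "Copy as cURL" output.

```python
import shlex


def curl_to_request(command: str) -> tuple[str, dict[str, str]]:
    """Extract the URL and headers from a simple `curl` command.

    Only the URL and -H/--header options are handled; every other
    option is ignored.
    """
    # Join shell line continuations before splitting into tokens.
    tokens = shlex.split(command.replace("\\\n", " "))
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")

    url = ""
    headers: dict[str, str] = {}
    index = 1
    while index < len(tokens):
        token = tokens[index]
        if token in ("-H", "--header") and index + 1 < len(tokens):
            # Split "Name: value" at the first colon only.
            name, _, value = tokens[index + 1].partition(":")
            headers[name.strip()] = value.strip()
            index += 2
            continue
        if not token.startswith("-") and not url:
            url = token  # first positional token is taken as the URL
        index += 1
    return url, headers
```

The resulting pair can then be passed on, for example as `requests.get(url, headers=headers)`, which reproduces the request the browser made for the dynamic data.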


1 Comment

gre_gor Over a year ago

How does this help with dynamic content?
