-2

I am trying to scrape the following dynamically generated page: https://www.governmentjobs.com/careers/capecoral?page=1. I've tried requests, scrapy, and scrapy-splash, but I only get the page source and no job listings.

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.governmentjobs.com/careers/capecoral?page=1")
# Name the parser explicitly to avoid bs4's "no parser specified" warning.
soup = BeautifulSoup(r.content, "html.parser")
n_jobs = soup.select("#number-found-items")[0].text.strip()
print(n_jobs)

It always reports 0 jobs found.

2
  • 1
    You may have the most common problem: the page may use JavaScript to add or update elements, but BeautifulSoup/lxml and requests/urllib can't run JS. You may need Selenium to control a real web browser that can run JS. Or use DevTools in Firefox/Chrome (Network tab) to check manually whether the JavaScript reads its data from some URL, and then try that URL with requests. JS usually gets JSON, which is easily converted to a Python dictionary (without BS). You can also check whether the site has a (free) API for programmers. Commented Mar 1, 2022 at 13:22
  • @furas Actually, I had tried this method without success, but I tried it one last time and it worked. Thanks :). Commented Mar 2, 2022 at 3:13
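
A minimal sketch of the JSON route described in the comment above. The payload shape, the `jobs` and `title` field names, and both helper functions are hypothetical; the real data URL and its structure have to be read from the browser's Network tab.

```python
import json

# Hypothetical payload: the real endpoint and its field names must be read
# from the browser's Network tab; nothing here is taken from the actual site.
SAMPLE_BODY = '{"total": 2, "jobs": [{"title": "Job A"}, {"title": "Job B"}]}'


def job_titles(body: str) -> list[str]:
    """Parse a JSON response body and return the job titles it lists."""
    data = json.loads(body)  # JSON object -> Python dict
    return [job["title"] for job in data.get("jobs", [])]


def fetch_body(url: str) -> str:
    """Download the raw response body from a data URL found in DevTools."""
    import requests  # imported lazily so job_titles() works without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


print(job_titles(SAMPLE_BODY))
```

Once the actual data URL is known, `job_titles(fetch_body(url))` would combine the two steps, with the field names adjusted to whatever the site really returns.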

3 Answers

2

Because the page is rendered dynamically, you can use Selenium together with bs4 to get the desired data. Here is an example; just run the code.

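import time
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.governmentjobs.com/careers/capecoral?page=1"

# Selenium 4 expects the driver path wrapped in a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.maximize_window()
driver.get(url)
time.sleep(10)  # crude wait for the JavaScript to render the listings

soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()  # the HTML is captured, so the browser can be closed

for title in soup.select('.list-item h3 > a'):
    print(title.text)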

Output:

Assistant City Attorney / City Attorney's Office
Business Applications Analyst II / Information Technology Services #6425
Contract Athletic Official / Athletics / Parks & Recreation #6237
Contract Background Investigation Specialist / Investigations / Police Dept.  #6514
Contract Beverage Cart/Waiter/Waitress / Parks and Recreation / Coral Oaks #6479
Contract Counselor / Youth Center / Parks & Recreation #6317
Contract Counselor/Instructor / Parks & Recreation / Special Populations #6339
Contract Custodial Worker / Lake Kennedy / Parks & Recreation #6525
Contract Custodial Worker /Parks & Recreation / Yacht Club #6312
Contract Golf Course Outside Operations / Parks & Recreation / Coral Oaks  #6535
    
     

1

You are trying to scrape data from a website that uses JavaScript. For that you need Selenium, which lets you make sure the page is fully rendered with its data before you read the page contents.
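
One way to make sure the page is rendered before reading it is to poll for a condition instead of sleeping for a fixed time. The helper below is a generic sketch of that idea; `wait_until` and `fake_listings` are hypothetical names, and the fake page only stands in for a browser.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(probe: Callable[[], Optional[T]], timeout: float = 10.0,
               interval: float = 0.5) -> T:
    """Call probe() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Stand-in for a page whose listings only appear on the third check.
attempts = {"count": 0}


def fake_listings() -> list:
    attempts["count"] += 1
    return ["Job A"] if attempts["count"] >= 3 else []


print(wait_until(fake_listings, timeout=2.0, interval=0.01))
```

With Selenium, the probe could be something like `lambda: driver.find_elements(By.CSS_SELECTOR, ".list-item h3 > a")`, which returns an empty list until the links exist. Selenium also ships its own `WebDriverWait` class for this pattern.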


-1

In the Network tab, I just had to copy the request as cURL and then convert it into Python code using https://curlconverter.com/.
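
To illustrate what such a conversion produces, here is a rough sketch that turns a simple copied curl command into keyword arguments for requests. It is not how curlconverter itself works; `curl_to_kwargs` is a hypothetical helper that only understands the URL and `-H` headers, and the example command uses a placeholder URL.

```python
import shlex


def curl_to_kwargs(command: str) -> dict:
    """Extract the URL and -H headers from a simple curl command."""
    tokens = shlex.split(command)
    if not tokens or tokens[0] != "curl":
        raise ValueError("expected a command starting with 'curl'")
    kwargs = {"url": None, "headers": {}}
    i = 1
    while i < len(tokens):
        token = tokens[i]
        if token in ("-H", "--header"):
            # "Name: value" -> {"Name": "value"}
            name, _, value = tokens[i + 1].partition(":")
            kwargs["headers"][name.strip()] = value.strip()
            i += 2
        elif token.startswith("-"):
            i += 1  # sketch only: other flags are skipped, assuming no value
        else:
            kwargs["url"] = token
            i += 1
    return kwargs


# Placeholder command; a real one comes from "Copy as cURL" in DevTools.
EXAMPLE = ("curl 'https://example.com/data?page=1' "
           "-H 'X-Requested-With: XMLHttpRequest' "
           "-H 'Accept: application/json' --compressed")

print(curl_to_kwargs(EXAMPLE))
```

The resulting dictionary can be passed on as `requests.get(**kwargs)`, which replays the browser's background request with the same headers. That is the request that carries the dynamic data.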

1 Comment

How does this help with dynamic content?
