I am trying to scrape a web page to list the jobs posted at this URL: https://careers.microsoft.com/us/en/search-results?rk=l-hyderabad
Refer to the image (Web inspect) for details of the web-page inspection.
The following is observed by inspecting the web page:
- Each job listed is in an HTML li with class="jobs-list-item". The li contains a parent div that carries the following attributes: data-ph-at-job-title-text="Software Engineer II", data-ph-at-job-category-text="Engineering", data-ph-at-job-post-date-text="2018-03-19T16:33:00".
- The 1st child div within the parent div, with class="information", contains a link with href="https://careers.microsoft.com/us/en/job/406138/Software-Engineer-II"
- The 3rd child div within the parent div, with class="description au-target", has the short job description
My requirement is to extract the information below for each job:
- Job Title
- Job Category
- Job Post Date
- Job Post Time
- Job URL
- Job Short Description
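Both the post date and the post time appear to come from the single data-ph-at-job-post-date-text value shown above. A minimal sketch of splitting it, assuming the value is always an ISO 8601 timestamp like the one in the example (split_post_timestamp is a hypothetical helper name, not part of any library):

```python
from datetime import datetime


def split_post_timestamp(raw):
    """Split an ISO 8601 timestamp string into (date, time) strings."""
    # Assumption: raw looks like "2018-03-19T16:33:00", as in the inspected page.
    parsed = datetime.fromisoformat(raw)
    return parsed.date().isoformat(), parsed.time().isoformat()


post_date, post_time = split_post_timestamp("2018-03-19T16:33:00")
print(post_date)  # 2018-03-19
print(post_time)  # 16:33:00
```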
I tried the following Python code to scrape the web page, but I am unable to extract the required information.
import requests
from bs4 import BeautifulSoup

def ms_jobs():
    url = 'https://careers.microsoft.com/us/en/search-results?rk=l-hyderabad'
    resp = requests.get(url)
    if resp.status_code == 200:
        print("Successfully opened the web page")
        # Parse the raw HTML returned by the server and dump it for inspection
        soup = BeautifulSoup(resp.text, 'html.parser')
        print(soup)
    else:
        print("Error")

ms_jobs()
Use Selenium to extract the required data from that page, because the job listings are generated dynamically.
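A minimal sketch of that approach: load the page in a real browser with Selenium, wait until the job list has been rendered, then hand the rendered HTML to BeautifulSoup. The selectors come from the structure described in the question. The function names parse_jobs and fetch_rendered_html are hypothetical, and the code assumes Chrome with a matching driver is available and that the data-ph-at-* attributes all sit on one element inside each li.

```python
from bs4 import BeautifulSoup


def parse_jobs(html):
    """Extract job fields from rendered HTML using the attributes from the question."""
    soup = BeautifulSoup(html, "html.parser")
    jobs = []
    for item in soup.find_all("li", class_="jobs-list-item"):
        # Assumption: one element inside the li carries all data-ph-at-* attributes.
        holder = item.find(attrs={"data-ph-at-job-title-text": True})
        if holder is None:
            continue
        link = item.find("a", href=True)
        desc = item.find("div", class_="description")
        posted = holder.get("data-ph-at-job-post-date-text", "")
        # "2018-03-19T16:33:00" -> date part and time part
        date_part, _, time_part = posted.partition("T")
        jobs.append({
            "title": holder.get("data-ph-at-job-title-text"),
            "category": holder.get("data-ph-at-job-category-text"),
            "post_date": date_part,
            "post_time": time_part,
            "url": link["href"] if link else None,
            "description": desc.get_text(strip=True) if desc else None,
        })
    return jobs


def fetch_rendered_html(url, timeout=20):
    """Load the page in a browser so the JavaScript-generated list exists."""
    # Imported here so parse_jobs can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and its driver are installed
    try:
        driver.get(url)
        # Wait until at least one job entry has been added to the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "li.jobs-list-item"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Calling parse_jobs(fetch_rendered_html(url)) with the URL from the question should then give one dictionary per job, provided the page still uses the markup described above. Keeping the parsing separate from the browser step means the selectors can be checked against a saved copy of the page without starting a browser.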