In this SO question the OP is unable to scrape a table from a dynamically loaded website. While monitoring the web traffic in Chrome DevTools, I found an API request that returns a JSON string with the required info.
The following is the answer I wrote to extract the columns of interest from the API response.
Guide to columns of interest:
Course Title = title
Trainer = name (within trainers)
Rating = rating
Vendor = name (within vendors)
IT Path = path_label (within paths)
Skill Level = display (within difficulty)
Course URL = concatenation of base with seoslug
The vendors list is empty for some courses, hence the conditional expression in the assignment to vendors. I am not sure what the usual placeholder value is for missing string values in Python.
I use repeated list comprehensions, each looping over the JSON object data, where data = response.json().
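On the placeholder question, one common convention is to use None and let pandas treat it as a missing value, rather than a sentinel string such as 'N/A'. The following is only a sketch under that assumption: first_value is a hypothetical helper, and the sample records are made up to mimic the shape of the vendors field.

```python
import pandas as pd


def first_value(items, key, default=None):
    """Return items[0][key], or default when the list is empty or the key is absent."""
    return items[0].get(key, default) if items else default


# Made-up records that only mimic the shape of the 'vendors' field.
sample = [
    {"vendors": [{"display": "Vendor A"}]},
    {"vendors": []},
]

vendors = pd.Series(
    [first_value(item["vendors"], "display") for item in sample],
    name="Vendor",
)

# pandas reports None as missing, so no sentinel string is needed.
print(vendors.isna().tolist())  # [False, True]
```

If a visible placeholder is still wanted in the output file, the na_rep argument of to_csv can supply it at write time, which keeps the data itself free of sentinel strings.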
I couldn't think of a way to remove these repeated loops and still have legible code.
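One possible way to avoid the repeated loops is to map each course dictionary to a single row and loop over data once. This is a sketch, not necessarily the best structure: to_row is a hypothetical helper, the field names are the ones used in the script, and the sample records are invented stand-ins for the real API response.

```python
import pandas as pd

BASE = "https://www.cbtnuggets.com/it-training/"


def to_row(item, base=BASE):
    """Map one course dictionary to one output row."""
    vendors = item["vendors"]
    return {
        "Course Title": item["title"],
        "Trainer": item["trainers"][0]["name"],
        "Rating": item["rating"],
        # Same fallback as the original script for an empty vendors list.
        "Vendor": vendors[0]["display"] if vendors else "N/A",
        "IT Path": item["paths"][0]["path_label"],
        "Skill Level": item["difficulty"]["display"],
        "Course URL": base + item["seoslug"],
    }


# Invented records with the same shape as the fields the script reads.
sample = [
    {
        "title": "Example Course A",
        "trainers": [{"name": "Trainer One"}],
        "rating": 4.5,
        "vendors": [{"display": "Vendor A"}],
        "paths": [{"path_label": "Path A"}],
        "difficulty": {"display": "Entry"},
        "seoslug": "example-course-a",
    },
    {
        "title": "Example Course B",
        "trainers": [{"name": "Trainer Two"}],
        "rating": 4.8,
        "vendors": [],
        "paths": [{"path_label": "Path B"}],
        "difficulty": {"display": "Advanced"},
        "seoslug": "example-course-b",
    },
]

# A list of row dictionaries converts directly to a DataFrame.
df = pd.DataFrame([to_row(item) for item in sample])
print(df)
```

With the real response, sample would be replaced by data. The trade-off is that all field names live in one place, at the cost of one small function.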
I generate a dataframe by joining the lists in a dictionary and then converting with pandas.
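To illustrate the dictionary-of-lists step in isolation, here is a small sketch with made-up values; only the column names come from my script. It writes the CSV to an in-memory buffer instead of a file so the result can be inspected without touching the disk.

```python
import io

import pandas as pd

# Made-up values; each key becomes a column, and all lists must have equal length.
columns = {
    "Course Title": ["Example Course A", "Example Course B"],
    "Rating": [4.5, 4.8],
}

df = pd.DataFrame(columns)

# to_csv accepts any text buffer as well as a file path.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

If the lists differ in length, the DataFrame constructor raises a ValueError, which is one reason the per-field lists have to stay aligned.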
I welcome any and all feedback please.
JSON response:
Example JSON dictionary within response. The response has a collection of such dictionaries.
Python 3
import requests
import pandas as pd


def main():
    base = 'https://www.cbtnuggets.com/it-training/'
    response = requests.get('https://api.cbtnuggets.com/site-gateway/v1/all/courses/for/search?archive=false')
    data = response.json()

    titles = [item['title'] for item in data]
    trainers = [item['trainers'][0]['name'] for item in data]
    ratings = [item['rating'] for item in data]
    vendors = [item['vendors'][0]['display'] if len(item['vendors']) != 0 else 'N/A' for item in data]
    paths = [item['paths'][0]['path_label'] for item in data]
    skillLevel = [item['difficulty']['display'] for item in data]
    links = [base + item['seoslug'] for item in data]

    df = pd.DataFrame(
        {'Course Title': titles,
         'Trainer': trainers,
         'Rating': ratings,
         'Vendor': vendors,
         'IT Path': paths,
         'Skill Level': skillLevel,
         'Course URL': links
         })
    # print(df)
    df.to_csv(r'C:\Users\User\Desktop\Data.csv', sep=',', encoding='utf-8', index=False)


if __name__ == "__main__":
    main()