If you want to scrape a site, it helps to first install a web debugger (Firebug for Mozilla Firefox, for example) to watch how the site you want to scrape works.
Next, you need to reproduce the way the site talks to its back end.
As you said, the content you want to scrape is loaded asynchronously (only once the document is ready).
With the debugger running and the page refreshed, you will see the following request in the network tab:
POST https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx
The overall process to reach your goal is:
- 1/ Use the requests Python module
- 2/ Open a requests session on the index page (so that cookies are handled)
- 3/ Scrape all the inputs of the form used for the POST request
- 4/ Build a dict of POST parameters containing every input name and value scraped in the previous step, plus a few fixed parameters
- 5/ POST the request (with the required data)
- 6/ Finally, use the BS4 module (as usual) to parse the returned HTML and scrape your data
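Steps 3 and 4 can be tried offline before touching the real site. The sketch below is only an illustration: SAMPLE_HTML is a made-up, trimmed-down form, and build_post_data and the "some$control" value are hypothetical names, not part of the site. It applies the fixed parameters last, on the assumption that the page may contain hidden inputs with the same names (such as an empty __EVENTTARGET) that would otherwise replace them.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down form used only to try the helper offline.
SAMPLE_HTML = """
<form name="aspnetForm" method="post">
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="text" name="search" />
  <input type="submit" value="Go" />
</form>
"""


def build_post_data(html, form_name, fixed_params):
    """Collect name/value pairs from a form's inputs, then apply fixed_params."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form", {"name": form_name})
    if form is None:
        raise ValueError("form %r not found" % form_name)
    data = {}
    for field in form.find_all("input"):
        name = field.get("name")
        value = field.get("value")
        # Keep only inputs that carry both a name and a value attribute.
        if name is not None and value is not None:
            data[name] = value
    # Applied last so the fixed values win over same-named page fields.
    data.update(fixed_params)
    return data


post_data = build_post_data(
    SAMPLE_HTML, "aspnetForm", {"__EVENTTARGET": "some$control"}
)
print(post_data)
```

If the fixed values end up empty in the request you send, the order in which the two sets of parameters are merged is the first thing to check.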
Here is a working script:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
from bs4 import BeautifulSoup
import requests
base_url="https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx"
#create requests session
s = requests.session()
#get index page
r=s.get(base_url)
#soup page (name the parser explicitly so the result does not depend on what is installed)
bs=BeautifulSoup(r.text, "html.parser")
#extract FORM html
form_soup= bs.find('form',{'name':'aspnetForm'})
#extracting all inputs
input_div = form_soup.findAll("input")
#build the data parameters for POST request
#we add some required <fixed> data parameters for post
data={
'__EVENTARGUMENT':'LISTINGS;0',
'__EVENTTARGET':'ctl00$ContentPlaceHolder$ctl00$ctl00$RadAjaxPanel_GV',
'__EVENTVALIDATION':'/wEWGwKis6fzCQLDnJnSDwLq4+CbDwK9jryHBQLrmcucCgL56enHAwLRrPHhCgKDk6P+CwL1/aWtDQLm0q+gCALRvI2QDAKch7HjBAKWqJHWBAKil5XsDQK58IbPAwLO3dKwCwL6uJOtBgLYnd3qBgKyp7zmBAKQyTBQK9qYAXAoieq54JAuG/rDkC1djKyQMC1qnUtgoC0OjaygUCv4b7sAhfkEODRvsa3noPfz2kMsxhAwlX3Q=='
}
#we add some <dynamic> data parameters
for input_d in input_div:
    try:
        data[ input_d['name'] ] =input_d['value']
    except KeyError:
        pass #skip inputs without a name or value attribute
#post request
r2=s.post(base_url,data=data)
#write the result (binary mode, because the text is encoded to UTF-8 bytes)
with open("post_result.html","wb") as f:
    f.write(r2.text.encode('utf8'))
Now take a look at the content of "post_result.html" and you will find the data!
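For step 6, here is a minimal sketch of pulling rows out of the returned HTML with BS4. The markup in SAMPLE_RESULT, the "listings" table id and the table_rows helper are all assumptions made up for the example. Open "post_result.html" to find the real tag and id that hold the data, then adapt the lookup.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the content of post_result.html.
# The real markup will differ; inspect the saved file first.
SAMPLE_RESULT = """
<table id="listings">
  <tr><th>Section</th><th>Price</th></tr>
  <tr><td>101</td><td>$5,000</td></tr>
  <tr><td>202</td><td>$3,250</td></tr>
</table>
"""


def table_rows(html, table_id):
    """Return the text of every <td> cell, one list per table row."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # header rows hold only <th>, so they yield no cells
            rows.append(cells)
    return rows


rows = table_rows(SAMPLE_RESULT, "listings")
for row in rows:
    print(row)
```

With the real page you would pass r2.text (or the content of the saved file) instead of SAMPLE_RESULT.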
Regards