I'm trying to scrape data from a multi-page table that is returned after filling out a form. The URL of the original form in question is https://ndber.seai.ie/Pass/assessors/search.aspx
From https://kaijento.github.io/2017/05/04/web-scraping-requests-eventtarget-viewstate/ I took the code that extracts the hidden variables from the blank form; these are then sent with the POST request to get the data:
```python
import requests
from bs4 import BeautifulSoup

url = 'https://ndber.seai.ie/PASS/Assessors/Search.aspx'

with requests.Session() as s:
    s.headers['user-agent'] = 'Mozilla/5.0'
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')

    target = 'ctl00$DefaultContent$AssessorSearch$gridAssessors$grid_pager'

    # The selector 'input[name^=ctl00][value]' is unsupported, so inputs
    # without a value are filtered out in the comprehension instead.
    data = {tag['name']: tag['value']
            for tag in soup.select('input[name^=ctl00]') if tag.get('value')}

    # ASP.NET hidden state fields (__VIEWSTATE, __EVENTVALIDATION, ...);
    # some of them may have no value attribute, so default to ''.
    state = {tag['name']: tag.get('value', '')
             for tag in soup.select('input[name^=__]')}
    data.update(state)

    data['__EVENTTARGET'] = ''
    data['__EVENTARGUMENT'] = ''
    print(data)

    r = s.post(url, data=data)
    new_soup = BeautifulSoup(r.content, 'html5lib')
    print(new_soup)
```
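For comparison, the field collection done by the two `soup.select()` comprehensions can be sketched with only the standard library's `html.parser`, which makes that step easy to test offline against saved HTML. This is a minimal sketch, not the code from the linked article: `FormFieldCollector` and `collect_form_fields` are hypothetical names, and it assumes the relevant fields are plain `<input>` elements (select boxes, checkboxes and textareas are ignored).

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect <input> name/value pairs using the same rules as the soup code:
    every input whose name starts with '__' (ASP.NET state, kept even if empty)
    and every input whose name starts with 'ctl00' that has a non-empty value."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return
        # Attributes written without a value are reported as None.
        value = attrs.get("value") or ""
        if name.startswith("__"):
            self.fields[name] = value
        elif name.startswith("ctl00") and value:
            self.fields[name] = value

    # Self-closing tags such as <input ... /> go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def collect_form_fields(html):
    """Return a dict of the form fields found in an HTML string."""
    parser = FormFieldCollector()
    parser.feed(html)
    parser.close()
    return parser.fields
```

The result plays the same role as `data` after `data.update(state)` in the question's code.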
The initial .get() goes fine: I get the HTML for the blank form, and I can extract the parameters into data.
However, the .post() returns an HTML page indicating that an error has occurred, with no useful data.
Note that the results are split over multiple pages, and when you go from page to page the following parameters are given values:
```python
data['__EVENTTARGET'] = 'ctl00$DefaultContent$AssessorSearch$gridAssessors$grid_pager'
data['__EVENTARGUMENT'] = '1$n'  # where n is the number of the page to retrieve
```
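A small helper can build the pager postback from the collected form fields. This is a sketch assuming the `'1$n'` argument format described above; `pager_payload` is a hypothetical name. It copies the fields so the base state is not mutated when building requests for several pages.

```python
PAGER_TARGET = 'ctl00$DefaultContent$AssessorSearch$gridAssessors$grid_pager'


def pager_payload(form_fields, page):
    """Return a copy of form_fields set up as a pager postback for `page`."""
    payload = dict(form_fields)  # copy, so the caller's dict is left untouched
    payload['__EVENTTARGET'] = PAGER_TARGET
    # '$' is literal inside the f-string; only {page} is interpolated.
    payload['__EVENTARGUMENT'] = f'1${page}'
    return payload
```

For example, `pager_payload(data, 3)` yields the same dict as `data` but with `__EVENTARGUMENT` set to `'1$3'`.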
In the code above I'm initially just trying to get the first page of results; once that's working I'll work out the loop to go through all the results and join them.
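Once the first page works, the loop could look like this sketch. It assumes page 1 has already been fetched successfully and that every response carries a fresh `__VIEWSTATE` that must be sent back with the next postback (standard ASP.NET WebForms behaviour). The `post`, `extract_state` and `parse_rows` callables are hypothetical hooks; for instance `post` could wrap `s.post(url, data=d).text`. Stopping on an empty page is also an assumption; reading the page count from the pager would be more robust.

```python
def scrape_all_pages(first_html, post, extract_state, parse_rows, max_pages=100):
    """Walk the results pager one postback at a time and join all rows.

    first_html    -- HTML of results page 1 (after the search POST succeeded)
    post          -- callable(dict) -> str, sends the form and returns HTML
    extract_state -- callable(str) -> new dict of the form's hidden/ctl00 fields
    parse_rows    -- callable(str) -> list of rows from the results grid
    """
    target = 'ctl00$DefaultContent$AssessorSearch$gridAssessors$grid_pager'
    rows = list(parse_rows(first_html))
    html = first_html
    for page in range(2, max_pages + 1):
        # Each response issues new hidden state, so build the postback
        # from the page we just received rather than from the blank form.
        data = extract_state(html)
        data['__EVENTTARGET'] = target
        data['__EVENTARGUMENT'] = f'1${page}'
        html = post(data)
        page_rows = parse_rows(html)
        if not page_rows:
            break  # assumption: an empty grid means we ran past the last page
        rows.extend(page_rows)
    return rows
```

Passing the callables in keeps the paging logic separate from requests and BeautifulSoup, so it can be tried against canned HTML before hitting the live site.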
Does anyone have an idea of how to handle such a case?
Thanks / Colm
The key ctl00_DefaultContent_AssessorSearch_captcha has to be included within the data parameters so that it is sent with the POST request that fetches the required content. It turns out that the value of that key is dynamic, and I highly doubt you can find it in the page source.