
I am trying to submit a decision start date and end date into two input boxes on the Gosport Council website by sending a POST request. Whenever I print the text received after I send the request, it gives me the content of the search form page, not the results page.

import requests

payload = {
    "applicationDecisionStart": "1/8/2018",
    "applicationDecisionEnd": "1/10/2018",
}

with requests.Session() as session:
    r = session.get("https://publicaccess.gosport.gov.uk/online-applications/search.do?action=advanced", timeout=10, data=payload)

    print(r.text)

When I run it, I want it to print the HTML containing the href links, for example <a href="/online-applications/applicationDetails.do?keyVal=PEA12JHO07E00&amp;activeTab=summary">, but my code doesn't output anything like this.


2 Answers


Watching the network traffic, I observe the site expects a POST (not the GET you are doing) with the following fields (ignoring fields that are empty in the POST):

from bs4 import BeautifulSoup as bs
import requests

payload = {
    'caseAddressType': 'Application',
    'date(applicationDecisionStart)': '1/8/2018',
    'date(applicationDecisionEnd)': '1/10/2018',
    'searchType': 'Application',
}

with requests.Session() as s:
    r = s.post('https://publicaccess.gosport.gov.uk/online-applications/advancedSearchResults.do?action=firstPage', data=payload)
    soup = bs(r.content, 'lxml')
    info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
    print(info)
    ## later pages:
    # https://publicaccess.gosport.gov.uk/online-applications/pagedSearchResults.do?action=page&searchCriteria.page=2

Loop over pages:

from bs4 import BeautifulSoup as bs
import requests

payload = {
    'caseAddressType': 'Application',
    'date(applicationDecisionStart)': '1/8/2018',
    'date(applicationDecisionEnd)': '1/10/2018',
    'searchType': 'Application',
}

with requests.Session() as s:
    r = s.post('https://publicaccess.gosport.gov.uk/online-applications/advancedSearchResults.do?action=firstPage', data=payload)
    soup = bs(r.content, 'lxml')
    info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
    print(info)
    # last page number, taken from the pagination links
    pages = int(soup.select('span + a.page')[-1].text)

    for page in range(2, pages + 1):
        r = s.get('https://publicaccess.gosport.gov.uk/online-applications/pagedSearchResults.do?action=page&searchCriteria.page={}'.format(page))
        soup = bs(r.content, 'lxml')
        info = [(item.text.strip(), item['href']) for item in soup.select('#searchresults a')]
        print(info)

3 Comments

Is there a way to use the ID names of the inputs instead, though? It's fine if I can't.
Probably. Have you used dev tools to look at the POST activity when you submit using an id?
Just checked what's being sent... the ID itself doesn't seem to be submitted at all, only the name. This is fine though, as it works :) thank you
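To illustrate the point from the comments above: browsers submit the name attribute of each form field, never its id, which is why the payload keys look like 'date(applicationDecisionStart)'. A minimal offline sketch (the form snippet below is hypothetical, modelled on the search page) shows how you could map ids to the names the server actually receives:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of the search form: each input has both an id
# (used by the page's JavaScript/CSS) and a name (what gets POSTed).
html = '''
<form>
  <input id="applicationDecisionStart" name="date(applicationDecisionStart)">
  <input id="applicationDecisionEnd" name="date(applicationDecisionEnd)">
</form>
'''
soup = BeautifulSoup(html, 'html.parser')

# Map each input's id to the name the server actually receives
id_to_name = {i['id']: i['name'] for i in soup.select('input')}
print(id_to_name)
```

So if you only know the id, you can look up the corresponding name in the form HTML and use that as the payload key.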

The URL and data are incorrect.

Use Chrome to analyse the request:

Press F12 to open Developer Tools and switch to the "Network" tab, then submit your page and analyse the first request initiated by Chrome.

What you need:

  1. Headers - General - Request URL
  2. Headers - Request Headers
  3. Headers - Form Data

You will also need a package to parse the HTML, such as bs4.
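A minimal sketch of that workflow, assuming the URL and form fields captured from DevTools match those in the accepted answer; the parsing step runs on a hypothetical fragment of the results page so the example is self-contained and works offline:

```python
from bs4 import BeautifulSoup

# Pieces copied from DevTools (Network tab, first request after submitting):
url = ('https://publicaccess.gosport.gov.uk/online-applications/'
       'advancedSearchResults.do?action=firstPage')
data = {
    'caseAddressType': 'Application',
    'date(applicationDecisionStart)': '1/8/2018',
    'date(applicationDecisionEnd)': '1/10/2018',
    'searchType': 'Application',
}
# Live request would be: r = requests.post(url, data=data); html = r.text

# Parsing step, shown on a hypothetical snippet of the results page:
html = ('<ul id="searchresults"><li><a href="/online-applications/'
        'applicationDetails.do?keyVal=PEA12JHO07E00&amp;activeTab=summary">'
        'Example application</a></li></ul>')
soup = BeautifulSoup(html, 'html.parser')
links = [a['href'] for a in soup.select('#searchresults a')]
print(links)
```

The same select('#searchresults a') pattern is what the accepted answer uses to pull the application links out of the real response.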

