
I am new to the web-scraping game. I am trying to scrape the following website: http://www.foodemissions.com/foodemissions/Calculator.aspx

Using resources found on the Internet, I put together the following HTTP POST request:

import urllib
from bs4 import BeautifulSoup

headers = {
    'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
}

class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'http://www.foodemissions.com/foodemissions/Calculator.aspx'
# first HTTP request without form data
f = myopener.open(url)
soup_dummy = BeautifulSoup(f,"html5lib")
# parse and retrieve two vital form values
viewstate = soup_dummy.select("#__VIEWSTATE")[0]['value']
viewstategen = soup_dummy.select("#__VIEWSTATEGENERATOR")[0]['value']

# look up the category drop-down (the result is not used below)
soup_dummy.find(id="ctl00_MainContent_category")

# form field names were found by searching the page source for 'input'
formData = (
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEGENERATOR', viewstategen),
    ('ctl00$MainContent$transport', '200'),
    ('ctl00$MainContent$quantity','1'),
    ('ctl00$MainContent$wastepct','100')
)

encodedFields = urllib.urlencode(formData)
# second HTTP request with form data
f = myopener.open(url, encodedFields)
soup = BeautifulSoup(f,"html5lib")
trans_emissions = soup.find("span", id="ctl00_MainContent_transEmissions")
print(trans_emissions.text)

The output of my final print statement doesn't change, even when I change the value of the ctl00$MainContent$transport field. Any pointers on why this is the case?

Thanks!
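The code above copies only __VIEWSTATE and __VIEWSTATEGENERATOR. ASP.NET WebForms pages often carry other hidden fields too (for example __EVENTVALIDATION when event validation is enabled), and posting every hidden field back unchanged is usually the safer default. Below is a minimal sketch of collecting all of them, using the standard-library html.parser instead of BeautifulSoup so it runs without extra packages. `HiddenInputCollector` and `extract_hidden_fields` are hypothetical helper names, and I have not checked which hidden fields this particular calculator page actually emits.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs from every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # a missing or valueless "value" attribute becomes ""
            self.fields[attr["name"]] = attr.get("value") or ""

    # Self-closing <input ... /> tags go to handle_startendtag, whose default
    # implementation calls handle_starttag, so both forms are covered.


def extract_hidden_fields(html):
    """Return a dict of all hidden form fields found in the page source."""
    parser = HiddenInputCollector()
    parser.feed(html)
    parser.close()
    return parser.fields


sample = (
    '<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />'
    '<input type="hidden" name="__VIEWSTATEGENERATOR" value="XYZ" />'
    '<input type="text" name="ctl00$MainContent$transport" value="0" />'
)
print(extract_hidden_fields(sample))
# {'__VIEWSTATE': 'abc', '__VIEWSTATEGENERATOR': 'XYZ'}
```

The resulting dict can be merged with the visible form inputs before URL-encoding, so new hidden fields are picked up without having to name each one.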

  • I don't know much about BeautifulSoup, but are you making a POST or a GET? Commented Sep 24, 2017 at 22:17
  • I am trying to make a POST Commented Sep 24, 2017 at 22:18
  • As an aside, make sure you encode your fields: encodedFields = encodedFields.encode('ascii'). Otherwise (in Python 3) you'll get a TypeError when you try to POST. Commented Apr 11, 2019 at 16:16
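The encoding comment applies to Python 3, where the question's urllib.urlencode and urllib.FancyURLopener live under urllib.parse and urllib.request (FancyURLopener is deprecated there). urlencode() returns a str, but the POST body must be bytes. A minimal Python 3 sketch of building the request; `build_post_request` is a hypothetical helper name, and nothing is sent over the network here.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form_fields):
    """Build a POST request whose body is the URL-encoded form data.

    urlencode() returns str; Request/urlopen expect the body as bytes,
    hence the .encode("ascii") (urlencode output is always ASCII).
    """
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,  # supplying data makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


req = build_post_request(
    "http://www.foodemissions.com/foodemissions/Calculator.aspx",
    [("ctl00$MainContent$transport", "200")],
)
print(req.get_method())  # POST
print(req.data)          # b'ctl00%24MainContent%24transport=200'
# urllib.request.urlopen(req) would actually send it (not called here)
```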

1 Answer


You need to make the ASP.NET app "think" that you clicked the Calculate button, by putting the button's name (ctl00$MainContent$calculate) in the __EVENTTARGET hidden input:

formData = (
    ('__VIEWSTATE', viewstate),
    ('__VIEWSTATEGENERATOR', viewstategen),
    ('ctl00$MainContent$transport', '100'),
    ('ctl00$MainContent$quantity','150'),
    ('ctl00$MainContent$wastepct','200'),
    ('__EVENTTARGET', 'ctl00$MainContent$calculate')
)
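A minimal Python 3 sketch of the same payload, assuming the hidden fields from the first GET have already been collected into a dict. It uses a dict rather than a tuple of pairs so that, if the page already renders an empty __EVENTTARGET hidden input, the value is overwritten instead of being sent twice. It also sends an empty __EVENTARGUMENT, as a browser-side postback would; the answer's version shows that this is not strictly required. `build_calculate_form` is a hypothetical helper name.

```python
import urllib.parse

CALCULATE_BUTTON = "ctl00$MainContent$calculate"  # button name from the answer


def build_calculate_form(hidden_fields, transport, quantity, wastepct):
    """Return form data that mimics clicking the Calculate button.

    hidden_fields: dict of hidden inputs scraped from the first GET
    (__VIEWSTATE, __VIEWSTATEGENERATOR, ...).
    """
    form = dict(hidden_fields)  # copy so the caller's dict is untouched
    form.update({
        "ctl00$MainContent$transport": str(transport),
        "ctl00$MainContent$quantity": str(quantity),
        "ctl00$MainContent$wastepct": str(wastepct),
        # tell ASP.NET which control caused the postback; a click has no argument
        "__EVENTTARGET": CALCULATE_BUTTON,
        "__EVENTARGUMENT": "",
    })
    return form


form = build_calculate_form({"__VIEWSTATE": "abc"}, 100, 150, 200)
body = urllib.parse.urlencode(form)  # str; .encode("ascii") before POSTing
print(body)
```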

5 Comments

Worked like a charm! Thanks @kblok.
Another quick question: how did you know __EVENTTARGET was associated with ctl00$MainContent$calculate? I don't see any connection between them in the page source.
@varun Because that’s part of ASP.NET’s internals. To implement server control events such as calculate.Click, ASP.NET needs some way to know which control performed the postback. It solves that with the hidden __EVENTTARGET field and, when needed, __EVENTARGUMENT; a button click has no arguments, so __EVENTTARGET alone is enough.
@hardkoded Thanks for this answer. I am working on something similar now, but this code does not work; it says: AttributeError: 'NoneType' object has no attribute 'tex
@hardkoded could you take a look at this problem: stackoverflow.com/questions/71165790/…
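On the question of how to find the __EVENTTARGET value: in WebForms, controls such as LinkButtons and auto-postback drop-downs render a client-side call __doPostBack(eventTarget, eventArgument), which copies its two arguments into the hidden __EVENTTARGET and __EVENTARGUMENT inputs and submits the form. Searching the page source for those calls is one way to discover target names. The sketch below assumes the call appears literally in the HTML (possibly with HTML-encoded quotes such as &#39;); I have not checked how this calculator page renders its Calculate control. If no such call exists, the control's `name` attribute is the other place to look: it uses `$` separators where the `id` uses `_` (compare ctl00_MainContent_transEmissions in the question). `find_postback_targets` is a hypothetical helper name.

```python
import html
import re

# Matches __doPostBack('target','argument'); either quote style is accepted.
# The function definition itself (no quoted arguments) does not match.
POSTBACK_CALL = re.compile(
    r"""__doPostBack\(\s*['"]([^'"]*)['"]\s*,\s*['"]([^'"]*)['"]\s*\)"""
)


def find_postback_targets(page_source):
    """Return (event_target, event_argument) pairs found in the page.

    ASP.NET may HTML-encode the quotes inside href attributes, so the
    source is unescaped before matching.
    """
    return POSTBACK_CALL.findall(html.unescape(page_source))


src = (
    '<a id="ctl00_MainContent_calculate" '
    'href="javascript:__doPostBack(&#39;ctl00$MainContent$calculate&#39;,&#39;&#39;)">'
    'Calculate</a>'
)
print(find_postback_targets(src))
# [('ctl00$MainContent$calculate', '')]
```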
