Python Scraping .aspx form

Question

I am new to Python and am trying to scrape data through an .aspx form. When I execute this code, I get an error. I'm using Python 3.4.2.

    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup

    # NOTE: this dict is defined but never passed to the opener below.
    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.indiapost.gov.in',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.indiapost.gov.in/pin/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.request.FancyURLopener):
        # Sent as the User-Agent header by this opener
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'


    myopener = MyOpener()
    url = 'http://legistar.council.nyc.gov/Legislation.aspx'
    # first HTTP request without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f, 'html.parser')

    # read the hidden state fields that must be posted back with the form
    #vstate = soup.select("#__VSTATE")[0]['value']
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formFields = (
        (r'__VSTATE', r''),
        (r'__VIEWSTATE', viewstate),
        (r'__EVENTVALIDATION', eventvalidation),
        (r'ctl00_RadScriptManager1_HiddenField', ''),
        (r'ctl00_tabTop_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
        (r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'),  # file number
        (r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'),  # legislative text
        (r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'),  # attachment
        (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),   # search text
        (r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'),  # years to include
        (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),  # types to include
        (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # search button itself
    )

    encodedFields = urllib.parse.urlencode(formFields)
    # second HTTP request with form data
    f = myopener.open(url, encodedFields)

    try:
        # It would be better to parse the response with BeautifulSoup again
        # (instead of writing out the whole HTML file). Besides, since the
        # results are split across multiple pages, more HTTP requests are needed.
        fout = open('tmp.html', 'wb')
    except OSError:
        print('Could not open output file\n')
    else:
        # only write when the file was opened successfully
        fout.writelines(f.readlines())
        fout.close()

This script returns no results.

How do I make the script search the form and return the results?

Why do you have ;\ at the end of each line? It's not necessary, and could lead to unexpected results.
– MattDMo, Commented Dec 1, 2014 at 15:46
I was using the Python shell. I have since downloaded PyCharm, so I see that it is not necessary.
– DJ Howarth, Commented Dec 1, 2014 at 15:50

Community · Accepted Answer · 2017-05-23 12:27:28Z

0

As Andrei mentioned in the comments, you're going to need to import urllib, but you're probably going to have other problems with your code because you're hardcoding __VIEWSTATE and __EVENTVALIDATION.

Hui Zheng did a good job explaining this, which helped me figure it out, so I'll just link to his answer rather than try to paraphrase it.
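
To make the point about __VIEWSTATE and __EVENTVALIDATION concrete, here is a minimal sketch of the general idea: an ASP.NET WebForms page embeds these as hidden inputs, and the values from the first GET response have to be sent back in the POST together with your own form fields. This is not the code from the linked answer. It uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages, and the names HiddenFieldParser, build_postback and sample_html are made up for illustration. The sample HTML stands in for a real response.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_postback(html, extra_fields):
    """Return a urlencoded POST body: the page's hidden fields plus extra_fields."""
    parser = HiddenFieldParser()
    parser.feed(html)
    payload = dict(parser.fields)   # state fields taken from the page itself
    payload.update(extra_fields)    # the fields you want to submit
    return urlencode(payload).encode("ascii")


# Hypothetical fragment standing in for the HTML of the first GET request.
sample_html = """
<form method="post" action="Legislation.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$ContentPlaceHolder1$txtSearch" />
</form>
"""

body = build_postback(sample_html, {"ctl00$ContentPlaceHolder1$txtSearch": "york"})
print(body)
```

In a real script you would feed build_postback the decoded HTML of the first response and pass the resulting bytes as the data of the second request. Because the state values are read fresh on every run, nothing has to be hardcoded.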

edited May 23, 2017 at 12:27 by Community Bot

answered Dec 1, 2014 at 16:06 by jgysland

1 Comment

DJ Howarth Over a year ago

I got this code from here: stackoverflow.com/questions/1480356/… I have no clue what __VIEWSTATE and __EVENTVALIDATION actually do; I am currently researching it.
