I am new to Python and am trying to scrape search results through an .aspx form. When I execute this code, I get an error. I'm using Python 3.4.2.

    import urllib
    from bs4 import BeautifulSoup
    import urllib.request
    from urllib.request import urlopen

    headers = {
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
        'Origin': 'http://www.indiapost.gov.in',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko)  Chrome/24.0.1312.57 Safari/537.17',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Referer': 'http://www.indiapost.gov.in/pin/',
        'Accept-Encoding': 'gzip,deflate,sdch',
        'Accept-Language': 'en-US,en;q=0.8',
        'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3'
    }

    class MyOpener(urllib.request.FancyURLopener):
        version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'


    myopener = MyOpener()
    url = 'http://legistar.council.nyc.gov/Legislation.aspx'
    # first HTTP request without form data
    f = myopener.open(url)
    soup = BeautifulSoup(f)

    #vstate = soup.select("#__VSTATE")[0]['value']
    viewstate = soup.select("#__VIEWSTATE")[0]['value']
    eventvalidation = soup.select("#__EVENTVALIDATION")[0]['value']

    formFields = (
        (r'__VSTATE', r''),
        (r'__VIEWSTATE', viewstate),
        (r'__EVENTVALIDATION', eventvalidation),
        (r'ctl00_RadScriptManager1_HiddenField', ''),
        (r'ctl00_tabTop_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_menuMain_ClientState', ''),
        (r'ctl00_ContentPlaceHolder1_gridMain_ClientState', ''),
        (r'ctl00$ContentPlaceHolder1$chkOptions$0', 'on'),  # file number
        (r'ctl00$ContentPlaceHolder1$chkOptions$1', 'on'),  # legislative text
        (r'ctl00$ContentPlaceHolder1$chkOptions$2', 'on'),  # attachment
        (r'ctl00$ContentPlaceHolder1$txtSearch', 'york'),   # search text
        (r'ctl00$ContentPlaceHolder1$lstYears', 'All Years'),  # years to include
        (r'ctl00$ContentPlaceHolder1$lstTypeBasic', 'All Types'),  # types to include
        (r'ctl00$ContentPlaceHolder1$btnSearch', 'Search Legislation')  # search button itself
    )

    encodedFields = urllib.parse.urlencode(formFields)
    # second HTTP request with form data
    f = myopener.open(url, encodedFields)

    try:
        # actually we'd better use BeautifulSoup once again to
        # retrieve results (instead of writing out the whole HTML file)
        # Besides, since the result is split into multiple pages,
        # we need to send more HTTP requests
        fout = open('tmp.html', 'wb')
    except:
        print('Could not open output file\n')
    fout.writelines(f.readlines())
    fout.close()

This script returns no results.

How do I make the script search the form and return the results?

  • You need an import urllib line for that. Commented Dec 1, 2014 at 14:56
  • ok, I added it, same issue Commented Dec 1, 2014 at 15:42
  • Why do you have ;\ at the end of each line? It's not necessary, and could lead to unexpected results. Commented Dec 1, 2014 at 15:46
  • I was using the Python shell. I have since downloaded PyCharm, and I see that it is not necessary. Commented Dec 1, 2014 at 15:50

1 Answer

As Andrei mentioned in the comments, you're going to need to import urllib, but you're probably going to have other problems with your code because you're hardcoding __VIEWSTATE and __EVENTVALIDATION.

Hui Zheng did a good job explaining this, which helped me figure it out, so I'll just link to his answer rather than try to paraphrase it.
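
To illustrate the point about not hardcoding these values, here is a minimal sketch of reading the hidden fields out of a page using only the standard library. The class name HiddenFieldParser, the helper extract_hidden_fields, and the sample HTML with its short field values are made up for illustration; the real page will carry much longer values and may have more hidden inputs.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags (<input ... />) are routed here as well.
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of every hidden form field found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


# Hypothetical page fragment; the values are placeholders, not real tokens.
sample_html = """
<form method="post" action="Legislation.aspx">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dDwtMTA4" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="AbC123" />
  <input type="text" name="ctl00$ContentPlaceHolder1$txtSearch" />
</form>
"""

hidden = extract_hidden_fields(sample_html)
print(sorted(hidden))
```

Collecting every hidden input, rather than naming two of them, means the POST still carries whatever extra hidden fields the page happens to include.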


1 Comment

I got this code from here: stackoverflow.com/questions/1480356/… I have no clue what __VIEWSTATE and __EVENTVALIDATION actually do; I am currently researching it.
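
For what it's worth, as far as I understand it, __VIEWSTATE is an encoded snapshot of the page's control state and __EVENTVALIDATION is a token ASP.NET uses to check that a postback came from the page it rendered. Both therefore have to be read from a fresh GET and sent back unchanged in the POST. A rough sketch of building that POST follows. Note that in Python 3 the request body has to be bytes, so the urlencoded string is encoded before it is attached. The helper build_search_request and the placeholder token values are hypothetical; the field names are the ones from the question.

```python
import urllib.parse
import urllib.request


def build_search_request(url, hidden_fields, search_text):
    """Build (but do not send) a POST request that echoes the hidden fields."""
    form = dict(hidden_fields)  # tokens taken from the first GET response
    form["ctl00$ContentPlaceHolder1$txtSearch"] = search_text
    form["ctl00$ContentPlaceHolder1$btnSearch"] = "Search Legislation"
    # urlencode returns str; urllib.request needs bytes for POST data.
    body = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": "Mozilla/5.0",
        },
    )


request = build_search_request(
    "http://legistar.council.nyc.gov/Legislation.aspx",
    {"__VIEWSTATE": "dDwtMTA4", "__EVENTVALIDATION": "AbC123"},  # placeholders
    "york",
)
print(request.get_method())
# Sending it would be: urllib.request.urlopen(request).read()
```

Whether the server accepts the search with only these fields is an assumption; the other ClientState fields from the question may also need to be included.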
