For this project, I am scraping data from an online database and exporting it to a spreadsheet for further analysis. (Previously posted here; thanks for the help over there reworking my code!)
I previously assumed that finding the winning candidate could be simplified by always selecting the first name that appears in the table, since I thought the "winners" always appeared first. However, this is not the case.
Whether or not a candidate was elected is stored in the form of a picture in the first column. How would I scrape this and store it in a spreadsheet?
It's located inside a <td> cell (one that carries a headers attribute) as:
<img src="/WPAPPS/WPR/Content/Images/selected_box.gif" alt="contestant won this nomination contest">
My question is: how would I use BeautifulSoup to parse the HTML table and extract a value from the first column when that value is stored as an image rather than text?
I had an idea for some sort of Boolean flag (image present or not), but I am unsure how to implement it.
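To make the Boolean idea concrete, here is a minimal sketch of what it might look like. It runs on a made-up table, not the real page: the row layout, the "elected/1" headers value, and the helper name candidates_with_flag are all assumptions for illustration. Only the <img> tag and the "name/1" headers value come from the post. It treats a row as a winner when it contains an <img> whose alt text matches the one quoted above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the real page's row layout and the
# "elected/1" headers value are assumptions, not taken from the site.
SAMPLE_HTML = """
<table>
  <tr>
    <td headers="elected/1"></td>
    <td headers="name/1">Candidate A</td>
  </tr>
  <tr>
    <td headers="elected/1"><img src="/WPAPPS/WPR/Content/Images/selected_box.gif" alt="contestant won this nomination contest"></td>
    <td headers="name/1">Candidate B</td>
  </tr>
</table>
"""

WON_ALT = "contestant won this nomination contest"


def candidates_with_flag(html):
    """Return (name, won) pairs, where won is True if the row has the 'won' image."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tr in soup.find_all("tr"):
        name_cell = tr.find("td", headers="name/1")
        if name_cell is None:
            continue  # header row or a row without a candidate name
        # The image is either present in the row (winner) or absent.
        won = tr.find("img", alt=WON_ALT) is not None
        results.append((name_cell.get_text(strip=True), won))
    return results


results = candidates_with_flag(SAMPLE_HTML)
winners = [name for name, won in results if won]
print(results)
print(winners)
```

The True/False value can be written to the CSV as its own column, and the winner's name can be taken from the winners list instead of from the first row. If the real page marks losing rows with a different image rather than no image, matching on the exact alt text should still tell the two apart.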
My code is below:
from bs4 import BeautifulSoup
import requests
import re
import csv

url = "http://www.elections.ca/WPAPPS/WPR/EN/NC?province=-1&distyear=2013&district=-1&party=-1&pageno={}&totalpages=55&totalcount=1368&secondaryaction=prev25"
rows = []

# Walk through all 55 result pages
for i in range(1, 56):
    print(i)
    r = requests.get(url.format(i))
    data = r.text
    cat = BeautifulSoup(data, "html.parser")

    # Collect the link to each nomination contest on this page
    links = []
    for link in cat.find_all('a', href=re.compile('selectedid=')):
        links.append("http://www.elections.ca" + link.get('href'))

    # Visit each contest page and pull out the fields of interest
    for link in links:
        r = requests.get(link)
        data = r.text
        cat = BeautifulSoup(data, "html.parser")
        lspans = cat.find_all('span')
        cs = cat.find_all("table")[0].find_all("td", headers="name/1")
        elected = []
        for c in cs:
            elected.append(c.contents[0].strip())
        rows.append([
            lspans[2].contents[0],
            lspans[3].contents[0],
            lspans[5].contents[0],
            re.sub("[\n\r/]", "", cat.find("legend").contents[2]).strip(),
            re.sub("[\n\r/]", "", cat.find_all('div', class_="group")[2].contents[2]).strip(),
            len(elected),
            cs[0].contents[0].strip()  # first name in the table, NOT necessarily the winner
        ])

# In Python 3, set the encoding on the file instead of calling .encode() on
# each value; encoded bytes would be written to the CSV as "b'...'".
with open('filename.csv', 'w', newline='', encoding='latin-1') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)
Really--any tips would be GREATLY appreciated. Thanks a lot.