
For this project, I am scraping data from a database and attempting to export this data to a spreadsheet for further analysis. (Previously posted here--thanks for the help over there reworking my code!)

I previously thought that finding the winning candidate in the table could be simplified by just always selecting the first name that appears in the table, as I thought the "winners" always appeared first. However, this is not the case.

Whether or not a candidate was elected is stored in the form of a picture in the first column. How would I scrape this and store it in a spreadsheet?

It's located inside the first <td> cell of each row as:

<img src="/WPAPPS/WPR/Content/Images/selected_box.gif" alt="contestant won this nomination contest">

My question is: how would I use BeautifulSoup to parse the HTML table and extract a value from the first column, which is stored in the table as an image rather than text?

I had an idea for some sort of Boolean flag (won or did not win), but I am unsure how to implement it.

My code is below:

from bs4 import BeautifulSoup
import requests
import re
import csv


url = "http://www.elections.ca/WPAPPS/WPR/EN/NC?province=-1&distyear=2013&district=-1&party=-1&pageno={}&totalpages=55&totalcount=1368&secondaryaction=prev25"
rows = []

for i in range(1, 56):
    print(i)
    r  = requests.get(url.format(i))
    data = r.text
    cat = BeautifulSoup(data, "html.parser")
    links = []

    for link in cat.find_all('a', href=re.compile('selectedid=')):
        links.append("http://www.elections.ca" + link.get('href'))  

    for link in links:
        r  = requests.get(link)
        data = r.text
        cat = BeautifulSoup(data, "html.parser")
        lspans = cat.find_all('span')
        cs = cat.find_all("table")[0].find_all("td", headers="name/1")        
        elected = []

        for c in cs:
            elected.append(c.contents[0].strip())

        rows.append([
            lspans[2].contents[0], 
            lspans[3].contents[0], 
            lspans[5].contents[0],
            re.sub("[\n\r/]", "", cat.find("legend").contents[2]).strip(),
            re.sub("[\n\r/]", "", cat.find_all('div', class_="group")[2].contents[2]).strip(),
            len(elected),
            cs[0].contents[0].strip()  # first listed name, not necessarily the winner
            ])

# Write text as UTF-8; encoding values to bytes first would put b'...' literals in the CSV.
with open('filename.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerows(rows)

Really--any tips would be GREATLY appreciated. Thanks a lot.

  • What is your question? Commented Sep 29, 2016 at 15:40
  • @Rafael I clarified the question in the post. I've reproduced it here: How would I use BeautifulSoup to parse the HTML table and extract a value from the first column, which is stored in the table as an image rather than text? Commented Sep 29, 2016 at 15:55
  • We need to see the table. The URL provided in your code produces this error on the page: "ERROR: Search criteria is invalid. Please try selecting a new search criteria." Commented Sep 29, 2016 at 16:03
  • The URL in the code contains a pair of curly brackets as a placeholder for the page number, so that the loop can go through all 55 pages. Here is an example of one of the tables. The first column is the one concerned. Commented Sep 29, 2016 at 17:01

1 Answer

This snippet will print the name of the elected person:

from bs4 import BeautifulSoup
import requests

req = requests.get("http://www.elections.ca/WPAPPS/WPR/EN/NC/Details?province=-1&distyear=2013&district=-1&party=-1&selectedid=8548")
page_source = BeautifulSoup(req.text, "html.parser")
table = page_source.find("table", {"id": "gvContestants/1"})
for row in table.find_all("tr"):
    img = row.find("img")
    # Skip rows without an image (for example the header row)
    if img is None:
        continue
    # The checked-box image marks the contestant who won
    if "selected_box.gif" in img.get("src", ""):
        # Collapse runs of whitespace but keep the spaces between name parts
        print(' '.join(row.find("td", {"headers": "name/1"}).text.split()))
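The same check can also return a true/false flag for every contestant, which is closer to the Boolean idea mentioned in the question. This is only a sketch under assumptions: the function name `contestant_results`, the `status/1` header id, the `box.gif` file name and the inline sample HTML are all made up for illustration. It assumes each contestant row has its name in a `td` with `headers="name/1"` and that the winner's row contains the `selected_box.gif` image, as on the page above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample modelled on the table described above; the real page may differ.
SAMPLE_HTML = """
<table id="gvContestants/1">
  <tr><th id="status/1">Status</th><th id="name/1">Name</th></tr>
  <tr>
    <td headers="status/1"><img src="/WPAPPS/WPR/Content/Images/box.gif" alt=""></td>
    <td headers="name/1"> Jane
        Doe </td>
  </tr>
  <tr>
    <td headers="status/1"><img src="/WPAPPS/WPR/Content/Images/selected_box.gif" alt="contestant won this nomination contest"></td>
    <td headers="name/1">John Smith</td>
  </tr>
</table>
"""


def contestant_results(html):
    """Return a list of (name, won) tuples, one per contestant row."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for row in soup.find_all("tr"):
        name_cell = row.find("td", headers="name/1")
        if name_cell is None:
            # Header rows have no name cell, so skip them
            continue
        img = row.find("img")
        # True only when the row carries the checked-box image
        won = img is not None and "selected_box.gif" in img.get("src", "")
        # Collapse line breaks and repeated spaces inside the name
        name = " ".join(name_cell.get_text().split())
        results.append((name, won))
    return results


print(contestant_results(SAMPLE_HTML))
```

Returning the flag for every row, rather than printing only the winner, also covers contests where no contestant is marked as elected.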

As a side note, please refrain from giving variables meaningless names. It makes the code harder to read for anyone trying to help you, and it will hurt you in the future when you look at the code again.
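To get the result into the spreadsheet, the won/lost flag can be written as its own column. The following is a rough sketch with descriptive names, not the asker's actual export: the `write_results` helper, the column headings and the sample rows are assumptions for illustration, and an in-memory buffer stands in for the real file.

```python
import csv
import io


def write_results(rows, csv_file):
    """Write (contest, name, won) rows to csv_file with a header line."""
    writer = csv.writer(csv_file)
    writer.writerow(["contest", "contestant", "elected"])
    for contest, name, won in rows:
        # Store the flag as 1/0 so it sorts and filters easily in a spreadsheet
        writer.writerow([contest, name, int(won)])


# Made-up rows standing in for the scraped data
sample_rows = [
    ("Example District", "Jane Doe", False),
    ("Example District", "John Smith", True),
]

buffer = io.StringIO()
write_results(sample_rows, buffer)
print(buffer.getvalue())
```

For a real export, pass a file opened with `open('filename.csv', 'w', newline='')` instead of the `StringIO` buffer.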
