Scraping: add data stored as a picture to CSV file in python 3.5

Question

For this project, I am scraping data from a database and attempting to export this data to a spreadsheet for further analysis. (Previously posted here--thanks for the help over there reworking my code!)

I previously thought that finding the winning candidate in the table could be simplified by just always selecting the first name that appears in the table, as I thought the "winners" always appeared first. However, this is not the case.

Whether or not a candidate was elected is stored in the form of a picture in the first column. How would I scrape this and store it in a spreadsheet?

It's located under < td headers > as:

<img src="/WPAPPS/WPR/Content/Images/selected_box.gif" alt="contestant won this nomination contest">

My question is: how would I use BeautifulSoup to parse the HTML table and extract a value from the first column, which is stored in the table as an image rather than text.

I had an idea for attempting some sort of Boolean sorting measure, but I am unsure of how to implement.

My code is below:

from bs4 import BeautifulSoup
import requests
import re
import csv


url = "http://www.elections.ca/WPAPPS/WPR/EN/NC?province=-1&distyear=2013&district=-1&party=-1&pageno={}&totalpages=55&totalcount=1368&secondaryaction=prev25"
rows = []

for i in range(1, 56):
    print(i)
    r  = requests.get(url.format(i))
    data = r.text
    cat = BeautifulSoup(data, "html.parser")
    links = []

    for link in cat.find_all('a', href=re.compile('selectedid=')):
        links.append("http://www.elections.ca" + link.get('href'))  

    for link in links:
        r  = requests.get(link)
        data = r.text
        cat = BeautifulSoup(data, "html.parser")
        lspans = cat.find_all('span')
        cs = cat.find_all("table")[0].find_all("td", headers="name/1")        
        elected = []

        for c in cs:
            elected.append(c.contents[0].strip())

        rows.append([
            lspans[2].contents[0], 
            lspans[3].contents[0], 
            lspans[5].contents[0],
            re.sub("[\n\r/]", "", cat.find("legend").contents[2]).strip(),
            re.sub("[\n\r/]", "",  cat.find_all('div', class_="group")[2].contents[2]).strip().encode('latin-1'),
            len(elected),
            cs[0].contents[0].strip().encode('latin-1')
            ])

with open('filename.csv', 'w', newline='') as f_output:
   csv_output = csv.writer(f_output)
   csv_output.writerows(rows)

Really--any tips would be GREATLY appreciated. Thanks a lot.

@Rafael I clarified the question in the post. I've reproduced it here: How would I use BeautifulSoup to parse the HTML table and extract a value from the first column, which is stored in the table as an image rather than text? — HowenWilson
– HowenWilson, Commented Sep 29, 2016 at 15:55
We need to see the table, the url provided in your code reproduces this error on the page ERROR: Search criteria is invalid. Please try selecting a new search criteria. — Rafael Almeida
– Rafael Almeida, Commented Sep 29, 2016 at 16:03
The url in the code is modified with a pair of curly brackets so that it can loop through all 56 pages. Here is an example of one of the tables. The first column is the one concerned. — HowenWilson
– HowenWilson, Commented Sep 29, 2016 at 17:01

Rafael Almeida · Accepted Answer · 2016-09-29 19:54:28Z

2

This snippet will print the name of the elected person:

from bs4 import BeautifulSoup
import requests
req  = requests.get("http://www.elections.ca/WPAPPS/WPR/EN/NC/Details?province=-1&distyear=2013&district=-1&party=-1&selectedid=8548")
page_source = BeautifulSoup(req.text, "html.parser")
table = page_source.find("table",{"id":"gvContestants/1"})
for row in table.find_all("tr"):
    if not row.find("img"):
        continue
    if "selected_box.gif" in row.find("img").get("src"):
        print(''.join(row.find("td",{"headers":"name/1"}).text.split()))

As a side note please refrain yourself from declaring variables with meaningless names. It hurts the eyes of anyone trying to help you and it will hurt you in the future when looking at the code again

edited Sep 29, 2016 at 19:54

answered Sep 29, 2016 at 19:28

Rafael Almeida

5,2602 gold badges23 silver badges33 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

Scraping: add data stored as a picture to CSV file in python 3.5

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related