I'm having trouble getting a script to work and I suspect there is something simple I'm overlooking. I've pasted a simplified script below that generates the same type of error for a similar data set.

The script below is meant to fetch a csv in which each row represents poll data for a given state. Using a list of states, I'd like to iterate over the csv data to find the latest poll for each state and generate a list of lists that summarizes one of the attributes for that state (the percent voting for the Democratic candidate in this example). I need to allow for the possibility that the "rows" in the csv file may not be in any particular order and that some "states" may not have data in the csv.

This sample script generates the right output for the first state ("Alabama"), but it fails to find data for any of the other states in the states list. Why?

Note 1 - the script does fetch the csv file.

Note 2 - the script works as expected if, instead of fetching the csv file, I provide the poll data as a list of lists.

Thanks for your help.

import csv, httplib2, cStringIO

h = httplib2.Http('.cache')
url = 'http://www.electoral-vote.com/evp2012/Pres/pres_polls.csv'
headers, data = h.request(url)

states = [
            "Alabama", 
            "Alaska", 
            "Arizona", 
            "Arkansas", 
            "California", 
            "Colorado", 
            "Connecticut",
            "Delaware",
            "Florida",
            "Georgia",
            "Hawaii",
            "Idaho",
            "Illinois",
            "Indiana",
            "Iowa",
            "Kansas",
            "Kentucky",
            "Louisiana",
            "Maine",
            "Maryland",
            "Massachusetts",
            "Michigan",
            "Minnesota",
            "Mississippi",
            "Missouri",
            "Montana",
            "Nebraska",
            "Nevada",
            "New Hampshire",
            "New Jersey",
            "New Mexico",
            "New York",
            "North Carolina",
            "North Dakota",
            "Ohio",
            "Oklahoma",
            "Oregon",
            "Pennsylvania",
            "Rhode Island",
            "South Carolina",
            "South Dakota",
            "Tennessee",
            "Texas",
            "Utah",
            "Vermont",
            "Virginia",
            "Washington",
            "West Virginia",
            "Wisconsin",
            "Wyoming"
            ]

csv_input = cStringIO.StringIO(data)
csv_output = csv.reader(csv_input)

# sample row => 
#['Day', 'Len', 'State', 'EV', 'Dem', 'GOP', 'Ind', 'Date', '', '', '', '', '', '', '', 'Pollster']
#['  1.0', '1', 'Wyoming', '3', '33', '65', '', 'Jan 01', '', '', '', '', '', '', '', 'Election 2008-1'] 

percent_dem_by_state = []

for state in states:
    poll_day = 0
    percent_dem_for_this_state = [state, None]
    for row in csv_output:
        if (state == row[2]) and (float(row[0]) > poll_day):
            percent_dem_for_this_state = [state, int(row[4])]
            poll_day = float(row[0])
    percent_dem_by_state.append(percent_dem_for_this_state)

for elem in percent_dem_by_state:
    print elem

1 Answer

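Your inner loop, which reads the CSV file, "uses it up". csv.reader returns an iterator, so the first pass (for "Alabama") consumes every row, and nothing resets csv_output before the next pass of the outer loop. For every later state the inner loop has no rows left to read, which is why those states keep their default value of None.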
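To see the exhaustion in isolation, here is a minimal sketch. It is written for Python 3 (io.StringIO instead of the question's cStringIO), and the SAMPLE rows are made up for illustration; they only mimic the column layout shown in the question.

```python
import csv
import io

# Hypothetical rows in the same column order as the question's CSV
# (Day, Len, State, EV, Dem, GOP). The values are illustrative only.
SAMPLE = (
    "Day,Len,State,EV,Dem,GOP\n"
    "1.0,1,Alabama,9,39,60\n"
    "1.0,1,Alaska,3,38,59\n"
)

reader = csv.reader(io.StringIO(SAMPLE))

# The first loop walks the reader to the end.
first_pass = [row[2] for row in reader]

# The reader is an iterator, so a second loop over it sees nothing.
second_pass = [row[2] for row in reader]

print(first_pass)
print(second_pass)
```

The second list comes back empty, which is the same thing that happens to every state after the first one in the question's nested loop.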

A better strategy anyway is to read through the CSV file once, before doing anything else, loading the data into memory and looping over that. Your current intended logic reads through the entire CSV file once per state and picks out only that state's rows each time. With 50 states, every row gets scanned 50 times, so it is going to be roughly 50 times slower than it has to be.
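As a rough sketch of the smallest change along those lines, the rows can be copied into a list once and the original nested loop pointed at that list. This is Python 3, the SAMPLE rows and the shortened states list are made up for illustration, and the rows are deliberately out of order with no data for "Arizona".

```python
import csv
import io

# Hypothetical rows (Day, Len, State, EV, Dem, GOP), deliberately unordered.
SAMPLE = (
    "Day,Len,State,EV,Dem,GOP\n"
    "100.0,1,Alabama,9,36,54\n"
    "1.0,1,Alaska,3,38,59\n"
    "1.0,1,Alabama,9,39,60\n"
)

# Read the whole file once; a list can be looped over as often as needed.
rows = list(csv.reader(io.StringIO(SAMPLE)))[1:]  # [1:] skips the header row

states = ["Alabama", "Alaska", "Arizona"]  # shortened list for the example

percent_dem_by_state = []
for state in states:
    poll_day = 0
    percent_dem_for_this_state = [state, None]
    for row in rows:
        # Keep the row with the highest day number seen so far for this state.
        if state == row[2] and float(row[0]) > poll_day:
            percent_dem_for_this_state = [state, int(row[4])]
            poll_day = float(row[0])
    percent_dem_by_state.append(percent_dem_for_this_state)

for elem in percent_dem_by_state:
    print(elem)
```

This fixes the missing results but still scans every row once per state, so it does not address the speed concern.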

As for the data structure in memory, you have already seen that a list of lists works. You would be even better served by a dictionary, where the keys are the states. Then you don't have to loop over the whole thing for each state.
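One way the dictionary idea might look is sketched below: a single pass over the rows that remembers the latest poll seen for each state. It is Python 3, the SAMPLE rows are made up, and the name latest_poll is just a label chosen for this example. It also assumes the Dem column is never empty for a row that matters.

```python
import csv
import io

# Hypothetical rows (Day, Len, State, EV, Dem, GOP), deliberately unordered.
SAMPLE = (
    "Day,Len,State,EV,Dem,GOP\n"
    "100.0,1,Alabama,9,36,54\n"
    "1.0,1,Alaska,3,38,59\n"
    "1.0,1,Alabama,9,39,60\n"
)

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row

# Maps state name -> (poll day, percent Dem) for the latest poll seen so far.
latest_poll = {}
for row in reader:
    day, state, dem = float(row[0]), row[2], int(row[4])
    if state not in latest_poll or day > latest_poll[state][0]:
        latest_poll[state] = (day, dem)

states = ["Alabama", "Alaska", "Arizona"]  # shortened list for the example

# States with no rows in the CSV fall back to None, as in the question.
percent_dem_by_state = [
    [state, latest_poll[state][1] if state in latest_poll else None]
    for state in states
]

for elem in percent_dem_by_state:
    print(elem)
```

Each row is looked at exactly once, and the final list is built with one dictionary lookup per state instead of a full scan.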
