
My goal is to create a scatter plot with the date on the x-axis and the delegates won by each candidate on the y-axis. I'm unsure how to "fill in the blanks" for rows with missing dates. I've attached a picture of the table I get.

For example, I'm trying to put March 1 as the date for Alaska, Arkansas, etc. to make it possible to plot the data.

# CREATE DATAFRAME WITH DELEGATE WON/TARGET INFORMATION

import requests 
from lxml import html 
import pandas 

url = "http://projects.fivethirtyeight.com/election-2016/delegate-targets/"
response = requests.get(url)
doc = html.fromstring(response.text)

tables = doc.findall('.//table[@class="delegates desktop"]')
election = tables[0] 
election_rows = election.findall('.//tr')
def extractCells(row, isHeader=False):
    if isHeader:
        cells = row.findall('.//th')
    else:
        cells = row.findall('.//td')
    return [val.text_content() for val in cells]


def parse_options_data(table):

    rows = table.findall(".//tr")
    header = extractCells(rows[1], isHeader=True)
    data = [extractCells(row, isHeader=False) for row in rows[2:]]

    trumpdata = "Trump Won Delegates"
    cruzdata = "Cruz Won Delegates"
    kasichdata = "Kasich Won Delegates"

    data = pandas.DataFrame(data, columns=["Date", "State or Territory", "Total Delegates", trumpdata, cruzdata, kasichdata, "Rubio"])

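    # expand=False makes str.extract return a Series for the single capture group
    data.insert(4, "Trump Target Delegates", data[trumpdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(6, "Cruz Target Delegates", data[cruzdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(8, "Kasich Target Delegates", data[kasichdata].str.extract(r'(\d{0,3}$)', expand=False))

    # drop the unused Rubio column (keyword form; the positional axis argument is no longer accepted)
    data = data.drop(columns='Rubio')
    data[trumpdata] = data[trumpdata].str.extract(r'(^\d{0,3})', expand=False)
    data[cruzdata] = data[cruzdata].str.extract(r'(^\d{0,3})', expand=False)
    data[kasichdata] = data[kasichdata].str.extract(r'(^\d{0,3})', expand=False)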

    return data

election_data = parse_options_data(election)
df = pandas.DataFrame(election_data)
df

Picture of my table

  • Just for clarification, how do you know what values to fill the blanks with? That is, how would you know that "March 1" is the correct value to put into the date fields for Alaska, Arkansas, etc.? Or, would any date do, so long as it is not blank (and perhaps not less than existing values)? Commented Mar 29, 2016 at 20:19
  • Hi! Ideally I would use forward fill (?) to fill all the blanks after March 1 (but before the next date) with March 1. So there'd be 7 rows of March 1 then 5 rows of March 12 for example, instead of just one row of each date. Hope that makes sense & thanks so much! Commented Mar 30, 2016 at 2:43

1 Answer


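You could use fillna. It returns a new DataFrame rather than modifying data in place, so assign the result back:

 data = data.fillna('March 1')

Note that this writes the same string into every missing cell, and it only touches cells that are actually NaN, not empty strings.

I would advise you to go through the documentation: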

http://pandas.pydata.org/pandas-docs/stable/10min.html
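
Since the comments ask for a forward fill rather than one constant date, here is a minimal sketch of that approach. It assumes the blank Date cells come out of the scraper as empty or whitespace-only strings (which is what text_content() usually gives for an empty cell) rather than NaN. The toy frame, its placeholder state names, and the helper name forward_fill_dates are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Toy stand-in for the scraped table. The rows are placeholders,
# not real data from the FiveThirtyEight page.
toy = pd.DataFrame({
    "Date": ["March 1", "", " ", "March 12", ""],
    "State or Territory": ["State A", "State B", "State C", "State D", "State E"],
})


def forward_fill_dates(frame, column="Date"):
    """Return a copy of frame with blank cells in column filled from the row above.

    Hypothetical helper: assumes blanks are empty/whitespace strings or NaN.
    """
    out = frame.copy()
    cleaned = out[column].str.strip()      # normalise whitespace-only cells to ""
    blanks_as_nan = cleaned.mask(cleaned == "")  # turn "" into NaN
    out[column] = blanks_as_nan.ffill()    # carry the last seen date downwards
    return out


filled = forward_fill_dates(toy)
print(filled)
```

With the question's code, the same helper could be applied as df = forward_fill_dates(df). The filled column still holds strings such as "March 1", so it would likely need to be converted to real dates (for example with pandas.to_datetime, after adding the year) before being used as a plot axis.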
