Python / Pandas Dataframe: Automatically fill in missing rows

Question

My goal is to create a scatter plot with the date on the x-axis and each candidate's won delegates on the y-axis. I'm unsure how to "fill in the blanks" where dates are missing. I've attached a picture of the table I get.

For example, I'm trying to fill in March 1 as the date for Alaska, Arkansas, etc., so that the data can be plotted.
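Once every row has a date, the scatter plot still needs real dates and numeric counts, because the scraped cells are strings. A minimal sketch of that preparation step, using a small made-up frame (the "State A"/"State B"/"State C" names and delegate numbers are hypothetical) and assuming the dates belong to 2016, since the table omits the year:

```python
import pandas as pd

# Made-up rows for illustration: dates already filled in, counts still strings
df = pd.DataFrame({
    "Date": ["March 1", "March 1", "March 12"],
    "State or Territory": ["State A", "State B", "State C"],
    "Trump Won Delegates": ["1", "2", "3"],
    "Cruz Won Delegates": ["4", "5", "6"],
})

# The table's dates carry no year; appending 2016 is an assumption
df["Date"] = pd.to_datetime(df["Date"] + " 2016", format="%B %d %Y")

# Long format: one row per (date, state, candidate), convenient for a scatter plot
won_cols = ["Trump Won Delegates", "Cruz Won Delegates"]
long_df = df.melt(
    id_vars=["Date", "State or Territory"],
    value_vars=won_cols,
    var_name="Candidate",
    value_name="Won",
)
# Shorten "Trump Won Delegates" to "Trump", etc., for a cleaner legend
long_df["Candidate"] = long_df["Candidate"].str.replace(" Won Delegates", "", regex=False)
# Strings to numbers; anything non-numeric becomes NaN
long_df["Won"] = pd.to_numeric(long_df["Won"], errors="coerce")
print(long_df)
```

With matplotlib installed, each candidate could then be drawn by looping over `long_df.groupby("Candidate")` and calling `plt.scatter(group["Date"], group["Won"], label=name)` for each `(name, group)` pair.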

```python
# CREATE DATAFRAME WITH DELEGATE WON/TARGET INFORMATION

import requests
from lxml import html
import pandas

url = "http://projects.fivethirtyeight.com/election-2016/delegate-targets/"
response = requests.get(url)
doc = html.fromstring(response.text)

# Take the first table whose class is "delegates desktop"
tables = doc.findall('.//table[@class="delegates desktop"]')
election = tables[0]
election_rows = election.findall('.//tr')

def extractCells(row, isHeader=False):
    # Header rows hold <th> cells, data rows hold <td> cells
    if isHeader:
        cells = row.findall('.//th')
    else:
        cells = row.findall('.//td')
    return [val.text_content() for val in cells]


def parse_options_data(table):

    # rows[1] is used as the header row, rows[2:] as the data rows
    rows = table.findall(".//tr")
    header = extractCells(rows[1], isHeader=True)
    data = [extractCells(row, isHeader=False) for row in rows[2:]]

    trumpdata = "Trump Won Delegates"
    cruzdata = "Cruz Won Delegates"
    kasichdata = "Kasich Won Delegates"

    data = pandas.DataFrame(data, columns=["Date", "State or Territory", "Total Delegates", trumpdata, cruzdata, kasichdata, "Rubio"])

    # Trailing digits of each candidate cell go into a new "Target" column;
    # expand=False makes str.extract return a Series for the single group
    data.insert(4, "Trump Target Delegates", data[trumpdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(6, "Cruz Target Delegates", data[cruzdata].str.extract(r'(\d{0,3}$)', expand=False))
    data.insert(8, "Kasich Target Delegates", data[kasichdata].str.extract(r'(\d{0,3}$)', expand=False))

    # Remove the Rubio column (keyword form; the positional axis argument was removed in pandas 2.0)
    data = data.drop(columns='Rubio')
    # Leading digits of each candidate cell are kept as the "Won" value
    data[trumpdata] = data[trumpdata].str.extract(r'(^\d{0,3})', expand=False)
    data[cruzdata] = data[cruzdata].str.extract(r'(^\d{0,3})', expand=False)
    data[kasichdata] = data[kasichdata].str.extract(r'(^\d{0,3})', expand=False)

    # Return the DataFrame built above
    return data

election_data = parse_options_data(election)
df = pandas.DataFrame(election_data)
df  # displays the table in a notebook or REPL
```
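The two regular expressions in `parse_options_data` split each candidate cell into a leading number and a trailing number. A small sketch of how `str.extract` behaves on them, assuming (hypothetically) that a cell's text looks like `"12 / 25"`; the actual cell format on the 538 page is not shown in the question. Converting with `pd.to_numeric(..., errors="coerce")` turns anything non-numeric, such as an empty cell, into NaN.

```python
import pandas as pd

# Hypothetical cell texts in "won / target" form, plus one empty cell
cells = pd.Series(["12 / 25", "0 / 7", "3 / 100", ""])

# Up to 3 leading digits and up to 3 trailing digits; expand=False returns a Series
won = cells.str.extract(r'(^\d{0,3})', expand=False)
target = cells.str.extract(r'(\d{0,3}$)', expand=False)

# Strings to numbers; the empty cell becomes NaN instead of raising an error
won_num = pd.to_numeric(won, errors="coerce")
target_num = pd.to_numeric(target, errors="coerce")
print(won_num.tolist(), target_num.tolist())
```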

Picture of my table

Just for clarification, how do you know what values to fill the blanks with? That is, how would you know that "March 1" is the correct value to put into the date fields for Alaska, Arkansas, etc.? Or would any date do, so long as it is not blank (and perhaps not less than existing values)?
– David, commented Mar 29, 2016 at 20:19
Hi! Ideally I would use forward fill (?) to fill every blank after March 1 (but before the next date) with March 1. So, for example, there'd be 7 rows of March 1 and then 5 rows of March 12, instead of just one row of each date. Hope that makes sense, and thanks so much!
– Lucy, commented Mar 30, 2016 at 2:43
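Forward fill is the pandas operation for what Lucy describes: each missing value takes the last non-missing value above it. One catch is that `ffill` only fills NaN, while cells scraped with `text_content()` are strings, so a blank date is likely an empty (or whitespace-only) string rather than NaN. A minimal sketch, assuming that, with hypothetical state names:

```python
import pandas as pd

# Hypothetical scraped rows: only the first row of each date group has a date
df = pd.DataFrame({
    "Date": ["March 1", "", " ", "March 12", ""],
    "State or Territory": ["State A", "State B", "State C", "State D", "State E"],
})

# Turn empty or whitespace-only strings into NaN so ffill can see them
is_blank = df["Date"].str.strip() == ""
df["Date"] = df["Date"].mask(is_blank)

# Forward fill: every NaN takes the last non-missing date above it
df["Date"] = df["Date"].ffill()
print(df)
```

In the question's code, the same two steps could be applied to `data["Date"]` inside `parse_options_data` just before the function returns.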

Sagar Waghmode · Accepted Answer · 2016-03-29 18:16:23Z

You could do:

```python
data.fillna('March 1')
```

I would also advise going through the documentation:

http://pandas.pydata.org/pandas-docs/stable/10min.html
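Note that `data.fillna('March 1')` fills every NaN in every column with the same string, and it leaves empty-string cells untouched because they are not NaN. A short sketch contrasting a whole-frame fill with one scoped to the `Date` column (the values are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": ["March 1", np.nan, "March 12", np.nan],
    "Trump Won Delegates": ["1", np.nan, "2", "3"],  # made-up values
})

# Whole-frame fill: the missing delegate count also becomes "March 1"
whole = df.fillna("March 1")

# Scoped fill: only the Date column changes
scoped = df.copy()
scoped["Date"] = scoped["Date"].fillna("March 1")

# Empty strings are not NaN, so fillna leaves them as they are
blank = pd.Series(["March 1", ""]).fillna("March 1")
print(whole, scoped, blank, sep="\n\n")
```

Even when scoped, a constant also puts March 1 on the row after March 12; a forward fill (`ffill`) follows each date group instead, which matches the layout described in the comments.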
