Adding a column to pandas dataframe conditionally

Question

I am working on a personal project collecting data on Covid-19 cases. The data set only shows the cumulative total of Covid-19 cases per state. I would like to add a column that contains the new cases added that day. This is what I have so far:

import pandas as pd
from datetime import date
from datetime import timedelta
import numpy as np

#read the CSV from github
hist_US_State = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

#some code to get yesterday's date and the day before which is needed later.
today = date.today()
yesterday = today - timedelta(days = 1)
yesterday = str(yesterday)
day_before_yesterday = today - timedelta(days = 2)
day_before_yesterday = str(day_before_yesterday)

#Extracting yesterday's and the day before cases and combine them in one dataframe
yesterday_cases = hist_US_State[hist_US_State["date"] == yesterday]
day_before_yesterday_cases = hist_US_State[hist_US_State["date"] == day_before_yesterday]

total_cases = pd.DataFrame()
total_cases = day_before_yesterday_cases.append(yesterday_cases)

#Adding a new column called "new_cases" and this is where I get into trouble.
total_cases["new_cases"] = yesterday_cases["cases"] - day_before_yesterday_cases["cases"]

Can you please point out what I am doing wrong?

I don't end up with a new column that contains, for each state, yesterday's cases minus the cases from the day before yesterday. My end goal is to add a column that contains the number of Covid-19 cases added that day.
– David G., Commented Aug 19, 2020 at 21:11

Cal Lee · Accepted Answer · 2020-08-19 21:07:04Z

1

Because you defined total_cases as a concatenation (via append) of yesterday_cases and day_before_yesterday_cases, its number of rows is the sum of the rows in the other two dataframes. It looks like yesterday_cases and day_before_yesterday_cases both have 55 rows, so total_cases has 110 rows. Your last line is therefore trying to fill a 110-row column from two 55-row series. On top of that, pandas pairs rows by index label when it subtracts, and the two slices share no index labels, so no state's rows are actually matched with each other.
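To see the mismatch on a small scale, here is a minimal sketch. The two-state frame and its counts are made up for illustration; only the column names follow the CSV in the question.

```python
import pandas as pd

# Made-up cumulative counts for two states on two dates (not real data).
df = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 10, 130, 12],
})

day_before = df[df["date"] == "2020-08-17"]  # keeps index labels 0 and 1
yesterday = df[df["date"] == "2020-08-18"]   # keeps index labels 2 and 3

# Subtraction pairs rows by index label, not by position or by state.
# No label appears in both slices, so every entry of the result is NaN.
diff = yesterday["cases"] - day_before["cases"]
print(diff)
```

The result has one entry per label from either slice, and none of them holds a per-state difference.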

You may want either to reshape your data so that each date is its own column, or to keep each day in its own dataframe and compare those dataframes.
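For the second option, one possible sketch is to index each day's dataframe by state so that the subtraction lines up state by state. The sample values are invented, and this is just one way to do it, not necessarily what was meant above.

```python
import pandas as pd

# Made-up cumulative counts for two states on two dates (not real data).
df = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 10, 130, 12],
})

# Index both slices by state so rows are matched on the state name.
day_before = df[df["date"] == "2020-08-17"].set_index("state")
yesterday = df[df["date"] == "2020-08-18"].set_index("state")

# assign() aligns the new column on the shared state index.
result = yesterday.assign(
    new_cases=yesterday["cases"] - day_before["cases"]
).reset_index()
print(result)
```

This assumes each state appears exactly once per date; a state missing from either day would get NaN in new_cases.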


3 Comments

David G. Over a year ago

Thanks for your response! I am not sure you completely understand me. I am trying to add a new column where the state matches, and fill it with yesterday's cases for that state minus the cases from the day before yesterday for the same state. Does that make sense?

Cal Lee Over a year ago

Yes, so you need to reshape the data. There are several ways to do this. Here's one implementation using pivot_table:

daily_cases = total_cases.pivot_table(index = ['state', 'fips'], columns = 'date', values = 'cases')
daily_cases['difference'] = daily_cases[yesterday] - daily_cases[day_before_yesterday]
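Here is the same pivot_table idea as a self-contained sketch you can run without downloading the CSV. The sample rows and the two fixed date strings are assumptions standing in for total_cases, yesterday and day_before_yesterday from the question.

```python
import pandas as pd

# Made-up stand-in for total_cases (not real data).
total_cases = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "fips": [1, 2, 1, 2],
    "cases": [100, 10, 130, 12],
})

# Fixed dates for illustration; the question derives these from date.today().
day_before_yesterday = "2020-08-17"
yesterday = "2020-08-18"

# One row per (state, fips), one column per date.
daily_cases = total_cases.pivot_table(
    index=["state", "fips"], columns="date", values="cases"
)

# Both columns now share the same row index, so the subtraction is per state.
daily_cases["difference"] = daily_cases[yesterday] - daily_cases[day_before_yesterday]
print(daily_cases)
```

Note that pivot_table aggregates with the mean by default, which is harmless here because there is only one row per state and date, but the values may come back as floats.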

David G. Over a year ago

Awesome! Thank you so much for that explanation
