I am working on a personal project collecting data on Covid-19 cases. The data set only shows the cumulative total of Covid-19 cases per state. I would like to add a column that contains the new cases added each day. This is what I have so far:

import pandas as pd
from datetime import date
from datetime import timedelta
import numpy as np

#read the CSV from github
hist_US_State = pd.read_csv("https://raw.githubusercontent.com/nytimes/covid-19-data/master/us-states.csv")

#some code to get yesterday's date and the day before which is needed later.
today = date.today()
yesterday = today - timedelta(days = 1)
yesterday = str(yesterday)
day_before_yesterday = today - timedelta(days = 2)
day_before_yesterday = str(day_before_yesterday)

#Extract yesterday's and the day before yesterday's cases and combine them in one dataframe
yesterday_cases = hist_US_State[hist_US_State["date"] == yesterday]
day_before_yesterday_cases = hist_US_State[hist_US_State["date"] == day_before_yesterday]

total_cases = pd.DataFrame()
total_cases = day_before_yesterday_cases.append(yesterday_cases)

#Adding a new column called "new_cases" and this is where I get into trouble.
total_cases["new_cases"] = yesterday_cases["cases"] - day_before_yesterday_cases["cases"]

Can you please point out what I am doing wrong?

  • What's the issue? Commented Aug 19, 2020 at 21:03
  • I don't end up with a new column containing each state's cases from yesterday minus its cases from the day before yesterday. My end goal is to add a column with the number of Covid-19 cases added that day. Commented Aug 19, 2020 at 21:11

1 Answer

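Because you defined total_cases as a concatenation (via append) of yesterday_cases and day_before_yesterday_cases, its number of rows is the sum of the rows in the other two dataframes. It looks like yesterday_cases and day_before_yesterday_cases both have 55 rows, so total_cases has 110 rows, with each state's two days on separate rows rather than side by side. Your last line also runs into index alignment: pandas subtracts Series by matching row labels, not positions, and the two slices keep non-overlapping row labels from hist_US_State. So the subtraction pairs no rows at all, and the new_cases column ends up NaN everywhere.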
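
A small reproduction of both effects, using a hypothetical two-state, two-date DataFrame in place of the NYT CSV (the values are invented for illustration). It uses pd.concat instead of DataFrame.append, since append was deprecated and then removed in pandas 2.0.

```python
import pandas as pd

# Hypothetical miniature stand-in for us-states.csv: two states on two dates.
hist = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 50, 130, 55],
})

day_before = hist[hist["date"] == "2020-08-17"]  # keeps row labels 0 and 1
yesterday = hist[hist["date"] == "2020-08-18"]   # keeps row labels 2 and 3

# Stack the two days; pd.concat replaces DataFrame.append, which pandas 2.0 removed.
total = pd.concat([day_before, yesterday])
print(len(total))  # 4: one row per (state, date), nothing is paired up by state

# Arithmetic aligns on row labels. {0, 1} and {2, 3} never match, so every value is NaN.
diff = yesterday["cases"] - day_before["cases"]
total["new_cases"] = diff
print(total["new_cases"].isna().all())  # True
```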

You can either reshape your data so that each date becomes its own column, or keep a separate dataframe per date and match the rows by state.
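
A minimal sketch of the second option: keep one DataFrame per date, but index each by state so that subtraction pairs rows by state instead of by row number. The miniature DataFrame and its numbers are hypothetical; with the real data you would start from hist_US_State and the yesterday / day_before_yesterday strings.

```python
import pandas as pd

# Hypothetical miniature stand-in for us-states.csv: two states on two dates.
hist = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "cases": [100, 50, 130, 55],
})

# One DataFrame per date, labelled by state rather than by original row position.
day_before = hist[hist["date"] == "2020-08-17"].set_index("state")
yesterday = hist[hist["date"] == "2020-08-18"].set_index("state")

# Subtraction now pairs Alabama with Alabama and Alaska with Alaska.
new_cases = (yesterday["cases"] - day_before["cases"]).rename("new_cases")
print(new_cases.to_dict())  # {'Alabama': 30, 'Alaska': 5}

# Attach the per-state result to yesterday's rows as a new column (aligned on state).
yesterday = yesterday.assign(new_cases=new_cases)
```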

3 Comments

Thanks for your response! I'm not sure you fully understood me. I am trying to add a new column that, for each matching state, holds yesterday's cases minus the day before yesterday's cases. Does that make sense?
Yes, so you need to reshape the data. There are several ways to do this; here's one implementation using pivot_table:
daily_cases = total_cases.pivot_table(index=['state', 'fips'], columns='date', values='cases')
daily_cases['difference'] = daily_cases[yesterday] - daily_cases[day_before_yesterday]
Awesome! Thank you so much for that explanation
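
A self-contained version of the pivot_table suggestion from the comments, again on a hypothetical miniature DataFrame (the fips codes and case counts are illustrative only). It pivots the whole history rather than total_cases; pivoting total_cases behaves the same way but yields only two date columns. pivot_table aggregates duplicate (state, fips, date) entries with the mean by default, which is harmless here because each state has one row per date.

```python
import pandas as pd

# Hypothetical miniature stand-in for us-states.csv, including a fips column.
hist = pd.DataFrame({
    "date": ["2020-08-17", "2020-08-17", "2020-08-18", "2020-08-18"],
    "state": ["Alabama", "Alaska", "Alabama", "Alaska"],
    "fips": [1, 2, 1, 2],
    "cases": [100, 50, 130, 55],
})

# Date strings in the same "YYYY-MM-DD" form that str(date) produces.
yesterday = "2020-08-18"
day_before_yesterday = "2020-08-17"

# Reshape: one row per (state, fips) pair, one column per date string.
daily_cases = hist.pivot_table(index=["state", "fips"], columns="date", values="cases")

# Both dates now sit on the same row, so the subtraction is per state.
daily_cases["difference"] = daily_cases[yesterday] - daily_cases[day_before_yesterday]
print(daily_cases["difference"])
```

If you want new cases for every date rather than only yesterday, a different technique avoids reshaping entirely: hist_US_State.groupby("state")["cases"].diff() subtracts each row's cases from the previous row of the same state, leaving NaN on each state's first date. This assumes the rows are ordered by date within each state, so sort by ["state", "date"] first if you are not sure.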
