I am trying to get the percentage change in each day's value compared to the previous day, for every day in the dataframe. The last line of this snippet throws the error:

import pandas as pd

df = pd.DataFrame({'new_cases':[368060.0,
 357316.0,
 382146.0,
 412431.0,
 414188.0,
 401078.0,
 403405.0,
 366494.0,
 329942.0]})

df['percent_increase_cases'] = df['new_cases'].apply(pd.Series.pct_change)

The formula I am using is

percent_increase = (today's cases - yesterday's cases) / yesterday's cases * 100

It works if I use the code below but I wanted to make it cleaner.

df['percent_increase_cases'] = (df['new_cases'].diff(1)) / df['new_cases'].shift(1) * 100

2 Answers


What is happening is that Series.apply() calls the function pd.Series.pct_change on each element of the series df['new_cases'], rather than on the series as a whole. For example, if I run

pd.Series.pct_change(df['new_cases'])

then I get this:

0         NaN
1   -0.029191
2    0.069490
3    0.079250
4    0.004260
5   -0.031652
6    0.005802
7   -0.091499
8   -0.099734
Name: new_cases, dtype: float64

However, if I apply pd.Series.pct_change to the first element of df['new_cases'], 368060.0, like this,

pd.Series.pct_change(368060.0)

then I reproduce your error:

Traceback (most recent call last):

  File "<ipython-input-7-7aee3ee9524c>", line 1, in <module>
    pd.Series.pct_change(368060.0)

  File "/usr/local/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 10078, in pct_change
    axis = self._get_axis_number(kwargs.pop("axis", self._stat_axis_name))

AttributeError: 'float' object has no attribute '_get_axis_number'
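To make the element-by-element behaviour visible, here is a minimal sketch using a shortened copy of the data from the question. The variable names are only for illustration. It checks what kind of argument Series.apply() hands to the function, then calls pct_change on the whole series for comparison.

```python
import pandas as pd

new_cases = pd.Series([368060.0, 357316.0, 382146.0], name="new_cases")

# Series.apply calls the function once per element, so each argument
# is a scalar value, never a Series.
got_series = new_cases.apply(lambda x: isinstance(x, pd.Series))
print(got_series.tolist())  # [False, False, False]

# pct_change needs the whole Series, so call it as a method instead.
whole = new_cases.pct_change()
print(whole)
```

Because each argument is a plain number, pct_change has no previous row to compare against, which is why the call fails inside apply().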

Looks like the solution, then, is to run either

pd.Series.pct_change(df['new_cases'])*100

or equivalently,

df['new_cases'].pct_change()*100

The factor of 100 is there so you get a percent rather than a decimal fraction, consistent with your original formula, (df['new_cases'].diff(1)) / df['new_cases'].shift(1) * 100.
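As a quick sanity check, the following sketch compares the manual formula from the question with pct_change() scaled by 100, on the first few values of the data. It assumes NumPy is available for the comparison; the names manual, builtin and same are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"new_cases": [368060.0, 357316.0, 382146.0, 412431.0]})

# The formula from the question.
manual = df["new_cases"].diff(1) / df["new_cases"].shift(1) * 100

# The built-in method, scaled to a percent.
builtin = df["new_cases"].pct_change() * 100

# equal_nan=True treats the leading NaN in both results as a match.
same = np.allclose(manual, builtin, equal_nan=True)
print(same)
```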

1 Comment

I can actually swap the series itself for pd.Series! That makes perfect sense! And looks so elegant. Thanks

Another, simpler method of doing this would be

df['percent_increase_cases'] = df[['new_cases']].apply(pd.Series.pct_change)

Notice the extra pair of [] when selecting columns.

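Selecting a single column from a dataframe returns a series, and Series.apply() runs into the problem described by @jjramsey. Selecting a list of columns returns a dataframe instead, and DataFrame.apply() passes each column to the function as a whole series, so pct_change works. Note that the result is a decimal fraction; multiply by 100 to get a percent.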
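The difference can be checked with a small sketch on a shortened copy of the data. It records the type of the argument that DataFrame.apply() passes to the function, then applies pct_change and scales by 100. The names arg_types and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"new_cases": [368060.0, 357316.0, 382146.0]})

# DataFrame.apply (default axis=0) calls the function once per column,
# passing each column as a Series.
arg_types = df[["new_cases"]].apply(lambda col: type(col).__name__)
print(arg_types["new_cases"])  # Series

# The result of applying pct_change this way is a one-column DataFrame.
result = df[["new_cases"]].apply(pd.Series.pct_change) * 100
print(type(result).__name__)  # DataFrame

# Pick the column out explicitly when storing it back.
df["percent_increase_cases"] = result["new_cases"]
print(df)
```

Selecting result["new_cases"] before assigning is a stylistic choice here: it makes it explicit that a series, not a dataframe, is being stored in the new column.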
