How to add row with default values if not in consecutive order

Question

I have a df like this:

time  units   cost
0      4       10
1      2       10
3      4       20
4      1       20
5      3       10
6      1       20
9      2       10

As you can see, df.time is not consecutive. If there is a missing value, I want to add a new row, populating df.time with the consecutive time value, df.units with 2 and df.cost with 20. Expected output:

time  units   cost
0      4       10
1      2       10
2      2       20
3      4       20
4      1       20
5      3       10
6      1       20
7      2       20
8      2       20
9      2       10

How do I do this? I understand how to do this by deconstructing all the series into lists, looping through them, and appending values whenever a time value is not one more than the previous one, but this seems inefficient.

Cameron Riddell · Accepted Answer · 2020-10-12 18:14:14Z

4

You can use the reindex method with a call to fillna to do this:

# Build new index that ranges from time min to time max with a step of 1
new_index = range(df["time"].min(), df["time"].max() + 1)


out = (df.set_index("time")                # Index our dataframe with the original time column
         .reindex(new_index)               # Reindex our dataframe with the new_index, all empty cells appear as nan
         .fillna({"units": 2, "cost": 20}) # Fill in the nans for units and cost with 2 and 20 respectively
         .astype(int))                     # Due to NaNs that were in column from reindexing, we'll manually recast our
                                           #   data type from float to int (not necessary, but produces cleaner output)

print(out)
      units  cost
time             
0         4    10
1         2    10
2         2    20
3         4    20
4         1    20
5         3    10
6         1    20
7         2    20
8         2    20
9         2    10
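For a copy-paste version, here is a minimal self-contained sketch of the same reindex + fillna chain. The sample frame is rebuilt from the table in the question. The trailing rename_axis and reset_index calls are an addition that is not part of the snippet above; they assume you want time back as an ordinary column, as in the expected output.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "time":  [0, 1, 3, 4, 5, 6, 9],
    "units": [4, 2, 4, 1, 3, 1, 2],
    "cost":  [10, 10, 20, 20, 10, 20, 10],
})

# Every integer from the smallest to the largest time value
new_index = range(df["time"].min(), df["time"].max() + 1)

out = (
    df.set_index("time")
      .reindex(new_index)                # missing times become all-NaN rows
      .fillna({"units": 2, "cost": 20})  # per-column defaults
      .astype(int)                       # NaN forced float; cast back to int
      .rename_axis("time")               # make sure the index is labelled "time"
      .reset_index()                     # turn the index back into a column
)
print(out)
```

If you are happy to keep time as the index, drop the last two steps.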


1 Comment

Ch3steR Over a year ago

I forgot that fillna accepts a dict. +1

alani · Accepted Answer · 2020-10-12 18:22:52Z

1

You can use df.reindex, then pd.Series.fillna.

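import pandas as pd

# Name the new index so the "time" label is kept after reindexing
idx = pd.RangeIndex(df['time'].min(), df['time'].max()+1, name='time')
# If `df.time` is always sorted then,
# idx = pd.RangeIndex(df['time'].iat[0], df['time'].iat[-1]+1, name='time')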

df = df.set_index('time')
df = df.reindex(idx)
df['units'] = df['units'].fillna(2).astype(int)
df['cost'] = df['cost'].fillna(20).astype(int)

# if you prefer not to hard-code the names of the columns, replace the last
# two lines with:
#   defaults = [2,20]
#   for (name, default) in zip(df.columns, defaults):
#       df[name] = df[name].fillna(default).astype(type(default))

      units  cost
time             
0         4    10
1         2    10
2         2    20
3         4    20
4         1    20
5         3    10
6         1    20
7         2    20
8         2    20
9         2    10
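The commented-out loop above can be sketched as a small reusable function. This is only an illustration: fill_missing_rows is a hypothetical helper name, and it takes the defaults as a dict keyed by column name (an assumption made here) rather than a positional list, so the order of df.columns does not matter.

```python
import pandas as pd

def fill_missing_rows(df, key, defaults):
    """Hypothetical helper: add a row for every integer missing from df[key].

    `defaults` maps each column name to the value used in the new rows.
    """
    idx = pd.RangeIndex(df[key].min(), df[key].max() + 1, name=key)
    out = df.set_index(key).reindex(idx)
    for name, default in defaults.items():
        # Fill the gaps, then cast back to the type of the default value
        out[name] = out[name].fillna(default).astype(type(default))
    return out

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "time":  [0, 1, 3, 4, 5, 6, 9],
    "units": [4, 2, 4, 1, 3, 1, 2],
    "cost":  [10, 10, 20, 20, 10, 20, 10],
})

result = fill_missing_rows(df, "time", {"units": 2, "cost": 20})
print(result)
```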

edited Oct 12, 2020 at 18:22

alani


answered Oct 12, 2020 at 18:11

Ch3steR


1 Comment

alani Over a year ago

Going to edit this with a suggestion as a comment - feel free to edit further to either incorporate this into the actual code or to undo my edit, as you see fit...

Andrej Kesely · Accepted Answer · 2020-10-12 18:00:15Z

0

You can construct a new DataFrame with a complete "time" column and then call .fillna() with the values from the original dataframe (df is your original dataframe):

import numpy as np
import pandas as pd

r = range(df['time'].min(), df['time'].max()+1)
df_out = pd.DataFrame({'time': r, 'units': [np.nan]*len(r), 'cost': [np.nan]*len(r)}).set_index('time')

df_out = df_out.fillna(df.set_index('time'))
df_out['units'] = df_out['units'].fillna(2).astype(int)
df_out['cost'] = df_out['cost'].fillna(20).astype(int)

print(df_out)

Prints:

      units  cost
time             
0         4    10
1         2    10
2         2    20
3         4    20
4         1    20
5         3    10
6         1    20
7         2    20
8         2    20
9         2    10
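A possible variant of the same "build the full frame first" idea, sketched here as an alternative rather than as the method above: start from a frame that already holds the defaults instead of NaN, then overwrite the rows that exist in the original data. This assumes the time values are unique integers. Because no NaN is ever introduced, the columns keep an integer dtype and no cast is needed.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    "time":  [0, 1, 3, 4, 5, 6, 9],
    "units": [4, 2, 4, 1, 3, 1, 2],
    "cost":  [10, 10, 20, 20, 10, 20, 10],
})

idx = pd.RangeIndex(df["time"].min(), df["time"].max() + 1, name="time")

# Every row starts out holding the default values
df_out = pd.DataFrame({"units": 2, "cost": 20}, index=idx)

# Overwrite the rows whose time value is present in the original data
df_out.loc[df["time"], ["units", "cost"]] = df[["units", "cost"]].to_numpy()

print(df_out)
```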

