Add missing datetime columns to grouped dataframe

Question

Is it possible to add the missing date columns from a generated date_range to the grouped dataframe df without a for loop, filling the missing values with zeros? date_range has 7 dates and df has 4 date columns, so how can I add the 3 missing columns to df?

import pandas as pd
from datetime import datetime

start = datetime(2018, 6, 4)
end = datetime(2018, 6, 10)
date_range = pd.date_range(start=start, end=end, freq='D')

DatetimeIndex(['2018-06-04', '2018-06-05', '2018-06-06', '2018-06-07',
               '2018-06-08', '2018-06-09', '2018-06-10'],
              dtype='datetime64[ns]', freq='D')

df = pd.DataFrame({
'date': 
    ['2018-06-07', '2018-06-10', '2018-06-09','2018-06-09',
    '2018-06-08','2018-06-09','2018-06-08','2018-06-10',
    '2018-06-10','2018-06-10',],
'name':
    ['sogan', 'lyam','alex','alex',
    'kovar','kovar','kovar','yamo','yamo','yamo',]
})
df['date'] = pd.to_datetime(df['date'])

df = (df
      .groupby(['name', 'date'])[['date']]
      .count()
      .unstack(fill_value=0)
)

df

                      date                 date                 date                 date
date   2018-06-07 00:00:00  2018-06-08 00:00:00  2018-06-09 00:00:00  2018-06-10 00:00:00
name
alex                     0                    0                    2                    0
kovar                    0                    2                    1                    0
lyam                     0                    0                    0                    1
sogan                    1                    0                    0                    0
yamo                     0                    0                    0                    3

ilearn · Accepted Answer · 2019-02-04 19:43:12Z

I would pivot the table so that the dates become rows, then use the pandas .asfreq method, whose signature is:

DataFrame.asfreq(freq, method=None, how=None, normalize=False, fill_value=None)

source: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.asfreq.html
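
A minimal sketch of this idea, using a small hypothetical sample (not the question's data) and illustrative variable names. One caveat worth knowing: asfreq only fills gaps between the first and last dates already present in the index, so dates outside that span would still need something like .reindex.

```python
import pandas as pd

# Hypothetical sample with a gap: nothing recorded on 2018-06-05 or 2018-06-06
sample = pd.DataFrame({
    'date': pd.to_datetime(['2018-06-04', '2018-06-07', '2018-06-07']),
    'name': ['alex', 'kovar', 'kovar'],
})

# Count rows per (date, name); dates become the index, names the columns
counts = sample.groupby(['date', 'name']).size().unstack(fill_value=0)

# asfreq('D') inserts the missing days between the first and last date,
# filling the new rows with 0 instead of NaN
daily = counts.asfreq('D', fill_value=0)

# Transpose to get names as rows and dates as columns
wide = daily.T
```

Here daily covers every day from 2018-06-04 through 2018-06-07, and wide has one row per name with one column per day.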

1 Comment

Valery Ramusik Over a year ago

Thanks for the clue about turning the date columns into rows.

Valery Ramusik · Accepted Answer · 2019-02-13 20:39:47Z

Thanks to Sina Shabani for the clue about turning the date columns into rows. In this situation, setting date as the index and using .reindex turned out to be more suitable:

df = (df.groupby(['date', 'name'])['name']
        .size()
        .reset_index(name='count')
        .pivot(index='date', columns='name', values='count')
        .fillna(0))

df

name        alex  kovar  lyam  sogan  yamo
date
2018-06-07   0.0    0.0   0.0    1.0   0.0
2018-06-08   0.0    2.0   0.0    0.0   0.0
2018-06-09   2.0    1.0   0.0    0.0   0.0
2018-06-10   0.0    0.0   1.0    0.0   3.0

df.index = pd.DatetimeIndex(df.index)

df = (df.reindex(pd.date_range(start, freq='D', periods=7), fill_value=0)
        .sort_index())
df

name        alex  kovar  lyam  sogan  yamo
2018-06-04   0.0    0.0   0.0    0.0   0.0
2018-06-05   0.0    0.0   0.0    0.0   0.0
2018-06-06   0.0    0.0   0.0    0.0   0.0
2018-06-07   0.0    0.0   0.0    1.0   0.0
2018-06-08   0.0    2.0   0.0    0.0   0.0
2018-06-09   2.0    1.0   0.0    0.0   0.0
2018-06-10   0.0    0.0   1.0    0.0   3.0

df.T

       2018-06-04  2018-06-05  2018-06-06  2018-06-07  2018-06-08  2018-06-09  2018-06-10
name
alex          0.0         0.0         0.0         0.0         0.0         2.0         0.0
kovar         0.0         0.0         0.0         0.0         2.0         1.0         0.0
lyam          0.0         0.0         0.0         0.0         0.0         0.0         1.0
sogan         0.0         0.0         0.0         1.0         0.0         0.0         0.0
yamo          0.0         0.0         0.0         0.0         0.0         0.0         3.0
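
As a possible shortcut, .reindex can also be applied along the columns, which avoids the pivot and transpose round trip and keeps the counts as integers. The following is a sketch under that assumption, rebuilding the question's data; the name counts is only illustrative.

```python
import pandas as pd

# The full week requested in the question
date_range = pd.date_range(start='2018-06-04', end='2018-06-10', freq='D')

df = pd.DataFrame({
    'date': ['2018-06-07', '2018-06-10', '2018-06-09', '2018-06-09',
             '2018-06-08', '2018-06-09', '2018-06-08', '2018-06-10',
             '2018-06-10', '2018-06-10'],
    'name': ['sogan', 'lyam', 'alex', 'alex',
             'kovar', 'kovar', 'kovar', 'yamo', 'yamo', 'yamo'],
})
df['date'] = pd.to_datetime(df['date'])

counts = (df.groupby(['name', 'date'])
            .size()                     # rows per (name, date) pair
            .unstack(fill_value=0)      # dates become columns, gaps become 0
            .reindex(columns=date_range, fill_value=0))  # add the missing dates
```

The result has one row per name and seven date columns; the three dates that never occur in df are all zeros.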
