Add a row in pandas dataframe for every date in another dataframe column

Question

I have a dataframe that contains occasional entries for each symbol, along with a count. I would like to expand the dataframe so that every symbol has a row for every date in the dataframe's date range. Where a symbol has no entry on a given date, I want the count to be 0.

My dataframe:

import pandas as pd

dates = ['2021-01-01', '2021-01-02', '2021-01-03']
symbol = ['a', 'b', 'a']
count = [1, 2, 3]
df = pd.DataFrame({'Mention Datetime': dates,
                   'Symbol': symbol,
                   'Count': count})


  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-02      b      2
2       2021-01-03      a      3

What I want it to look like:

  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-02      a      0
2       2021-01-03      a      3
3       2021-01-01      b      0
4       2021-01-02      b      2
5       2021-01-03      b      0

Quang Hoang · Accepted Answer · 2021-02-01 20:48:11Z

2

Use pivot_table then stack:

df = df.pivot_table(index='Mention Datetime',
                    columns='Symbol', fill_value=0
                    ).stack().reset_index()

Output:

  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-01      b      0
2       2021-01-02      a      0
3       2021-01-02      b      2
4       2021-01-03      a      3
5       2021-01-03      b      0
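
One caveat worth knowing: pivot_table aggregates duplicate (date, symbol) pairs with the mean by default, which is rarely what you want for counts. The sketch below is a variant of the approach above, not the answerer's exact code. It assumes duplicates should be summed (aggfunc='sum'), passes values='Count' explicitly, and sorts by symbol so the rows come out in the order shown in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Mention Datetime': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'Symbol': ['a', 'b', 'a'],
    'Count': [1, 2, 3],
})

# One row per date, one column per symbol; missing combinations become 0.
# aggfunc='sum' is an assumption: it adds up counts if a pair repeats.
wide = df.pivot_table(index='Mention Datetime', columns='Symbol',
                      values='Count', aggfunc='sum', fill_value=0)

# Back to long format, then order by symbol and date.
result = (
    wide.stack()
        .rename('Count')
        .reset_index()
        .sort_values(['Symbol', 'Mention Datetime'], ignore_index=True)
)
print(result)
```

Passing a scalar to values keeps the pivoted columns single-level, so stack() returns a Series that only needs a name before reset_index().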



Kyle · Accepted Answer · 2021-02-01 20:53:36Z

1

You can reindex with a new MultiIndex built from the Cartesian product of the unique values of the two key columns (set as index levels first).

import pandas as pd
from io import StringIO

s = '''
Mention Datetime    Symbol  Count
2021-01-01          a       1
2021-01-02          b       2
2021-01-03          a       3
'''

df = pd.read_fwf(StringIO(s), header=1)
df = df.set_index(['Mention Datetime', 'Symbol'])
df
                            Count
Mention Datetime    Symbol  
2021-01-01          a       1
2021-01-02          b       2
2021-01-03          a       3

df = df.reindex(
    pd.MultiIndex.from_product(
        [
            df.index.get_level_values('Mention Datetime').unique(),
            df.index.get_level_values('Symbol').unique(),
        ]
    )
).fillna(0)

df
                            Count
Mention Datetime    Symbol  
2021-01-01          a       1.0
                    b       0.0
2021-01-02          a       0.0
                    b       2.0
2021-01-03          a       3.0
                    b       0.0
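
If the integer dtype matters, or if the data skips calendar days for every symbol, a variant of the reindex approach may help. The following is a minimal sketch under assumptions not stated in the question: the dates are daily, each (Symbol, date) pair occurs at most once (reindex raises on a duplicated index), and gaps should be filled from the earliest to the latest date. It passes fill_value=0 to reindex instead of calling fillna(0), which should keep Count as integers.

```python
import pandas as pd

df = pd.DataFrame({
    'Mention Datetime': ['2021-01-01', '2021-01-02', '2021-01-03'],
    'Symbol': ['a', 'b', 'a'],
    'Count': [1, 2, 3],
})
df['Mention Datetime'] = pd.to_datetime(df['Mention Datetime'])

# Every calendar day between the first and last date (daily data assumed).
all_dates = pd.date_range(df['Mention Datetime'].min(),
                          df['Mention Datetime'].max(), freq='D')

# Every (symbol, date) combination, symbol first so rows group by symbol.
full_index = pd.MultiIndex.from_product(
    [df['Symbol'].unique(), all_dates],
    names=['Symbol', 'Mention Datetime'],
)

result = (
    df.set_index(['Symbol', 'Mention Datetime'])
      .reindex(full_index, fill_value=0)   # no NaN step, so no float cast
      .reset_index()[['Mention Datetime', 'Symbol', 'Count']]
)
print(result)
```

Building the index from pd.date_range rather than from the dates present in the data also covers days on which no symbol was mentioned at all.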


1 Comment

M A Over a year ago

Thank you. I think the solution above will get me what I need, but this is probably more robust if I need it in future. I'm going to keep this in mind if I set my index as the Symbol with multiple dates each.
