
I have a dataframe in which each symbol appears only on some dates, along with a count. I would like to expand it so that every symbol has a row for every date in the dataframe's date range, with a count of 0 on dates where the symbol has no entry.

My dataframe:

import pandas as pd

dates = ['2021-01-01', '2021-01-02', '2021-01-03']
symbol = ['a', 'b', 'a']
count = [1, 2, 3]
df = pd.DataFrame({'Mention Datetime': dates,
                   'Symbol': symbol,
                   'Count': count})


  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-02      b      2
2       2021-01-03      a      3

What I want it to look like:

  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-02      a      0
2       2021-01-03      a      3
3       2021-01-01      b      0
4       2021-01-02      b      2
5       2021-01-03      b      0

2 Answers


Use pivot_table then stack:

df = df.pivot_table(index='Mention Datetime',
                    columns='Symbol', fill_value=0
                    ).stack().reset_index()

Output:

  Mention Datetime Symbol  Count
0       2021-01-01      a      1
1       2021-01-01      b      0
2       2021-01-02      a      0
3       2021-01-02      b      2
4       2021-01-03      a      3
5       2021-01-03      b      0
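
As a hedged variant of the same idea: `pivot_table` aggregates with the mean by default, so if a symbol could appear more than once on the same date, it may be safer to name the value column and the aggregation explicitly. The sketch below assumes that summing duplicate rows is what is wanted, which the question does not state. The names `wide` and `expanded` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Mention Datetime": ["2021-01-01", "2021-01-02", "2021-01-03"],
    "Symbol": ["a", "b", "a"],
    "Count": [1, 2, 3],
})

# Assumption: repeated (date, symbol) rows should be summed, not averaged.
wide = df.pivot_table(
    index="Mention Datetime",
    columns="Symbol",
    values="Count",
    aggfunc="sum",
    fill_value=0,
)

# stack() turns the Symbol columns back into rows. The result is a Series,
# so reset_index(name=...) restores the "Count" column name.
expanded = wide.stack().reset_index(name="Count")

# Depending on the pandas version the counts may come back as floats,
# so cast them to integers explicitly.
expanded["Count"] = expanded["Count"].astype(int)

print(expanded)
```

Like the one-liner above, this only covers dates that already occur somewhere in the dataframe. A date on which no symbol was mentioned at all will not get any rows.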


You can reindex with a new MultiIndex built from the product of the unique values in the two columns in question.

import pandas as pd
from io import StringIO

s = '''
Mention Datetime    Symbol  Count
2021-01-01          a       1
2021-01-02          b       2
2021-01-03          a       3
'''

df = pd.read_fwf(StringIO(s), header=1)
df = df.set_index(['Mention Datetime', 'Symbol'])
df
                            Count
Mention Datetime    Symbol  
2021-01-01          a       1
2021-01-02          b       2
2021-01-03          a       3

df = df.reindex(
    pd.MultiIndex.from_product(
        [
        df.index.get_level_values('Mention Datetime').unique(), 
        df.index.get_level_values('Symbol').unique()
        ]
    ) 
).fillna(0)

df
                            Count
Mention Datetime    Symbol  
2021-01-01          a       1.0
                    b       0.0
2021-01-02          a       0.0
                    b       2.0
2021-01-03          a       3.0
                    b       0.0
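
A possible refinement, sketched under a few assumptions: passing `fill_value=0` to `reindex` instead of calling `fillna(0)` afterwards keeps `Count` as integers, and building the dates with `pd.date_range` covers every calendar day between the first and last mention, even a day on which no symbol appears. The sketch assumes daily frequency and that each (symbol, date) pair occurs at most once, since `reindex` raises on a duplicated index. The names `all_dates`, `full_index` and `expanded` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Mention Datetime": pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-03"]),
    "Symbol": ["a", "b", "a"],
    "Count": [1, 2, 3],
})

# Every calendar day between the first and last mention (daily frequency assumed).
all_dates = pd.date_range(
    df["Mention Datetime"].min(), df["Mention Datetime"].max(), freq="D"
)

# Symbol first, date second: this yields the symbol-major row order
# shown in the question's desired output.
full_index = pd.MultiIndex.from_product(
    [df["Symbol"].unique(), all_dates],
    names=["Symbol", "Mention Datetime"],
)

expanded = (
    df.set_index(["Symbol", "Mention Datetime"])
      .reindex(full_index, fill_value=0)  # fill_value avoids the NaN/float round trip
      .reset_index()[["Mention Datetime", "Symbol", "Count"]]
)

print(expanded)
```

With the sample data this gives three rows for `a` followed by three rows for `b`, with zeros on the dates where a symbol has no entry.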

1 Comment

Thank you. I think the solution above will get me what I need, but this is probably more robust if I need it in future. I'm going to keep this in mind if I set my index as the Symbol with multiple dates each.
