Depending on the task you are trying to solve, I can see two options for this dataset.
- Either, as you show in your example, count the number of rows for each day, regardless of the value of the text field.
- Or count the number of occurrences of each unique value of the text field for each day. You will then have one column per possible value of the text field, which may make more sense if the values are purely categorical.
First, build the dataframe and make sure the Date column has a datetime type:
import pandas as pd
df = pd.DataFrame(data={'Date':['2018-01-01','2018-01-01','2018-01-01', '2018-01-02', '2018-01-03'], 'Text':['A','B','C','A','A']})
df['Date'] = pd.to_datetime(df['Date']) #convert to datetime type if not already done
        Date Text
0 2018-01-01    A
1 2018-01-01    B
2 2018-01-01    C
3 2018-01-02    A
4 2018-01-03    A
Then, for option one:
counts = df.groupby('Date').count()  # non-null Text entries per day; df itself stays unchanged for option two
            Text
Date
2018-01-01     3
2018-01-02     1
2018-01-03     1
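One detail worth knowing for option one: `count()` only counts non-missing values, whereas `size()` counts every row. The sketch below illustrates the difference on hypothetical data with one missing Text value (this data is an assumption made for illustration and is not part of the question).

```python
import pandas as pd

# Hypothetical data: the second row has no Text value.
df_missing = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-02']),
    'Text': ['A', None, 'A'],
})

# count() ignores missing values in the Text column.
non_null_per_day = df_missing.groupby('Date')['Text'].count()

# size() counts every row of each group, missing or not.
rows_per_day = df_missing.groupby('Date').size()
```

If the Text column never contains missing values, both give the same numbers.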
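For option two:
# one 0/1 indicator column per Text value; get_dummies labels the columns itself,
# so there is no risk of mismatching them with the order returned by unique()
dummies = pd.get_dummies(df['Text'], dtype=int)
per_value = dummies.groupby(df['Date']).sum()  # add up the indicators for each day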
            A  B  C
Date
2018-01-01  1  1  1
2018-01-02  1  0  0
2018-01-03  1  0  0
The get_dummies function creates one column per possible value of the Text field. Each column is an indicator that tells us, for each row of the dataframe, whether that value of the Text field occurred in the row. Summing these indicators within each Date group then gives the number of occurrences of each value per day.
If you are not familiar with groupby and aggregation operations, I recommend that you read this guide first.
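As a cross-check, the same table can probably be obtained more directly with `pd.crosstab`, which tabulates how often each pair of values occurs. This is only a sketch of an alternative, rebuilt on the example data so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

df_example = pd.DataFrame({
    'Date': pd.to_datetime(['2018-01-01', '2018-01-01', '2018-01-01',
                            '2018-01-02', '2018-01-03']),
    'Text': ['A', 'B', 'C', 'A', 'A'],
})

# One row per date, one column per distinct Text value, cells hold the counts.
per_value_crosstab = pd.crosstab(df_example['Date'], df_example['Text'])

# The indicator-column approach described above, for comparison.
per_value_dummies = (
    pd.get_dummies(df_example['Text'], dtype=int)
    .groupby(df_example['Date'])
    .sum()
)
```

Both results have the dates as index and the columns A, B and C, so either can be used depending on which reads better in your code.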