
I have a dataset of around 13,000 rows and 2 columns (text and date) covering a two-year period. The date column is in yyyy-mm-dd format. I want to perform a time series analysis where the x-axis is the date (each day) and the y-axis is the frequency of text on the corresponding date.

I think that creating a new data frame with the unique dates and the number of texts on each date would solve my problem.

Sample data: (image not included)

How can I create a new column with the frequency of text for each day? For example: (image not included)

Thanks in advance!

1 Answer


Depending on the task you are trying to solve, I can see two options for this dataset.

  • Either, as in your example, count the number of rows for each day, regardless of the value of the text field.
  • Or count the number of occurrences of each unique value of the text field for each day. You will then have one column per possible value of the text field, which may make more sense if the values are purely categorical.

First, build the data frame and make sure the Date column has a datetime type:

import pandas as pd
df = pd.DataFrame(data={'Date':['2018-01-01','2018-01-01','2018-01-01', '2018-01-02', '2018-01-03'], 'Text':['A','B','C','A','A']})
df['Date'] = pd.to_datetime(df['Date'])  # convert to datetime type if not already done

         Date Text
0  2018-01-01    A
1  2018-01-01    B
2  2018-01-01    C
3  2018-01-02    A
4  2018-01-03    A

Then, for option one:

df = df.groupby('Date').count()

            Text
Date            
2018-01-01     3
2018-01-02     1
2018-01-03     1
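
Since the goal is a daily time series, note that a day with no text at all simply does not appear in the grouped result. The following is a minimal sketch of one way to handle that, assuming such days should be counted as zero. The sample dates are changed slightly to leave a gap on 2018-01-03, and the names daily and full_range are only illustrative.

```python
import pandas as pd

# Same shape as the sample above, but with no rows on 2018-01-03 (assumed gap).
df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-04'],
    'Text': ['A', 'B', 'C', 'A', 'A'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Count rows per day. size() counts every row, whereas count() skips missing Text values.
daily = df.groupby('Date').size()

# Reindex onto a complete daily calendar so that days without text show up as 0.
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
daily = daily.reindex(full_range, fill_value=0).rename('Text')
```

The resulting Series has one entry per calendar day, which is usually what plotting or resampling code expects.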

For option two (starting again from the original data frame, not from the result of option one):

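# join the indicator columns by index so that each one keeps its own label
# (get_dummies sorts its columns, which may not match the order of unique())
df = df.join(pd.get_dummies(df['Text']))
df = df.drop('Text', axis=1)
df = df.groupby('Date').sum()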

            A  B  C
Date               
2018-01-01  1  1  1
2018-01-02  1  0  0
2018-01-03  1  0  0

The get_dummies function creates one column per distinct value of the Text field. Each column is an indicator that marks, for every row of the data frame, whether that value of the Text field occurred in the row. Summing these indicators in a groupby on the Date field then gives the number of occurrences of each value per day.
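
As a possible alternative for option two, pandas also provides crosstab, which builds the same kind of date-by-value count table in a single call. This is only a sketch on the sample data from this answer, and the name per_value is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-02', '2018-01-03'],
    'Text': ['A', 'B', 'C', 'A', 'A'],
})
df['Date'] = pd.to_datetime(df['Date'])

# Rows are dates, columns are the distinct Text values, cells are occurrence counts.
per_value = pd.crosstab(df['Date'], df['Text'])
```

Whether this or the get_dummies approach reads better is mostly a matter of taste; on this sample both give one column per Text value and one row per date.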

If you are not familiar with groupby and aggregation operations, I recommend that you read this guide first.
