I have a data frame that looks like the one below. First, I want the count of each status on each date. For example, the number of 'COMPLETED' rows on 2017-11-02 is 2. I then want a stacked bar plot of those counts.
                 status              start_time                end_time  \
0             COMPLETED 2017-11-01 19:58:54.726 2017-11-01 20:01:05.414
1             COMPLETED 2017-11-02 19:43:04.000 2017-11-02 19:47:54.877
2     ABANDONED_BY_USER 2017-11-03 23:36:19.059 2017-11-03 23:36:41.045
3  ABANDONED_BY_TIMEOUT 2017-10-31 17:02:38.689 2017-10-31 17:12:38.844
4             COMPLETED 2017-11-02 19:35:33.192 2017-11-02 19:42:51.074
Here is the CSV for the data frame:
status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
To achieve this, I tried:
df_['status'].astype('category')
df_ = df_.set_index('start_time')
grouped = df_.groupby('status')
color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}
for key_, group in grouped:
    print(key_)
    df_ = group.groupby(lambda x: x.date).count()
    print(df_)
    df_['status'].plot(label=key_,kind='bar',stacked=True,\
                       color=color[key_],rot=90)
plt.show()
The output of the above is:
ABANDONED_BY_TIMEOUT
            status  end_time
2017-10-31       1         1
ABANDONED_BY_USER
            status  end_time
2017-11-03       1         1
COMPLETED
            status  end_time
2017-11-01       1         1
2017-11-02       2         2
The problem is that the plot shows only the last two dates, '2017-11-01' and '2017-11-02', instead of all the dates across all the categories. How can I solve this? I am open to a completely different approach for the stacked plot. Thanks in advance.
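A possible alternative, as a minimal sketch rather than a definitive fix: build one date-by-status count table first and plot it in a single call, instead of plotting each status group separately. Plotting group by group likely goes wrong because each bar plot call places its bars at positions 0, 1, ... for its own dates and rewrites the tick labels, so the last group (COMPLETED, with two dates) decides what the axis shows. The sketch assumes the CSV above is inlined as a string; `CSV`, `counts`, and `colors` are illustrative names, and `pd.crosstab` is one way among several to get the counts.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# The CSV from the question, inlined so the example is self-contained.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

df = pd.read_csv(io.StringIO(CSV), parse_dates=["start_time", "end_time"])

# One row per calendar date of start_time, one column per status,
# each cell holding the number of rows with that date and status.
counts = pd.crosstab(df["start_time"].dt.date, df["status"])
print(counts)

# Same colour mapping as in the question; only statuses that actually
# occur in the data are passed to the plot.
colors = {
    "COMPLETED": "green",
    "ABANDONED_BY_TIMEOUT": "blue",
    "MISSED": "red",
    "ABANDONED_BY_USER": "yellow",
}

# A single plot call on the whole table: every date gets one x position,
# and the status columns are stacked on top of each other.
ax = counts.plot(
    kind="bar",
    stacked=True,
    rot=90,
    color=[colors[status] for status in counts.columns],
)
ax.set_xlabel("date")
ax.set_ylabel("count")
ax.figure.tight_layout()
# Call plt.show() to display the chart interactively.
```

Because all statuses share one index here, a date on which a status never occurs gets a count of 0 instead of being dropped, which is what keeps the four dates aligned on the x-axis.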




