I have a data frame that looks like the one below. First, I want the count of each status on each date; for example, the number of 'COMPLETED' rows on 2017-11-02 is 2. I then want a stacked bar plot of those counts.

                   status              start_time                end_time  \
0             COMPLETED 2017-11-01 19:58:54.726 2017-11-01 20:01:05.414   
1             COMPLETED 2017-11-02 19:43:04.000 2017-11-02 19:47:54.877   
2     ABANDONED_BY_USER 2017-11-03 23:36:19.059 2017-11-03 23:36:41.045   
3  ABANDONED_BY_TIMEOUT 2017-10-31 17:02:38.689 2017-10-31 17:12:38.844   
4             COMPLETED 2017-11-02 19:35:33.192 2017-11-02 19:42:51.074   

Here is the csv for the dataframe:

status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074

Here is what I tried:

df_['status'].astype('category')
df_ = df_.set_index('start_time')
grouped = df_.groupby('status')
color = {'COMPLETED':'green','ABANDONED_BY_TIMEOUT':'blue',"MISSED":'red',"ABANDONED_BY_USER":'yellow'}

for key_, group in grouped:
   print(key_)
   df_ = group.groupby(lambda x: x.date).count()
   print(df_)
   df_['status'].plot(label=key_,kind='bar',stacked=True,\
   color=color[key_],rot=90)
plt.show()

The output of the above code is:

ABANDONED_BY_TIMEOUT
            status  end_time  
2017-10-31       1         1       
ABANDONED_BY_USER
            status  end_time  
2017-11-03       1         1            
COMPLETED
            status  end_time  
2017-11-01       1         1             
2017-11-02       2         2 

plot from above code

The problem, as we can see, is that the plot only takes into account the last two dates, '2017-11-01' and '2017-11-02', instead of all the dates across all the categories. How can I solve this problem? I am open to a whole new approach for the stacked plot. Thanks in advance.

2
  • First, post your full dataframe as CSV in your question. (Commented Mar 8, 2019 at 5:43)
  • There you go, edited with the CSV. (Commented Mar 8, 2019 at 5:49)

3 Answers

2
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_ = pd.read_csv('sam.csv')
# Keep only the calendar date of each start_time
df_['date'] = pd.to_datetime(df_['start_time']).dt.date

# Count rows per (date, status), then pivot so each status becomes a column
grouped = (df_.groupby(['date', 'status']).size()
              .reset_index(name="count")
              .pivot(columns='status', index='date', values='count'))
print(grouped)
sns.set()  # apply seaborn's default styling

# One bar per date, with the status counts stacked on top of each other
grouped.plot(kind='bar', stacked=True)
plt.show()
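A variation on the same idea, as a minimal sketch: `unstack(fill_value=0)` reshapes the grouped counts in one step and writes 0 for (date, status) pairs that never occur, where the pivot above leaves NaN. The CSV is inlined from the question so the snippet runs on its own; the names `CSV`, `count_by_date`, and `plot_stacked` are made up for this illustration.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so no file is needed.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""


def count_by_date(df):
    """Return a date x status table of row counts (0 where a pair is absent)."""
    dates = pd.to_datetime(df["start_time"]).dt.date.rename("date")
    return df.groupby([dates, "status"]).size().unstack(fill_value=0)


def plot_stacked(counts):
    """Draw one bar per date with the status counts stacked (hypothetical helper)."""
    # Imported here so the counting code above runs without matplotlib.
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar", stacked=True, rot=90)
    ax.set_ylabel("count")
    plt.tight_layout()
    return ax


df = pd.read_csv(io.StringIO(CSV))
counts = count_by_date(df)
print(counts)
```

Calling `plot_stacked(counts)` followed by `plt.show()` should then draw the stacked bars, one per date.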

2

Try restructuring df_ with pandas.crosstab instead (this assumes start_time is already a datetime column, since it uses the .dt accessor):

color = ['blue', 'yellow', 'green', 'red']
df_xtab = pd.crosstab(df_.start_time.dt.date, df_.status)

This DataFrame will look like:

status      ABANDONED_BY_TIMEOUT  ABANDONED_BY_USER  COMPLETED
start_time                                                    
2017-10-31                     1                  0          0
2017-11-01                     0                  0          1
2017-11-02                     1                  0          2
2017-11-03                     0                  1          0

and will be easier to plot.

df_xtab.plot(kind='bar',stacked=True, color=color, rot=90)
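The color list above only lines up with the right statuses because crosstab sorts its columns alphabetically. As a hedged sketch, the colors can instead be looked up from the question's status-to-color dictionary, so the mapping survives if a status such as MISSED appears or disappears. The CSV is inlined from the question; `STATUS_COLORS`, `table`, and `plot_table` are names chosen for this example only.

```python
import io

import pandas as pd

# Sample data copied from the question, inlined so no file is needed.
CSV = """status,start_time,end_time
COMPLETED,2017-11-01 19:58:54.726,2017-11-01 20:01:05.414
COMPLETED,2017-11-02 19:43:04.000,2017-11-02 19:47:54.877
ABANDONED_BY_USER,2017-11-03 23:36:19.059,2017-11-03 23:36:41.045
ABANDONED_BY_TIMEOUT,2017-10-31 17:02:38.689,2017-10-31 17:12:38.844
COMPLETED,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
ABANDONED_BY_TIMEOUT,2017-11-02 19:35:33.192,2017-11-02 19:42:51.074
"""

# Status-to-color mapping taken from the question's code.
STATUS_COLORS = {
    "COMPLETED": "green",
    "ABANDONED_BY_TIMEOUT": "blue",
    "MISSED": "red",
    "ABANDONED_BY_USER": "yellow",
}

# parse_dates makes start_time a datetime column, which .dt requires.
df = pd.read_csv(io.StringIO(CSV), parse_dates=["start_time", "end_time"])

# Rows are dates, columns are statuses, cells are counts.
table = pd.crosstab(df["start_time"].dt.date, df["status"])

# Pick each column's color by name rather than by position.
colors = [STATUS_COLORS[status] for status in table.columns]


def plot_table(table, colors):
    """Stacked bar plot of the crosstab (hypothetical helper)."""
    # Imported here so the tabulation above runs without matplotlib.
    import matplotlib.pyplot as plt

    ax = table.plot(kind="bar", stacked=True, color=colors, rot=90)
    plt.tight_layout()
    return ax


print(table)
print(colors)
```

Calling `plot_table(table, colors)` and then `plt.show()` should give the same stacked plot as the answer's `df_xtab.plot(...)` call, with colors tied to status names.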

1

Use the seaborn library's barplot with its hue parameter.

Code:

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df_ = pd.read_csv('sam.csv')
# Keep only the calendar date of each start_time
df_['date'] = pd.to_datetime(df_['start_time']).dt.date

print(df_)

# One row per (date, status) pair with its count
grouped = df_.groupby(['date', 'status']).size().reset_index(name="count")
print(grouped)

# hue='status' draws side-by-side bars per status, not stacked ones
g = sns.barplot(x='date', y='count', hue='status', data=grouped)
plt.show()

4 Comments

Thanks for the answer. I changed the data a little so that a stacked bar plot is easier to visualize. I don't think your answer helps with that. Any solutions?
It still does its job.
What's the problem? I don't get it.
For date 2017-11-02, I want a stacked bar plot instead of a set of vertical bars.
