Stackplot with matplotlib and a grouped Pandas dataframe

Question

The data I'm using is a conversation message log. I have a pandas DataFrame with timestamps as the index and two columns: one for "sender" and one for "message".

I'm simply trying to plot a stackplot of messages over time. I don't actually need the contents of message, so I've cleaned the data as follows:

Dummy data:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({
    'date': [
        Timestamp('2019-07-29 19:58:00'), Timestamp('2019-07-29 20:03:00'),
        Timestamp('2019-08-01 19:22:00'), Timestamp('2019-08-01 19:23:00'),
        Timestamp('2019-08-01 19:25:00'), Timestamp('2019-08-04 11:28:00'),
        Timestamp('2019-08-04 11:29:00'), Timestamp('2019-08-04 11:29:00'),
        Timestamp('2019-08-04 12:43:00'), Timestamp('2019-08-04 12:49:00'),
        Timestamp('2019-08-04 12:51:00'), Timestamp('2019-08-04 12:51:00'),
        Timestamp('2019-08-25 22:33:00'), Timestamp('2019-08-27 11:55:00'),
        Timestamp('2019-08-27 18:35:00'), Timestamp('2019-11-06 18:53:00'),
        Timestamp('2019-11-06 18:54:00'), Timestamp('2019-11-06 20:42:00'),
        Timestamp('2019-11-07 00:16:00'), Timestamp('2019-11-07 15:24:00'),
        Timestamp('2019-11-07 16:06:00'), Timestamp('2019-11-08 11:48:00'),
        Timestamp('2019-11-08 11:53:00'), Timestamp('2019-11-08 11:55:00'),
        Timestamp('2019-11-08 11:55:00'), Timestamp('2019-11-08 11:59:00'),
        Timestamp('2019-11-08 12:03:00'), Timestamp('2019-12-24 13:40:00'),
        Timestamp('2019-12-24 13:42:00'), Timestamp('2019-12-24 13:43:00'),
        Timestamp('2019-12-24 13:44:00'), Timestamp('2019-12-24 13:44:00')],
    'sender': [
        'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2',
        'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2',
        'Person 1', 'Person 2', 'Person 1', 'Person 2'],
    'message': [
        'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.',
        'Then we are both glad', 'Indeed we are.',
        'I sure hope this is enough fake conversation for stackoverflow.',
        'Better write a few more messages just in case',
        "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.",
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted',
        'redacted', 'redacted', 'redacted', 'redacted', 'redacted']})

# The real data has the timestamps as the index, so mirror that here.
df = df.set_index('date')

dfgrouped = df.groupby(["sender"])
dfgrouped[["sender"]].resample("D").count()

This gives a DataFrame grouped by sender, indexed by date, holding the number of messages each sender sent on each day.

dfgrouped[["sender"]].get_group("Person 1").resample("D").count()

... would give a dataframe with just one user and their message counts per day.

I'd like to know how to use matplotlib to plot a stackplot where each "sender" is a different line. I haven't been able to achieve this through either

ax.stackplot(dfgrouped[["sender"]].resample("D").count())

or through looping:

for i, sender in enumerate(df["sender"].unique()):
    axs[i].stackplot(dfgrouped[["sender"]].get_group(sender).resample("D").count())

It would help if you provided mock-up data; in particular, check out "How to make good reproducible pandas examples".
– Diziet Asahi, commented Mar 24, 2020 at 20:58

Arne · Accepted Answer · 2020-04-14 01:21:15Z

You can use pandas' own stackplot function, df.plot.area(). This is a wrapper for the Matplotlib function, working as a method on DataFrames. You just have to get your data in the right shape. With your groupby and count operations you're almost there:

import pandas as pd

df = pd.DataFrame({'sender': [
    'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2', 
    'Person 1', 'Person 2', 'Person 1', 'Person 2'], 
    'message': [
    'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.', 
    'Then we are both glad', 'Indeed we are.', 
    'I sure hope this is enough fake conversation for stackoverflow.', 
    'Better write a few more messages just in case', 
    "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.", 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted']}, 
    index = pd.DatetimeIndex([
    pd.Timestamp('2019-07-29 19:58:00'), pd.Timestamp('2019-07-29 20:03:00'), 
    pd.Timestamp('2019-08-01 19:22:00'), pd.Timestamp('2019-08-01 19:23:00'),
    pd.Timestamp('2019-08-01 19:25:00'), pd.Timestamp('2019-08-04 11:28:00'), 
    pd.Timestamp('2019-08-04 11:29:00'), pd.Timestamp('2019-08-04 11:29:00'), 
    pd.Timestamp('2019-08-04 12:43:00'), pd.Timestamp('2019-08-04 12:49:00'), 
    pd.Timestamp('2019-08-04 12:51:00'), pd.Timestamp('2019-08-04 12:51:00'), 
    pd.Timestamp('2019-08-25 22:33:00'), pd.Timestamp('2019-08-27 11:55:00'), 
    pd.Timestamp('2019-08-27 18:35:00'), pd.Timestamp('2019-11-06 18:53:00'), 
    pd.Timestamp('2019-11-06 18:54:00'), pd.Timestamp('2019-11-06 20:42:00'), 
    pd.Timestamp('2019-11-07 00:16:00'), pd.Timestamp('2019-11-07 15:24:00'), 
    pd.Timestamp('2019-11-07 16:06:00'), pd.Timestamp('2019-11-08 11:48:00'), 
    pd.Timestamp('2019-11-08 11:53:00'), pd.Timestamp('2019-11-08 11:55:00'), 
    pd.Timestamp('2019-11-08 11:55:00'), pd.Timestamp('2019-11-08 11:59:00'), 
    pd.Timestamp('2019-11-08 12:03:00'), pd.Timestamp('2019-12-24 13:40:00'), 
    pd.Timestamp('2019-12-24 13:42:00'), pd.Timestamp('2019-12-24 13:43:00'), 
    pd.Timestamp('2019-12-24 13:44:00'), pd.Timestamp('2019-12-24 13:44:00')]))

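# Daily message counts per sender; the result is indexed by (sender, date).
df_group = df.groupby(["sender"])
df_count = df_group[["sender"]].resample("D").count()

# Put each sender's daily counts side by side, one column per sender.
df_plot = pd.concat([df_count.loc['Person 1', :], 
                     df_count.loc['Person 2', :]], 
                    axis=1)
df_plot.columns = ['Sender 1', 'Sender 2']

# Stacked area plot with one layer per column.
df_plot.plot.area()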
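
If there are more than two senders, concatenating each group by hand gets tedious. Here is a minimal sketch of one alternative, assuming the same layout as above (a DatetimeIndex plus a "sender" column): count messages per day and sender, pivot the senders into columns with unstack, and pass the result to Matplotlib's ax.stackplot directly. The shortened sample data and the names day, daily and polys are only for illustration and do not come from the question.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Shortened stand-in for the message log (illustrative data only).
df = pd.DataFrame(
    {"sender": ["Person 2", "Person 1", "Person 2", "Person 1", "Person 2"]},
    index=pd.DatetimeIndex([
        "2019-07-29 19:58", "2019-07-29 20:03", "2019-08-01 19:22",
        "2019-08-01 19:23", "2019-08-01 19:25",
    ]),
)

# Count messages per (day, sender), then turn the senders into columns.
# asfreq("D") adds the days on which nobody wrote, filled with zeros.
daily = (
    df.assign(day=df.index.floor("D"))
      .groupby(["day", "sender"])
      .size()
      .unstack(fill_value=0)
      .asfreq("D", fill_value=0)
)

# stackplot takes the x values and a 2D array with one row per layer.
fig, ax = plt.subplots()
polys = ax.stackplot(daily.index, daily.to_numpy().T, labels=list(daily.columns))
ax.legend(loc="upper left")
ax.set_ylabel("messages per day")
fig.autofmt_xdate()
```

In a script, call plt.show() or fig.savefig(...) afterwards to see the figure. Since daily already has one column per sender, daily.plot.area() should give an equivalent plot through pandas.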
