The data I'm using is a conversation message log. I have a pandas DataFrame with timestamps as the index and two columns: one for "sender" and one for "message".

I'm simply trying to plot a stackplot of messages over time. I don't actually need the contents of message, so I've cleaned the data as follows:

Dummy data:

import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime([
        '2019-07-29 19:58:00', '2019-07-29 20:03:00', '2019-08-01 19:22:00', '2019-08-01 19:23:00',
        '2019-08-01 19:25:00', '2019-08-04 11:28:00', '2019-08-04 11:29:00', '2019-08-04 11:29:00',
        '2019-08-04 12:43:00', '2019-08-04 12:49:00', '2019-08-04 12:51:00', '2019-08-04 12:51:00',
        '2019-08-25 22:33:00', '2019-08-27 11:55:00', '2019-08-27 18:35:00', '2019-11-06 18:53:00',
        '2019-11-06 18:54:00', '2019-11-06 20:42:00', '2019-11-07 00:16:00', '2019-11-07 15:24:00',
        '2019-11-07 16:06:00', '2019-11-08 11:48:00', '2019-11-08 11:53:00', '2019-11-08 11:55:00',
        '2019-11-08 11:55:00', '2019-11-08 11:59:00', '2019-11-08 12:03:00', '2019-12-24 13:40:00',
        '2019-12-24 13:42:00', '2019-12-24 13:43:00', '2019-12-24 13:44:00', '2019-12-24 13:44:00']),
    'sender': [
        'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1',
        'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 2',
        'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2',
        'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2'],
    'message': [
        'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.',
        'Then we are both glad', 'Indeed we are.',
        'I sure hope this is enough fake conversation for stackoverflow.',
        'Better write a few more messages just in case',
        "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now."]
        + ['redacted'] * 19,
}).set_index('date')  # resample() below needs the timestamps as the index
dfgrouped = df.groupby(["sender"])
dfgrouped[["sender"]].resample("D").count()

This gives a DataFrame indexed by sender and date, containing the number of messages each sender sent on each day.

dfgrouped[["sender"]].get_group("Person 1").resample("D").count()

... would give a dataframe with just one user and their message counts per day.

I'd like to know how to use matplotlib to plot a stackplot where each "sender" is a different line. I haven't been able to achieve this through either

ax.stackplot(dfgrouped[["sender"]].resample("D").count())

or through looping:

for i, sender in enumerate(df["sender"].unique()):
    axs[i].stackplot(dfgrouped[["sender"]].get_group(sender).resample("D").count())
  • It would help if you provided mock-up data; in particular, check out "How to make good reproducible pandas examples". Commented Mar 24, 2020 at 20:58
  • Thanks, I've added some dummy data. Commented Mar 25, 2020 at 14:17

1 Answer

You can use pandas' own stacked-area plotting method, DataFrame.plot.area(). It uses Matplotlib under the hood and is called directly on a DataFrame, so you just have to get your data into the right shape: one row per day and one column per sender. With your groupby and count operations you're almost there:

import pandas as pd

df = pd.DataFrame({'sender': [
    'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 1', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 1', 'Person 2', 
    'Person 2', 'Person 1', 'Person 2', 'Person 2', 'Person 1', 'Person 2', 'Person 2', 
    'Person 1', 'Person 2', 'Person 1', 'Person 2'], 
    'message': [
    'Hello', 'Hi there', "How's things", 'good', 'I am glad', 'Me too.', 
    'Then we are both glad', 'Indeed we are.', 
    'I sure hope this is enough fake conversation for stackoverflow.', 
    'Better write a few more messages just in case', 
    "But the message content isn't relevant", 'Oh yeah.', "I'm going to stop now.", 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 'redacted', 
    'redacted', 'redacted', 'redacted', 'redacted', 'redacted']}, 
    index = pd.DatetimeIndex([
    pd.Timestamp('2019-07-29 19:58:00'), pd.Timestamp('2019-07-29 20:03:00'), 
    pd.Timestamp('2019-08-01 19:22:00'), pd.Timestamp('2019-08-01 19:23:00'),
    pd.Timestamp('2019-08-01 19:25:00'), pd.Timestamp('2019-08-04 11:28:00'), 
    pd.Timestamp('2019-08-04 11:29:00'), pd.Timestamp('2019-08-04 11:29:00'), 
    pd.Timestamp('2019-08-04 12:43:00'), pd.Timestamp('2019-08-04 12:49:00'), 
    pd.Timestamp('2019-08-04 12:51:00'), pd.Timestamp('2019-08-04 12:51:00'), 
    pd.Timestamp('2019-08-25 22:33:00'), pd.Timestamp('2019-08-27 11:55:00'), 
    pd.Timestamp('2019-08-27 18:35:00'), pd.Timestamp('2019-11-06 18:53:00'), 
    pd.Timestamp('2019-11-06 18:54:00'), pd.Timestamp('2019-11-06 20:42:00'), 
    pd.Timestamp('2019-11-07 00:16:00'), pd.Timestamp('2019-11-07 15:24:00'), 
    pd.Timestamp('2019-11-07 16:06:00'), pd.Timestamp('2019-11-08 11:48:00'), 
    pd.Timestamp('2019-11-08 11:53:00'), pd.Timestamp('2019-11-08 11:55:00'), 
    pd.Timestamp('2019-11-08 11:55:00'), pd.Timestamp('2019-11-08 11:59:00'), 
    pd.Timestamp('2019-11-08 12:03:00'), pd.Timestamp('2019-12-24 13:40:00'), 
    pd.Timestamp('2019-12-24 13:42:00'), pd.Timestamp('2019-12-24 13:43:00'), 
    pd.Timestamp('2019-12-24 13:44:00'), pd.Timestamp('2019-12-24 13:44:00')]))

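# Daily message counts per sender; the result is indexed by (sender, date)
df_group = df.groupby(["sender"])
df_count = df_group[["sender"]].resample("D").count()

# Put each sender's daily counts side by side, aligned on the date
df_plot = pd.concat([df_count.loc['Person 1', :],
                     df_count.loc['Person 2', :]],
                    axis=1)
df_plot.columns = ['Sender 1', 'Sender 2']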

df_plot.plot.area()
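
The concat step above names each sender by hand. As a rough sketch of one way to generalise it to any number of senders, you could count per day and per sender in one go and then unstack the senders into columns. This assumes the same layout as in the question (a DatetimeIndex and a "sender" column); the small frame below is made-up illustration data, not your log.

```python
import pandas as pd

# Made-up illustration data with the same layout as in the question:
# a DatetimeIndex and a "sender" column.
df = pd.DataFrame(
    {"sender": ["Person 2", "Person 1", "Person 2", "Person 1", "Person 2"]},
    index=pd.to_datetime([
        "2019-07-29 19:58", "2019-07-29 20:03",
        "2019-08-01 19:22", "2019-08-01 19:23", "2019-08-01 19:25",
    ]),
)

# Count messages per (day, sender), then move the senders into columns.
df_plot = (
    df.groupby([pd.Grouper(freq="D"), "sender"])
      .size()
      .unstack("sender", fill_value=0)
      .asfreq("D", fill_value=0)  # add all-zero rows for days without messages
)
print(df_plot)
```

Calling df_plot.plot.area() on the result should then draw one layer per sender without listing the names explicitly.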

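
If you would rather call Matplotlib directly, as the question asks, here is a minimal sketch. As far as I can tell, the attempts in the question fail mainly because Axes.stackplot expects the x values as its first argument, followed by the y values with one sequence per layer, rather than a whole DataFrame. The counts frame below is made-up data already in the wide shape (one row per day, one column per sender).

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up daily counts: one row per day, one column per sender.
counts = pd.DataFrame(
    {"Person 1": [1, 0, 0, 1], "Person 2": [1, 0, 0, 2]},
    index=pd.date_range("2019-07-29", periods=4, freq="D"),
)

fig, ax = plt.subplots()

# stackplot takes x first, then the y values with one row per layer,
# so the frame is transposed to get one row per sender.
layers = ax.stackplot(counts.index, counts.to_numpy().T, labels=counts.columns)

ax.legend(loc="upper left")
ax.set_ylabel("Messages per day")
fig.autofmt_xdate()  # slant the date tick labels so they do not overlap
```

In a script, finish with plt.show() or fig.savefig(...) to see the figure.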
