0

I have an excel worksheet, let us say its name is 'ws_actual'. The data looks as below.

Project Name    Date Paid   Actuals Item Amount Cumulative Sum
A   2016-04-10 00:00:00 124.2   124.2
A   2016-04-27 00:00:00 2727.5  2851.7
A   2016-05-11 00:00:00 2123.58 4975.28
A   2016-05-24 00:00:00 2500    7475.28
A   2016-07-07 00:00:00 38374.6 45849.88
A   2016-08-12 00:00:00 2988.14 48838.02
A   2016-09-02 00:00:00 23068   71906.02
A   2016-10-31 00:00:00 570.78  72476.8
A   2016-11-09 00:00:00 10885.75    83362.55
A   2016-12-08 00:00:00 28302.95    111665.5
A   2017-01-19 00:00:00 4354.3  116019.8
A   2017-02-28 00:00:00 3469.77 119489.57
A   2017-03-29 00:00:00 267.75  119757.32
B   2015-04-27 00:00:00 2969.93 2969.93
B   2015-06-02 00:00:00 118.8   3088.73
B   2015-06-18 00:00:00 2640    5728.73
B   2015-06-26 00:00:00 105.6   5834.33
B   2015-09-03 00:00:00 11879.7 17714.03
B   2015-10-22 00:00:00 5303.44 23017.47
B   2015-11-08 00:00:00 52000   75017.47
B   2015-11-25 00:00:00 2704.13 77721.6
B   2016-03-09 00:00:00 59752.85    137474.45
B   2016-03-13 00:00:00 512.73  137987.18
.
.
.

Let us say there are many many more projects including A and B with Date Paid and Amount information. I would like to create a plot by project where x axis is 'Date Paid' and y axis is 'Cumulative Sum', but when I just implement the following code, it just combines every project and plot every 'Cumulative Sum' at one graph. I wonder if I need to divide the table by project, save each, and then bring one by one to plot the graph. It is a lot of work, so I am wondering if there is a smarter way to do so. Please help me, genius.

import pandas as pd
import matplotlib.pyplot as plt

ws_actual = pd.read_excel(actual_file[0], sheet_name=0)
ax = ws_actual.plot(x='Date Paid', y='Cumulative Sum', color='g')
0

2 Answers 2

2

Right now you are connecting all of the points, regardless of group. A simple loop will work here allowing you to group the DataFrame and then plot each group as a separate curve. If you want you can define your own colorcycle if you have a lot of groups, so that colors do not repeat.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(8,8))
for id, gp in ws_actual.groupby('Project Name'):
    gp.plot(x='Date Paid', y='Cumulative Sum', ax=ax, label=id)

plt.show()

enter image description here

Sign up to request clarification or add additional context in comments.

3 Comments

Can I possibly separate the plot depending on a project?
Never mind!! I just deleted fig, ax = plt.subplots(figsize=(8,8)) and ax=ax! It works!
Yes, in that case you'd just need to do that. By first defining a figure and axes, and then specifying ax=ax it ensures all of the lines are plotted on the same axes object. Removing them instead generates one plot separate figure per group, and show shows them all at the end.
2

You could just iterate the projects:

for proj in ws_actual['Project'].unique():
    ws_actual[ws_actual['Project'] == proj].plot(x='Date Paid', y='Cumulative Sum', color='g')
    plt.show()

Or check out seaborn for an easy way to make a facet grid for which you can set a rows variable. Something along the lines of:

import seaborn as sns

g = sns.FacetGrid(ws_actual, row="Project")
g = g.map(plt.scatter, "Date Paid", "Cumulative Sum", edgecolor="w")

3 Comments

I am getting KeyError: 'Project' when I run this. What could cause this?
Is it possible that the Project column is actually in the index and not a column? Or that it's spelled differently (no caps maybe) than in your sample input above?
Oh! My bad. 'Project Name' is the column name. It works!! Thank you !!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.