Plotting two dataframes obtained from a loop in the same graph in Python

Question

I would like to plot two dfs with two different colors. For each df, I would need to add two markers. Here is what I have tried:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    plt.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    plt.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

Using this piece of code, I get the servers_df plotted with markers, but on separate graphs. How can I get both into a single graph so I can compare them better?

Thanks.

It seems your question relates to matplotlib and pandas. If this is not the case, please remove the added tags and indicate the libraries you intend to use. Please also provide a complete example including a toy dataset and the expected output. – Mr. T, Commented Nov 18, 2020 at 13:23

As for the question: it seems you should create fig, ax = plt.subplots() and then use data.servers_df.plot(..., ax=ax) and ax.plot(...) in your loop. – Mr. T, Commented Nov 18, 2020 at 13:29

BStadlbauer · Accepted Answer · 2020-11-20 17:23:21Z

TL;DR

Your call to data.servers_df.plot() always creates a new figure, and plt.plot() draws on the most recently created (current) Axes. The solution is to create one dedicated Axes object and plot everything onto it.

Preface

I assumed your variables are the following:

data.servers_df: DataFrame with two float columns "time" and "percentage"
data.first_measurement: A dictionary with keys "time" and "percentage", each holding a list of floats
data.second_measurement: A dictionary with keys "time" and "percentage", each holding a list of floats

I skipped reading the stats files because you did not show what Graph() does, and instead created a list of dummy data objects (stat_files).

If data.first_measurement and data.second_measurement are also DataFrames, let me know; there is an even nicer solution.
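Purely as an illustration (the question does not say what type the measurements are), here is one possible approach if they were DataFrames with "time" and "percentage" columns; it is not necessarily the solution alluded to above. The DataFrames below are made-up stand-ins. pandas can draw the measurements as markers only via style='o', onto the same Axes:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for data.servers_df and the two measurements,
# ASSUMED here to be DataFrames rather than dictionaries
servers_df = pd.DataFrame({"time": range(10), "percentage": [3, 5, 2, 8, 6, 7, 1, 4, 9, 0]})
first_measurement = servers_df.iloc[:5]
second_measurement = servers_df.iloc[5:]

fig, ax = plt.subplots()
servers_df.plot(x="time", y="percentage", linewidth=1, ax=ax)
# style="o" draws markers without a connecting line;
# legend=False keeps the existing legend entry for the main line only
first_measurement.plot(x="time", y="percentage", style="o", color="orange", ax=ax, legend=False)
second_measurement.plot(x="time", y="percentage", style="o", color="green", ax=ax, legend=False)
```

Everything ends up on the single ax, so adding more files only means repeating the three plot calls with ax=ax.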

Theory - Behind the curtains

Each matplotlib plot (line, bar, etc.) lives on a matplotlib.axes.Axes object, which you can think of as the axes of a regular coordinate system. Two things happen here:

When you call plt.plot() without specifying axes, matplotlib looks up the current Axes in the background; if there is none, it creates an empty one, uses it, and makes it the current Axes. The next call to plt.plot() then finds this Axes and uses it too.
DataFrame.plot(), on the other hand, always creates a new figure with its own Axes unless you pass an existing Axes through the ax argument.

So in your code, data.servers_df.plot() first creates a new figure and Axes behind the curtains (which then become current), and the two following plt.plot() calls pick up that current Axes and draw onto it. On the next loop iteration, data.servers_df.plot() creates yet another figure, which is why you get one graph per file instead of a single combined one.
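A small, self-contained sketch of this behavior, using a made-up DataFrame in place of your data: it counts the open figures with plt.get_fignums() after a loop that does not pass ax, and again after a loop that reuses one Axes. The Agg backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"time": [0, 1, 2], "percentage": [10, 20, 15]})  # made-up data

plt.close("all")
for _ in range(2):
    # Without ax=..., DataFrame.plot() opens a brand-new figure each time
    df.plot(x="time", y="percentage")
    # plt.plot() draws on the current Axes, i.e. the one just created above
    plt.plot([0, 2], [10, 15], "o", color="orange")
separate = len(plt.get_fignums())  # one figure per loop iteration

plt.close("all")
fig, ax = plt.subplots()
for _ in range(2):
    df.plot(x="time", y="percentage", ax=ax)  # reuse the shared Axes
    ax.plot([0, 2], [10, 15], "o", color="orange")
shared = len(plt.get_fignums())

print(separate, shared)  # 2 1
```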

Solution

The following solution first creates a dedicated matplotlib.axes.Axes using plt.subplots(). All lines are then drawn onto this Axes; note especially the ax=ax in data.servers_df.plot(). I also changed the marker format from o- to o, since we only want to display markers (o) and not a connecting line (-). The mock data can be found below.

fig, ax = plt.subplots()  # Here we create the axes that all data will plot onto
for i, data in enumerate(stat_files):
    y_column = f'percentage_{i}'  # Make the columns identifiable
    data.servers_df \
        .rename(columns={'percentage': y_column}) \
        .plot(x='time', y=y_column, linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o', color='green')
plt.show()
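To check the marker change described above, a tiny sketch with made-up points compares the line style matplotlib assigns to the 'o' and 'o-' format strings:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
time = [0, 1, 2, 3]          # made-up points
percentage = [4, 7, 5, 9]

# 'o': circle markers only, matplotlib sets the line style to 'None'
(markers_only,) = ax.plot(time, percentage, "o", color="orange")
# 'o-': the same markers, joined by a solid line
(markers_and_line,) = ax.plot(time, percentage, "o-", color="green")

print(markers_only.get_linestyle(), markers_and_line.get_linestyle())
```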

Mock data

import random

import pandas as pd
import matplotlib.pyplot as plt

# Generation of dummy data
random.seed(1)
NUMBER_OF_DATA_FILES = 2
X_LENGTH = 10


class Data:
    def __init__(self):
        self.servers_df = pd.DataFrame(
            {
                'time': range(X_LENGTH),
                'percentage': [random.randint(0, 10) for _ in range(X_LENGTH)]
            }
        )
        self.first_measurement = {
            'time': self.servers_df['time'].values[:X_LENGTH // 2],
            'percentage': self.servers_df['percentage'].values[:X_LENGTH // 2]
        }
        self.second_measurement = {
            'time': self.servers_df['time'].values[X_LENGTH // 2:],
            'percentage': self.servers_df['percentage'].values[X_LENGTH // 2:]
        }


stat_files = [Data() for _ in range(NUMBER_OF_DATA_FILES)]

Djib2011 · Accepted Answer · 2020-11-18 20:37:49Z

DataFrame.plot() returns the matplotlib.axes.Axes object it drew on. You can then plot the other two series onto this object:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
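A rough sketch of what this first snippet produces, with two made-up DataFrames standing in for Graph(stats_file) (hypothetical data, since Graph is not shown): the markers do land on the Axes returned by DataFrame.plot(), but since no ax is passed in, each iteration still opens a separate figure, which matches the "separate graphs" symptom from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
import pandas as pd

# Hypothetical stand-ins for data.servers_df of two stats files
frames = [
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [1, 4, 2, 5]}),
    pd.DataFrame({"time": [0, 1, 2, 3], "percentage": [3, 2, 6, 1]}),
]

returned_axes = []
for df in frames:
    # No ax argument, so pandas creates a new figure and returns its Axes
    ax = df.plot(x="time", y="percentage", linewidth=1, kind="line")
    ax.plot(df["time"].iloc[:2], df["percentage"].iloc[:2], "o-", color="orange")
    ax.plot(df["time"].iloc[2:], df["percentage"].iloc[2:], "o-", color="green")
    returned_axes.append(ax)

print(len(plt.get_fignums()))  # one figure per DataFrame
```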

If you want to plot them on top of each other with different colors, you can do something like this:

colors = ['C0', 'C1', 'C2']  # matplotlib default color palette
                             # assuming that len(stats_files) = 3
                             # if not you need to specify as many colors as necessary 

ax = plt.subplot(111)
for stats_file, c in zip(stats_files, colors):
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', color=c, ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

This only changes the color of the servers_df line. If you also want to change the colors of the other two, you can apply the same logic: create a list of the colors they should take at each iteration, iterate over that list, and pass each value to the color parameter.
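Applied to all three elements, that logic could look like the sketch below. The color triples and the two small DataFrames are made up for illustration (Graph is not shown in the question), and splitting each DataFrame into a first and second half is only a stand-in for the real measurements.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, only so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for data.servers_df of two stats files
frames = [
    pd.DataFrame({"time": range(6), "percentage": [1, 4, 2, 5, 3, 6]}),
    pd.DataFrame({"time": range(6), "percentage": [3, 2, 6, 1, 4, 2]}),
]
# One (line, first markers, second markers) color triple per file
color_sets = [("C0", "orange", "green"), ("C3", "purple", "olive")]

fig, ax = plt.subplots()
for df, (line_c, first_c, second_c) in zip(frames, color_sets):
    half = len(df) // 2  # stand-in split for first/second measurement
    df.plot(x="time", y="percentage", linewidth=1, color=line_c, ax=ax)
    ax.plot(df["time"].iloc[:half], df["percentage"].iloc[:half], "o", color=first_c)
    ax.plot(df["time"].iloc[half:], df["percentage"].iloc[half:], "o", color=second_c)
```

Keeping the colors in one list of tuples means zip() hands each file its full set of colors, so the lists cannot drift out of sync.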

edited Nov 18, 2020 at 20:37

answered Nov 18, 2020 at 13:37


3 Comments

Albert Over a year ago

Thanks for your reply. If I use this code, I get 3 graphs: 2 empty ones (which I assume are the subplots) and one plot that belongs to the second file. Any idea why that happens?

Albert Over a year ago

If I add ax=ax in ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax), I get the two graphs, but one below the other. I would like to overlay them using different colors.

Djib2011 Over a year ago

I changed it a bit, see if it helps now :)

meTchaikovsky · Accepted Answer · 2020-11-19 00:38:43Z

You can create an Axes object to plot on first, for example:

import pandas as pd
import numpy as np 
from matplotlib import pyplot as plt 


df_one = pd.DataFrame({'a':np.linspace(1,10,10),'b':np.linspace(1,10,10)})
df_two = pd.DataFrame({'a':np.random.randint(0,20,10),'b':np.random.randint(0,5,10)})

dfs = [df_one,df_two]
fig,ax = plt.subplots(figsize=(8,6))

colors = ['navy','darkviolet']
markers = ['x','o']
for ind,item in enumerate(dfs):
    ax.plot(item['a'],item['b'],c=colors[ind],marker=markers[ind])
plt.show()

As you can see, both dataframes are plotted on the same ax, each with its own color and marker.

answered Nov 19, 2020 at 0:38

François B. · Accepted Answer · 2020-11-25 08:51:33Z

You need to create the axes first. Afterwards, you can explicitly refer to them while plotting the graphs: df.plot(..., ax=ax) or ax.plot(x, y).

import matplotlib.pyplot as plt

(fig, ax) = plt.subplots(figsize=(20,5))

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

answered Nov 25, 2020 at 8:51

