5

I would like to plot two dfs with two different colors. For each df, I would need to add two markers. Here is what I have tried:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    plt.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    plt.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

Using this piece of code, I get the servers_df plotted with markers, but on separate graphs. How can I have both graphs in a single one to compare them better?

Thanks.

2 Comments

  • It seems your question relates to matplotlib and pandas. If this is not the case, please remove the added tags and indicate the libraries you intend to use. Please also provide a complete example, including a toy dataset and the expected output. (Commented Nov 18, 2020 at 13:23)
  • As for the question: it seems you should create fig, ax = plt.subplots() and then use data.servers_df.plot(..., ax=ax) and ax.plot(...) in your loop. (Commented Nov 18, 2020 at 13:29)

4 Answers

5
+25

TL;DR

Your call to data.servers_df.plot() always creates a new plot, and plt.plot() plots on the latest plot that was created. The solution is to create a dedicated Axes object for everything to plot onto.

Preface

I assumed your variables are the following:

  • data.servers_df: A dataframe with two float columns, "time" and "percentage"
  • data.first_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats
  • data.second_measurement: A dictionary with keys "time" and "percentage", each of which is a list of floats

I skipped generating stats_files as you did not show what Graph() does, and just created a list of dummy data (called stat_files below) instead.

If data.first_measurement and data.second_measurement are also dataframes, let me know and there is an even nicer solution.
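The answer does not spell out what that nicer solution would be. One possibility, offered only as a sketch, is that DataFrame.plot() can draw the markers itself when given style="o" and the shared ax. The dataframes and values below are made up for illustration and are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: the measurements are assumed to be DataFrames
# with the same "time" and "percentage" columns as servers_df.
servers_df = pd.DataFrame({"time": range(6), "percentage": [3, 5, 4, 8, 6, 7]})
first_measurement = servers_df.iloc[[1, 2]]
second_measurement = servers_df.iloc[[4, 5]]

fig, ax = plt.subplots()
servers_df.plot(x="time", y="percentage", linewidth=1, ax=ax)
# style="o" draws markers only; ax=ax keeps everything on the same axes
first_measurement.plot(x="time", y="percentage", style="o",
                       color="orange", label="first", ax=ax)
second_measurement.plot(x="time", y="percentage", style="o",
                        color="green", label="second", ax=ax)
```

With this approach the markers also get their own legend entries, which plain ax.plot() calls only do if you pass a label yourself.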

Theory - Behind the curtains

Each matplotlib plot (line, bar, etc.) lives on a matplotlib.axes.Axes element. These are like regular axes of a coordinate system. Now two things happen here:

  • When you use plt.plot(), no axes are specified, so matplotlib looks up the current axes element (in the background). If there is none, it creates an empty one, uses it, and sets it as the default. The second call to plt.plot() then finds these axes and uses them.
  • DataFrame.plot(), on the other hand, always creates a new axes element if none is given to it (which is possible through the ax argument).

So in your code, each call to data.servers_df.plot() creates a new axes element behind the curtains (which then becomes the default), and the two following plt.plot() calls pick up those axes and plot onto them. Since this repeats on every loop iteration, each file ends up on its own graph, which is why you get two plots instead of one.
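The behaviour described above can be checked with a small experiment. This is only a sketch with a made-up three-row dataframe; it compares the axes objects returned by DataFrame.plot() with and without the ax argument.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"time": [0, 1, 2], "percentage": [10, 20, 15]})  # toy data

# Without ax=..., every call opens its own figure with fresh axes.
ax_a = df.plot(x="time", y="percentage")
ax_b = df.plot(x="time", y="percentage")
separate = ax_a is not ax_b

# plt.plot() draws on the current axes, i.e. the most recently created ones.
plt.plot([1], [20], "o")
current_is_latest = plt.gca() is ax_b

# With an explicit ax, both calls draw onto the same axes.
fig, ax = plt.subplots()
ax_c = df.plot(x="time", y="percentage", ax=ax)
ax_d = df.plot(x="time", y="percentage", ax=ax)
shared = ax_c is ax and ax_d is ax

print(separate, current_is_latest, shared)
```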

Solution

The following solution first creates a dedicated matplotlib.axes.Axes using plt.subplots(). All lines are then drawn onto this axes element. Note especially the ax=ax in data.servers_df.plot(). Note also that I changed the display of your markers from o- to o (as we don't want to display a line (-) but only markers (o)). Mock data can be found below.

fig, ax = plt.subplots()  # Here we create the axes that all data will plot onto
for i, data in enumerate(stat_files):
    y_column = f'percentage_{i}'  # Make the columns identifiable
    data.servers_df \
        .rename(columns={'percentage': y_column}) \
        .plot(x='time', y=y_column, linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o', color='green')
plt.show()


Mock data

import random

import pandas as pd
import matplotlib.pyplot as plt

# Generation of dummy data
random.seed(1)
NUMBER_OF_DATA_FILES = 2
X_LENGTH = 10


class Data:
    def __init__(self):
        self.servers_df = pd.DataFrame(
            {
                'time': range(X_LENGTH),
                'percentage': [random.randint(0, 10) for _ in range(X_LENGTH)]
            }
        )
        self.first_measurement = {
            'time': self.servers_df['time'].values[:X_LENGTH // 2],
            'percentage': self.servers_df['percentage'].values[:X_LENGTH // 2]
        }
        self.second_measurement = {
            'time': self.servers_df['time'].values[X_LENGTH // 2:],
            'percentage': self.servers_df['percentage'].values[X_LENGTH // 2:]
        }


stat_files = [Data() for _ in range(NUMBER_OF_DATA_FILES)]

3

DataFrame.plot() by default returns a matplotlib.axes.Axes object. You should then plot the other two plots on this object:

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line')
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

If you want to plot them one on top of the others with different colors you can do something like this:

colors = ['C0', 'C1', 'C2']  # matplotlib default color palette
                             # assuming that len(stats_files) = 3
                             # if not you need to specify as many colors as necessary 

ax = plt.subplot(111)
for stats_file, c in zip(stats_files, colors):
    data = Graph(stats_file)
    Graph.compute(data)
    # color=c gives each file's servers_df line its own color
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', color=c, ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()

This only changes the color of the servers_df line. If you want to change the color of the other two as well, apply the same logic: create a list of the colors they should take at each iteration, iterate over that list, and pass each value to the color parameter.
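As a sketch of that idea, the snippet below keeps one color list per drawn element and indexes them by file. The dataframes, marker positions, and color choices are invented for illustration; in the real code they would come from Graph(stats_file).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display; drop for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for two processed stats files
runs = [
    pd.DataFrame({"time": range(6), "percentage": [3, 5, 4, 8, 6, 7]}),
    pd.DataFrame({"time": range(6), "percentage": [6, 2, 7, 5, 9, 4]}),
]

# One color per file for each of the three things drawn (illustrative choices)
line_colors = ["C0", "C1"]
first_colors = ["orange", "red"]
second_colors = ["green", "purple"]

fig, ax = plt.subplots()
for i, df in enumerate(runs):
    first = df.iloc[[1]]   # made-up marker positions
    second = df.iloc[[4]]
    df.plot(x="time", y="percentage", linewidth=1,
            color=line_colors[i], label=f"file {i}", ax=ax)
    ax.plot(first["time"], first["percentage"], "o", color=first_colors[i])
    ax.plot(second["time"], second["percentage"], "o", color=second_colors[i])

# Colors in drawing order: line, first marker, second marker, per file
drawn_colors = [line.get_color() for line in ax.lines]
```

Indexing the lists by i raises an IndexError if there are more files than colors, whereas zip() would silently stop at the shortest list.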

3 Comments

Thanks for your reply. If I use this code, I get 3 graphs: 2 empty ones (which I assume are the subplots) and one plot which belongs to the second file. Any idea why that happens?
If I add ax=ax, as in ax = data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax), I get the two graphs, but one under the other. I would like to overlap them using different colors.
I changed it a bit, see if it helps now :)
1

You can create an Axes object for plotting in the first place, for example:

import pandas as pd
import numpy as np 
from matplotlib import pyplot as plt 


df_one = pd.DataFrame({'a':np.linspace(1,10,10),'b':np.linspace(1,10,10)})
df_two = pd.DataFrame({'a':np.random.randint(0,20,10),'b':np.random.randint(0,5,10)})

dfs = [df_one,df_two]
fig,ax = plt.subplots(figsize=(8,6))

colors = ['navy','darkviolet']
markers = ['x','o']
for ind,item in enumerate(dfs):
    ax.plot(item['a'],item['b'],c=colors[ind],marker=markers[ind])

As you can see, the two dataframes are plotted on the same ax with different colors and markers.

1

You need to create the axes first. Afterwards, you can explicitly refer to them while plotting the graphs: df.plot(..., ax=ax) or ax.plot(x, y).

import matplotlib.pyplot as plt

(fig, ax) = plt.subplots(figsize=(20,5))

for stats_file in stats_files:
    data = Graph(stats_file)
    Graph.compute(data)
    data.servers_df.plot(x="time", y="percentage", linewidth=1, kind='line', ax=ax)
    ax.plot(data.first_measurement['time'], data.first_measurement['percentage'], 'o-', color='orange')
    ax.plot(data.second_measurement['time'], data.second_measurement['percentage'], 'o-', color='green')
plt.show()
