Scatterplot Matrix, Input vs Output

Question

I am trying to create a scatterplot matrix in which the x and y axes do not show the same variables (and the number of variables on each axis differs as well). For example, I'd like three inputs plotted along the x axis and two outputs plotted along the y axis, and therefore a scatterplot matrix of 6 scatter plots, each showing one input vs one output.

I have not found a way to do this in matplotlib, seaborn, pandas, or plotly. Has anyone done something like this before, or does anyone know a clever way to create a plot like this?

Everything I have found so far plots the same n variables against each other, giving n^2 plots.

Code:

import pandas as pd
import seaborn as sns
import plotly.express as px

headings = ['a','b','c','d','e']

data = [[21,22,23,24,25],[10,12,13,14,15],[14,2,3,17,5],[6,17,22,9,14],[16,17,18,19,20]]

df = pd.DataFrame(data=data, columns=headings)
pd.plotting.scatter_matrix(df)

sns.pairplot(df)

fig = px.scatter_matrix(df)
fig.show()


@AlexR I have edited my original question with the code I've tried so far (sorry about the formatting). For this example, say a, b, c are inputs and d, e are outputs. I would like to have a scatter matrix of just a, b, c vs d, e (the bottom-left corner of the square scatter matrix created by the code above). But for the real data set that I am trying to apply this to, it would take much too long to plot the full square matrix.
– stsandoval, Commented Oct 29, 2019 at 11:31
Seaborn is built on top of matplotlib, and matplotlib figures have to be converted before plotly can use them. Why would you mix those two ecosystems?
– Mr. T, Commented Feb 5, 2021 at 9:03
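
For the case described in the comment above (a, b, c as inputs, d and e as outputs), seaborn's `pairplot` accepts separate `x_vars` and `y_vars` lists, so only the 2x3 input-vs-output block is drawn instead of the full 5x5 matrix. A minimal sketch using the question's data; it assumes a reasonably recent seaborn, and the Agg backend line is only there so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

headings = ['a', 'b', 'c', 'd', 'e']
data = [[21, 22, 23, 24, 25], [10, 12, 13, 14, 15], [14, 2, 3, 17, 5],
        [6, 17, 22, 9, 14], [16, 17, 18, 19, 20]]
df = pd.DataFrame(data=data, columns=headings)

inputs = ['a', 'b', 'c']   # plotted along the x axis (columns of the grid)
outputs = ['d', 'e']       # plotted along the y axis (rows of the grid)

# Only len(outputs) x len(inputs) = 2 x 3 panels are drawn. Because no variable
# appears in both lists there is no diagonal, so every panel is a scatter plot.
g = sns.pairplot(df, x_vars=inputs, y_vars=outputs)
plt.show()
```

Since only the requested pairs are drawn, the cost grows with (number of inputs) x (number of outputs) rather than with the square of the total column count.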

vestland · Accepted Answer · 2019-11-19 11:44:37Z

The following setup lets you choose dependent and independent variables for an array of scatter plots. If this is thematically what you're looking for, I can adjust the setup to a 2x3 matrix if you prefer that. I could also add a regression line to each subplot.

Code:

# imports
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# data: a random walk with columns V_1 .. V_5
np.random.seed(123)
frame_rows = 15
n_cols = 5

frame_columns = ['V_' + str(e) for e in range(1, n_cols + 1)]
df = pd.DataFrame(np.random.uniform(-8, 10, size=(frame_rows, len(frame_columns))),
                  index=pd.date_range('1/1/2020', periods=frame_rows),
                  columns=frame_columns)
df = df.cumsum() + 100
df.iloc[0] = 100

# define dependent (y) and independent (x) variables
y_list = ['V_1', 'V_2']
x_list = ['V_3', 'V_4', 'V_5']

# plotly: one subplot per (y, x) pair, stacked vertically
n_plots = len(y_list) * len(x_list)
fig = make_subplots(rows=n_plots, cols=1)

row_count = 1
for y in y_list:
    for x in x_list:
        fig.add_trace(go.Scatter(x=df[x].values, y=df[y].values,
                                 mode='markers',
                                 name=y + '=f(' + x + ')'),  # legend entry, e.g. V_1=f(V_3)
                      row=row_count,
                      col=1)

        # axis titles
        fig.update_xaxes(title_text=x, row=row_count, col=1)
        fig.update_yaxes(title_text=y, row=row_count, col=1)
        row_count += 1

fig.update_layout(height=n_plots * 250, width=600)
fig.show()

Mr. T · Accepted Answer · 2021-02-05 09:56:43Z

A seaborn solution:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# generate test data: A and B go on the x axis (columns), C, D and E on the y axis (rows)
np.random.seed(123)
n = 10
df = pd.DataFrame({"A": np.random.random(n),
                   "B": 10 * np.random.random(n),
                   "C": 20 * np.random.random(n),
                   "D": -np.random.random(n),
                   "E": np.random.random(n) - 20})

# prepare df for plotting: first melt C, D, E into (row_name, row_vals),
# then melt A, B into (col_name, col_vals)
df_temp = df.melt(id_vars=["A", "B"], value_vars=["C", "D", "E"], var_name="row_name", value_name="row_vals")
df_plot = df_temp.melt(id_vars=["row_name", "row_vals"], value_vars=["A", "B"], var_name="col_name", value_name="col_vals")

# plot data into a FacetGrid: one row per y variable, one column per x variable
g = sns.FacetGrid(df_plot, col="col_name", row="row_name", sharex=False, sharey=False)
g.map(sns.scatterplot, "col_vals", "row_vals")
plt.tight_layout()
plt.show()

I am not convinced that in this case, seaborn has an advantage over directly plotting row vs column variables into a grid using matplotlib.
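
For comparison, a sketch of the plain matplotlib route mentioned above: `plt.subplots` with one row per y variable and one column per x variable, sharing the x axis down each column and the y axis along each row. It reuses the same test data and the same A, B (x) / C, D, E (y) split as the FacetGrid version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123)
n = 10
df = pd.DataFrame({"A": np.random.random(n),
                   "B": 10 * np.random.random(n),
                   "C": 20 * np.random.random(n),
                   "D": -np.random.random(n),
                   "E": np.random.random(n) - 20})

x_vars = ["A", "B"]        # grid columns
y_vars = ["C", "D", "E"]   # grid rows

fig, axes = plt.subplots(len(y_vars), len(x_vars), sharex="col", sharey="row",
                         squeeze=False, figsize=(3 * len(x_vars), 2.5 * len(y_vars)))

for i, y in enumerate(y_vars):
    for j, x in enumerate(x_vars):
        axes[i, j].scatter(df[x], df[y], s=12)
        if i == len(y_vars) - 1:
            axes[i, j].set_xlabel(x)  # bottom row only: x is shared down the column
        if j == 0:
            axes[i, j].set_ylabel(y)  # left column only: y is shared along the row

fig.tight_layout()
plt.show()
```

No reshaping is needed here, and the shared axes make each row and column directly comparable without any post-processing of titles or labels.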

2 Comments

vestland Over a year ago

Nice one, Mr. T! I'm seeing more and more value in having plotly and matplotlib/seaborn suggestions for the same problem in the same post (+1).

Mr. T Over a year ago

Thanks, but I am not convinced by my solution. It looks clumsy and requires post-beautification. But I agree: each library has its advantages, and people should be made more aware of these so they can choose the right tool for their task.
