I am trying to create a scatterplot matrix in which the x and y axes show different variables (and the number of variables on each axis differs too). For example, I'd like three inputs plotted along the x axis and two outputs plotted along the y axis, giving a scatterplot matrix of 6 scatter plots, each showing one input vs one output.

I have not found a way to do this in matplotlib, seaborn, pandas, or plotly. Has anyone done something like this before, or does anyone know a clever way to create a plot like this?

Everything I have found so far plots the same n variables against themselves, producing n^2 plots.

Code:

import pandas as pd
import seaborn as sns
import plotly.express as px

headings = ['a','b','c','d','e']

data = [[21,22,23,24,25],[10,12,13,14,15],[14,2,3,17,5],[6,17,22,9,14],[16,17,18,19,20]]

df = pd.DataFrame(data=data, columns=headings)
pd.plotting.scatter_matrix(df)

sns.pairplot(df)

fig = px.scatter_matrix(df)
fig.show()


  • Could you show us what you have already tried? Commented Oct 28, 2019 at 20:12
  • @AlexR I've edited my original question to include the code I've tried so far (sorry about the formatting). For this example, say a, b, c are inputs and d, e are outputs. I would like a scatter matrix of just a, b, c vs d, e (the bottom right corner of the square scatter matrix created by the code below). For the real data set I am trying to apply this to, it would take much too long to plot the full square matrix. Commented Oct 29, 2019 at 11:31
  • @stsandoval How did my suggestion work out for you? Commented Feb 4, 2021 at 23:04
  • Seaborn is built on top of matplotlib. Matplotlib figures have to be converted by plotly. Why would you mix those two ecosystems? Commented Feb 5, 2021 at 9:03

2 Answers

The following setup lets you choose dependent and independent variables for an array of scatter plots. If this is thematically what you're looking for, I can adjust the setup to a 2x3 matrix if you prefer. I could also add a regression line to each subplot.

Code:

# imports
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import pandas as pd
import numpy as np

# data
np.random.seed(123)
frame_rows = 15

n_cols = 5

frame_columns = ['V_'+str(e) for e in list(range(1,n_cols+1))]
df = pd.DataFrame(np.random.uniform(-8, 10, size=(frame_rows, len(frame_columns))),
                  index=pd.date_range('1/1/2020', periods=frame_rows),
                  columns=frame_columns)
df=df.cumsum()+100
df.iloc[0]=100

# define dependent and independent variables
y_list = ['V_1', 'V_2']
x_list = ['V_3', 'V_4', 'V_5']

# plotly
n_plots = len(y_list)*len(x_list)
fig = make_subplots(rows=n_plots, cols=1)

row_count=1
names = []
for y in y_list:
    for x in x_list:

        fig.add_trace(go.Scatter(x=df[x].values, y=df[y].values,
                                 mode='markers'),
                      row=row_count,
                      col=1)

        names.append(y+'=f('+x+')')

        # axis titles
        fig.update_xaxes(title = x, row = row_count)
        fig.update_yaxes(title = y, row = row_count)
        row_count+=1

fig.update_layout(height=n_plots*250, width=600)
fig.show()

A seaborn solution:

import pandas as pd 
import matplotlib.pyplot as plt
import seaborn as sns

#generate test data
import numpy as np
np.random.seed(123)
n=10
df = pd.DataFrame({"A": np.random.random(n), 
                   "B": 10 * np.random.random(n),
                   "C": 20 * np.random.random(n),
                   "D": -np.random.random(n),
                   "E": np.random.random(n)-20})
 

#prepare df for plotting
df_temp = df.melt(id_vars=["A", "B"], value_vars=["C", "D", "E"], var_name="row_name", value_name="row_vals")
df_plot = df_temp.melt(id_vars=["row_name", "row_vals"], value_vars=["A", "B"], var_name="col_name", value_name="col_vals")

#plot data into a FacetGrid
g = sns.FacetGrid(df_plot, col="col_name", row="row_name", sharex=False, sharey=False)
g.map(sns.scatterplot, "col_vals", "row_vals")
plt.tight_layout()
plt.show()

I am not convinced that in this case, seaborn has an advantage over directly plotting row vs column variables into a grid using matplotlib.
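
For comparison, plotting row vs column variables directly into a matplotlib grid might look like the sketch below. The helper name `scatter_grid_mpl` and its `cell` size parameter are hypothetical, the data is random, and the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def scatter_grid_mpl(df, x_vars, y_vars, cell=2.5):
    """Hypothetical helper: rows are y variables, columns are x variables."""
    fig, axes = plt.subplots(len(y_vars), len(x_vars),
                             figsize=(cell * len(x_vars), cell * len(y_vars)),
                             sharex="col", sharey="row", squeeze=False)
    for i, y in enumerate(y_vars):
        for j, x in enumerate(x_vars):
            axes[i, j].scatter(df[x], df[y], s=15)
        axes[i, 0].set_ylabel(y)   # label only the left-most column
    for j, x in enumerate(x_vars):
        axes[-1, j].set_xlabel(x)  # label only the bottom row
    fig.tight_layout()
    return fig, axes


# Random sample data (assumed for illustration only)
rng = np.random.default_rng(123)
n = 10
df = pd.DataFrame({"A": rng.random(n),
                   "B": 10 * rng.random(n),
                   "C": 20 * rng.random(n),
                   "D": -rng.random(n),
                   "E": rng.random(n) - 20})

fig, axes = scatter_grid_mpl(df, x_vars=["A", "B"], y_vars=["C", "D", "E"])
# fig.savefig("grid.png")  # or plt.show() with an interactive backend
```

Sharing the x axis per column and the y axis per row is a design choice here: every plot in a column shows the same x variable, so one set of tick labels is enough. No reshaping of the DataFrame is needed.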

2 Comments

Nice one, Mr.T! I'm seeing more and more value in having plotly and matplotlib/seaborn suggestions to the same problem on the same post (+1).
Thanks but I am not convinced by my solution. It looks clumsy and requires post-beautification. But I agree - each library has its advantages, and people should be made more aware of these to choose the right tool for their task.
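
On the post-beautification point: seaborn's `pairplot` accepts `x_vars` and `y_vars` arguments, which may avoid the double `melt` altogether. The sketch below reuses the question's sample data and assumes a, b, c are the inputs and d, e are the outputs, as described in the question's comments; the `Agg` backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import pandas as pd
import seaborn as sns

headings = ["a", "b", "c", "d", "e"]
data = [[21, 22, 23, 24, 25],
        [10, 12, 13, 14, 15],
        [14, 2, 3, 17, 5],
        [6, 17, 22, 9, 14],
        [16, 17, 18, 19, 20]]
df = pd.DataFrame(data=data, columns=headings)

# Rectangular grid: inputs along the columns, outputs along the rows
g = sns.pairplot(df, x_vars=["a", "b", "c"], y_vars=["d", "e"])
# g.savefig("pairplot.png")  # or plt.show() with an interactive backend
```

This draws a 2x3 grid of scatter plots, so only the six input-vs-output panels are computed instead of the full 5x5 matrix.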
