
I have two very large dataframes, df and df2, that are identical in size. One is raw data and the other is the filtered version. I'm trying to produce 36 subplots, with each cell containing both the raw and filtered data, and have tried this:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

plot_rows = 6
plot_cols = 6
fig = make_subplots(rows=plot_rows, cols=plot_cols)

x = 0
for i in range(1, plot_rows + 1):
    for j in range(1, plot_cols + 1):
        fig.add_trace(go.Scattergl(x=df.index, y=df[df.columns[x]].values,
                                 name = df.columns[x],
                                 mode = 'lines'),
                      row=i,
                      col=j)
        fig.add_trace(go.Scattergl(x=df2.index, y=df2[df2.columns[x]].values,
                                 name = df2.columns[x],
                                 mode = 'lines'),
                      row=i,
                      col=j)
        x = x+1


fig.show()

The process finishes without error and a window opens; however, it is blank, with no charts at all. I've also tried replacing:

        fig.add_trace(go.Scattergl(x=df2.index, y=df2[df2.columns[x]].values,
                                 name = df2.columns[x],
                                 mode = 'lines'),
                      row=i,
                      col=j)

with:

        fig.append_trace(go.Scattergl(x=df2.index, y=df2[df2.columns[x]].values,
                                 name = df2.columns[x],
                                 mode = 'lines'),
                      row=i,
                      col=j)

Any help or guidance is really appreciated.

  • A few things: why fig.show() for each trace when there is only one figure? Also, with very large data frames, iloc[] would be more efficient. Very large as in 5M+ records? I'm not surprised it's not working; this isn't a suitable approach for very large data sets, as it puts the data into memory multiple times. Commented Sep 21, 2021 at 21:37
  • Fig.show() was a copying error, sorry. I'm fairly new to Python so I'll need to look into iloc[], but this works well for plotting one of the dataframes; when I try to plot both it doesn't throw errors yet produces an empty window. Is this what I should expect to see if it's a memory-related issue? As for the data, I have roughly 350K x 39 sized dataframes. Commented Sep 21, 2021 at 21:44
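
To make the two points in the comment above concrete, here is a minimal sketch that selects columns by position with iloc[] and calls fig.show() only once, outside the loops. The small dummy frames are stand-ins for the question's df (raw) and df2 (filtered); everything else mirrors the question's code.

import numpy as np
import pandas as pd
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# dummy data standing in for the question's raw and filtered frames
df = pd.DataFrame(np.random.rand(1000, 36), columns=[f"c{k}" for k in range(36)])
df2 = df.rolling(10, min_periods=1).mean()

plot_rows, plot_cols = 6, 6
fig = make_subplots(rows=plot_rows, cols=plot_cols)

x = 0
for i in range(1, plot_rows + 1):
    for j in range(1, plot_cols + 1):
        # iloc[:, x] selects the x-th column by position instead of df[df.columns[x]]
        fig.add_trace(go.Scattergl(x=df.index, y=df.iloc[:, x].values,
                                   name=df.columns[x], mode='lines'),
                      row=i, col=j)
        fig.add_trace(go.Scattergl(x=df2.index, y=df2.iloc[:, x].values,
                                   name=df2.columns[x], mode='lines'),
                      row=i, col=j)
        x += 1

fig.show()  # called once, after all traces have been added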

1 Answer

  • you have noted large data frames (39 columns, 350k rows)
  • Plotly Express provides a higher-level API for faceted figures (sub-plots), which is simpler to use
  • reshape the data frames to make them simple to use with Plotly Express:
    1. make a long dataframe instead of a wide one with unstack()
    2. the values from the resulting index become the sub-plot (facet) and the x-axis
    3. pd.concat() the two data frames together
    4. there is far too much data to go into one figure, so sample it down by keeping only every (N // 100)-th row of the source data frames, i.e. roughly 100 points per column
import numpy as np
import pandas as pd
import plotly.express as px

N = 350 * 10**3
C = 39
# generate a dataset same size as indicated in question
df = pd.DataFrame({c: np.random.uniform(1, 5, N)
                   for c in [f"{'' if (c//26)==0 else chr((c//26)+64)}{chr((c%26)+65)}" for c in range(C)]
                  })
# second data frame, same shape different values
df2 = pd.DataFrame(df.values * np.random.uniform(0.4, 0.6, df.values.shape), columns=df.columns)

# generating a figure with all of the data in it will cause issues, so plot sampled data
# (roughly 100 data points per column); use plotly express to simplify generation of sub-plots
fig = px.line(
    pd.concat(
        [
            # unstack() turns each wide frame into a long series:
            # level_0 = original column name, level_1 = original row index, 0 = value
            df.unstack().reset_index().assign(status="clean"),
            df2.unstack().reset_index().assign(status="raw"),
        ]
        # keep only every (N // 100)-th row, i.e. ~100 points per column
    ).loc[lambda d: (d["level_1"] % (N // 100)).eq(0)],
    x="level_1",
    y=0,
    facet_col="level_0",
    facet_col_wrap=6,
    color="status",
)
fig.show()

[Figure: faceted line sub-plots of the sampled clean and raw series, six facets per row]


3 Comments

Thank you for this; unfortunately I require the whole dataset to be plotted, as it contains important spikes that last only a few datapoints. I guess the next best option would be to break it down into multiple figures to prevent a memory issue.
That amount of data causes my Python kernel to crash... it would be far smarter to look for spikes in the data and plot only the regions around them. A bit more work in pandas.
Kernel crash, wow! I don't get anything like that on my end. Searching out spikes in the data programmatically might be more efficient, and I'd like to be able to do that, but for my purposes the whole dataset needs to be plotted. Interestingly, if I arrange 12 subplots per figure it works perfectly with little to no lag; Scattergl seems good with big datasets.
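
A rough sketch of the "multiple figures with 12 subplots each" idea from the last comment, assuming the df (raw) and df2 (filtered) frames generated in the answer above; the chunk size of 12 and the 3 x 4 layout per figure are assumptions for illustration:

import plotly.graph_objects as go
from plotly.subplots import make_subplots

CHUNK = 12          # assumed: 12 sub-plots per figure, as in the comment
COLS_PER_ROW = 4    # assumed layout: up to 3 rows x 4 cols per figure

columns = list(df.columns)
for start in range(0, len(columns), CHUNK):
    chunk = columns[start:start + CHUNK]
    rows = -(-len(chunk) // COLS_PER_ROW)  # ceiling division
    fig = make_subplots(rows=rows, cols=COLS_PER_ROW, subplot_titles=chunk)
    for k, col in enumerate(chunk):
        r, c = k // COLS_PER_ROW + 1, k % COLS_PER_ROW + 1
        fig.add_trace(go.Scattergl(x=df.index, y=df[col].values,
                                   name=f"{col} raw", mode='lines'),
                      row=r, col=c)
        fig.add_trace(go.Scattergl(x=df2.index, y=df2[col].values,
                                   name=f"{col} filtered", mode='lines'),
                      row=r, col=c)
    fig.show()  # one figure per chunk of 12 columns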
