I want to create line charts with plotly. The x axis is a year and the y axis is a value. Sometimes the data is incomplete, i.e., some curves have years missing or start in a different year than others.

It appears this can lead to zig-zag lines where I would expect the lines to be connected in the order the data is provided.

Here's a simple example:

#!/usr/bin/python3
import plotly.express
curve_a = {2010: 20, 2011: 21, 2012: 22, 2014: 22, 2015: 23}
curve_b = {2009: 18, 2010: 21, 2011: 22, 2012: 23, 2013: 21, 2014: 21, 2015: 20}
dat = {"A": curve_a, "B": curve_b}
fig = plotly.express.line(dat, markers=True)
fig.show()

Here's how it looks: [screenshot of the resulting chart]

As you can see, the red line starts at 2010, and jumps to 2009 and 2013 at the end.

I can work around this by adding a fake line at the bottom that spans the whole range and comes first in the dataset, but that is, of course, not a very good solution.

Any help appreciated.

2 Answers

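The zigzag appears because plotly draws the points in the order it receives them and does not sort the years itself. In your data the years from curve A come first, and 2009 and 2013, which only exist in curve B, end up at the end.

The easiest fix is to sort the index (the years) with the sort_index function from pandas: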

dat = pd.DataFrame(dat).sort_index()

Gives: [image]

Full code:

import plotly.express as px
import pandas as pd

curve_a = {2010: 20, 2011: 21, 2012: 22, 2014: 22, 2015: 23}
curve_b = {2009: 18, 2010: 21, 2011: 22, 2012: 23, 2013: 21, 2014: 21, 2015: 20}
dat = {"A": curve_a, "B": curve_b}
dat = pd.DataFrame(dat).sort_index()

fig = px.line(dat, x=dat.index, y=dat.columns, markers=True)
fig.show()
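
To see the ordering for yourself, a minimal sketch like the following prints the index before and after sorting. It uses only pandas and the data from the question; the names unsorted_df and sorted_df are made up for this illustration, and the exact unsorted order may depend on your pandas version.

```python
import pandas as pd

curve_a = {2010: 20, 2011: 21, 2012: 22, 2014: 22, 2015: 23}
curve_b = {2009: 18, 2010: 21, 2011: 22, 2012: 23, 2013: 21, 2014: 21, 2015: 20}

# Build the frame straight from the nested dicts, as plotly express
# would have to when it is handed the raw dict (assumption: the index
# then follows the order in which the keys are encountered).
unsorted_df = pd.DataFrame({"A": curve_a, "B": curve_b})
print(list(unsorted_df.index))

# Sorting the index puts the years in ascending order; years missing
# from a curve show up as NaN in that curve's column.
sorted_df = unsorted_df.sort_index()
print(list(sorted_df.index))
print(sorted_df.index.is_monotonic_increasing)
```

If the first printed list is not ascending, that is the order in which the line gets drawn.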

You can also reindex the data over the full range of years, so that every curve shares one sorted x-axis:

import pandas as pd
import plotly.express as px

curve_a = {2010: 20, 2011: 21, 2012: 22, 2014: 22, 2015: 23}
curve_b = {2009: 18, 2010: 21, 2011: 22, 2012: 23, 2013: 21, 2014: 21, 2015: 20}

# Create DataFrame from both curves
df = pd.DataFrame({'A': curve_a, 'B': curve_b})

# Create a complete, sorted year index; years missing from a curve become NaN
all_years = range(df.index.min(), df.index.max() + 1)
df = df.reindex(all_years)

# Melt for plotly express
df = df.reset_index().melt(id_vars='index', var_name='curve', value_name='value')
df.rename(columns={'index': 'year'}, inplace=True)

fig = px.line(df, x='year', y='value', color='curve', markers=True)
fig.show()
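
Note that the missing years are now NaN rows, and as far as I know plotly leaves a gap in the line at such points by default. If you would rather have curve A connected straight from 2012 to 2014, one option is to drop the NaN rows from the long-format frame before plotting. This is only a sketch of the data preparation; long_df is a name chosen for this example.

```python
import pandas as pd

curve_a = {2010: 20, 2011: 21, 2012: 22, 2014: 22, 2015: 23}
curve_b = {2009: 18, 2010: 21, 2011: 22, 2012: 23, 2013: 21, 2014: 21, 2015: 20}

# Wide frame with the years sorted; missing years are NaN.
df = pd.DataFrame({"A": curve_a, "B": curve_b}).sort_index()

# Long format with one row per (year, curve), then drop the rows
# where a curve has no value so each curve keeps only its own years.
long_df = (
    df.rename_axis("year")
    .reset_index()
    .melt(id_vars="year", var_name="curve", value_name="value")
    .dropna(subset=["value"])
)
print(long_df)
```

The resulting long_df can be passed to px.line with x='year', y='value', color='curve' as above. Alternatively, keeping the NaN rows and calling fig.update_traces(connectgaps=True) should have a similar visual effect.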
