Order of plotting in pandas.plotting.parallel_coordinates

Question

I have a series of measurements I want to plot with pandas.plotting.parallel_coordinates, where the color of each individual line is given by the value of one column of the DataFrame.

Code looks like this:

# ... data retrieval and preparation from a couple of Excel files
# ---> output: largeDataFrame

theColormap: ListedColormap = cm.get_cmap('some cmap name')

# Attempt to stack the lines in the right order (doesn't work)
largeDataFrame.sort_values(column_for_line_color_derivation, inplace=True, ascending=True)

# here comes the actual plotting of data
sns.set_style('ticks')
sns.set_context('paper')
plt.figure(figsize=(10, 6))
thePlot: plt.Axes = parallel_coordinates(largeDataFrame, class_column=column_for_line_color_derivation, cols=[columns to plot], color=theColormap.colors)
plt.title('My Title')
thePlot.get_legend().remove()
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

This works quite well and yields the following result:

Now I would like to have the yellow lines (high values of "column_for_line_color_derivation") plotted in front of the green and darker lines, so they become more prominent. In other words, I want to control the stacking order of the lines by the values of "column_for_line_color_derivation". So far I haven't found a way to do that.

My guess is that they're probably drawn in the order they appear in your dataframe. Did you try sorting your dataframe by the column in question?
– Paul H, Commented Oct 5, 2020 at 17:27
Hi Paul, yes I did that. I tried to indicate this with the line "allFrames.sort_values(column_for_line_color_derivation, inplace=True, ascending=True)".
– WolfiG, Commented Oct 5, 2020 at 17:28
So why did you plot largeDataFrame instead of allFrames?
– Paul H, Commented Oct 5, 2020 at 17:29
Sorry, and thanks. This was wrong in the problem description.
– WolfiG, Commented Oct 5, 2020 at 17:36
You sort with ascending=True, meaning the smallest values are moved to the first rows of the dataframe. Try ascending=False.
– Paul H, Commented Oct 5, 2020 at 17:37

JohanC · Accepted Answer · 2020-10-05 18:56:32Z

I ran some tests with pandas versions 1.1.2 and 1.0.3, and in both cases the lines were drawn from low to high value of the coloring column, independent of the dataframe order.

You can temporarily add parallel_coordinates(..., lw=5), which makes this very clear. With thin lines the order is less visible, as the yellow lines have less contrast.

The parameter sort_labels= seems to have the opposite effect of its name: when False (default), the lines are drawn in sorted order, when True, they keep the dataframe order.

Here is a small reproducible example:

import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

df = pd.DataFrame({ch: np.random.randn(100) for ch in 'abcde'})
df['coloring'] = np.random.randn(len(df))

fig, axes = plt.subplots(ncols=2, figsize=(14, 6))
for ax, lw in zip(axes, [1, 5]):
    parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax, lw=lw)
    ax.set_title(f'linewidth={lw}')
    ax.get_legend().remove()
plt.show()
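
The observations above come from comparing plots by eye on two pandas versions. A minimal sketch for checking them on your own installation follows. It assumes random float data, so that each plotted line can be matched back to its source row by its y-values. `draw_order_report` is a hypothetical helper name, not part of pandas, and the Agg backend is selected only so the check runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_rgba
from pandas.plotting import parallel_coordinates


def draw_order_report(df, class_column, **kwargs):
    """Hypothetical helper: list the data lines in the order the Axes holds them.

    Returns one row per plotted data line with the index label of the source
    row, its class value, and the RGBA colour that the line received.
    """
    cols = [c for c in df.columns if c != class_column]
    values = df[cols].to_numpy(dtype=float)
    fig, ax = plt.subplots()
    parallel_coordinates(df, class_column=class_column, cols=cols, ax=ax, **kwargs)

    records = []
    for line in ax.lines:
        xs = np.asarray(line.get_xdata())
        if xs[0] == xs[-1]:  # vertical lines representing the axes, not data
            continue
        ys = np.asarray(line.get_ydata(), dtype=float)
        # Identify the source row by matching the plotted y-values.
        pos = int(np.flatnonzero(np.isclose(values, ys).all(axis=1))[0])
        records.append({
            "row": df.index[pos],
            "class_value": df[class_column].iloc[pos],
            "rgba": to_rgba(line.get_color()),
        })
    plt.close(fig)
    return pd.DataFrame(records)


rng = np.random.default_rng(0)
df = pd.DataFrame({ch: rng.standard_normal(20) for ch in "abcde"})
df["coloring"] = rng.standard_normal(len(df))

report = draw_order_report(df, "coloring", colormap="viridis")
print(report.head())
```

Reading the `row` and `class_value` columns from top to bottom shows which rows are drawn first, and `rgba` shows which colour each one received. Passing `sort_labels=True` as an extra keyword argument lets you compare the two settings directly.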

One idea is to vary the line width depending on the class:

fig, ax = plt.subplots(figsize=(8, 6))

parallel_coordinates(df, class_column='coloring', cols=df.columns[:-1], colormap='viridis', ax=ax)
num_lines = len(ax.lines)
for ind, line in enumerate(ax.lines):
    xs = line.get_xdata()
    if xs[0] != xs[-1]:  # skip the vertical lines representing axes
        line.set_linewidth(1 + 3 * ind / num_lines)
ax.set_title('linewidth depending on class_column')
ax.get_legend().remove()
plt.show()
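
If the goal is full control over both colour and stacking, another option is to skip parallel_coordinates and draw the lines directly with matplotlib, giving each line an explicit zorder. The following is a sketch under that assumption, not what pandas does internally. `parallel_lines_by_value` is a hypothetical helper name, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import Normalize


def parallel_lines_by_value(df, value_column, cols, cmap_name="viridis",
                            ax=None, **plot_kwargs):
    """Hypothetical helper: one line per row, coloured and stacked by value."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 6))
    values = df[value_column].to_numpy(dtype=float)
    norm = Normalize(vmin=values.min(), vmax=values.max())
    cmap = plt.get_cmap(cmap_name)
    x = np.arange(len(cols))
    for ys, v in zip(df[cols].to_numpy(dtype=float), values):
        level = float(norm(v))  # 0.0 for the lowest value, 1.0 for the highest
        # The default zorder of a line is 2; adding `level` puts high values on top.
        ax.plot(x, ys, color=cmap(level), zorder=2 + level, **plot_kwargs)
    ax.set_xticks(x)
    ax.set_xticklabels(cols, rotation=90)
    return ax


rng = np.random.default_rng(1)
df = pd.DataFrame({ch: rng.standard_normal(100) for ch in "abcde"})
df["coloring"] = rng.standard_normal(len(df))

ax = parallel_lines_by_value(df, "coloring", cols=list("abcde"), lw=1)
```

Because the stacking is set through zorder, the result does not depend on the row order of the dataframe. Call plt.show() with an interactive backend to display the figure.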

edited Oct 5, 2020 at 18:56

answered Oct 5, 2020 at 18:26

1 Comment

WolfiG Over a year ago

Hi Johan, thanks to your post I found the error: instead of color=theColormap.colors I have to use colormap="whatever map"; then the output is as expected.
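
A minimal sketch of the combination described in this comment, using random stand-in data: the ascending sort from the question plus colormap= in place of color=. The frame, its column names and the 'viridis' choice are assumptions for illustration, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(2)
cols = list("abcde")
frame = pd.DataFrame({ch: rng.standard_normal(50) for ch in cols})
frame["coloring"] = rng.standard_normal(len(frame))

# Sort so that rows with high values come last (the sort from the question).
frame_sorted = frame.sort_values("coloring", ascending=True)

fig, ax = plt.subplots(figsize=(10, 6))
# Pass the colormap by name via colormap= rather than a colour list via color=.
parallel_coordinates(frame_sorted, class_column="coloring", cols=cols,
                     colormap="viridis", ax=ax)
ax.get_legend().remove()
ax.tick_params(axis="x", rotation=90)
fig.tight_layout()
```

Replace the Agg line with your usual backend and call plt.show() to view the result interactively.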
