2

I have a DataFrame in which both the columns and the rows can be treated as categories. I want to plot the values in each row on a scatter plot, with the row categories on the y-axis, the column categories as differently colored dots, and the x-axis as the scale for the values. Preferred libraries: Plotly or seaborn.

Simulated data

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), 
                  columns=list('ABCD'), index=list('PQRST'))
df
#     A   B   C   D
# P  21  95  91  90
# Q  21  12   9  68
# R  24  68  10  82
# S  81  14  80  39
# T  53  17  19  77

# plot
df.plot(marker='o', linestyle='')

Desired plot (similar to the plot below, but with the x-axis and y-axis switched): [image]

2 Answers

1

In my opinion, the way you have structured your DataFrame, with the index holding the categorical y-values and each column standing for a color, makes it inconvenient to access your data for plotting.

Instead, I think you can make your life easier by having one column for the values, one column for the categories P, Q, R, S, T, and a final column for the categories A, B, C, D that will correspond to differently colored points.
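If you already have the wide frame from the question, one possible way to get to this layout is pandas' melt. This is a minimal sketch, assuming the target column names value, category and color; the variable names wide and long_df are only illustrative, and the numbers are the sample values printed in the question.

```python
import pandas as pd

# The sample frame printed in the question (rows P-T, columns A-D).
wide = pd.DataFrame(
    [[21, 95, 91, 90],
     [21, 12, 9, 68],
     [24, 68, 10, 82],
     [81, 14, 80, 39],
     [53, 17, 19, 77]],
    index=list("PQRST"),
    columns=list("ABCD"),
)

# Name the index so that reset_index() yields a 'category' column, then
# melt the remaining columns into one ('color', 'value') pair per row.
long_df = (
    wide.rename_axis("category")
        .reset_index()
        .melt(id_vars="category", var_name="color", value_name="value")
)

print(long_df.head())
```

The result has one row per (category, color) pair, so each plotted point maps to exactly one row.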

For data visualization, I would recommend Plotly Express: the documentation is excellent, and the plots are interactive. For example, there is documentation on setting colors using column names, which I have done in my code below (and which is one of the reasons I recommended structuring your DataFrame differently).

import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(42)

df = pd.DataFrame({
    'value':np.random.randint(0, 100, size=20),
    'category':['P','Q','R','S','T']*4,
    'color':['A','B','C','D']*5
})
df = df.sort_values(by='category')

fig = px.scatter(df, x='value', y='category', color='color')

## make the marker size larger than the default
fig.update_traces(marker=dict(size=14))
fig.show()

1

With plotly as the plotting backend for pandas, all you need to do is reshape your dataframe from a wide to long format using pd.melt(), and run:

df.plot(kind='scatter', x='value', y='index', color = 'variable')

Complete code:

import numpy as np
import pandas as pd
pd.options.plotting.backend = "plotly"
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), 
                  columns=list('ABCD'), index=list('PQRST'))
df=pd.melt(df.reset_index(), id_vars=['index'], value_vars=df.columns)
df.plot(kind='scatter', x='value', y='index', color = 'variable')
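
The column names 'index', 'variable' and 'value' in the plot call are not arbitrary: they are what reset_index() and pd.melt() produce by default when the index is unnamed. A small sketch to check this, using a two-row excerpt of the question's sample values (the names wide and long_df are only illustrative):

```python
import pandas as pd

# Two rows and two columns taken from the question's sample output.
wide = pd.DataFrame({"A": [21, 21], "B": [95, 12]}, index=["P", "Q"])

# reset_index() turns the unnamed index into a column called 'index';
# melt() then falls back to its default names 'variable' and 'value'.
long_df = pd.melt(wide.reset_index(), id_vars=["index"],
                  value_vars=list(wide.columns))

print(long_df)
```

If the index had a name, reset_index() would use that name instead of 'index', and the y argument of the plot call would have to change accordingly.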
