
I have a series of data with some values missing in the middle, like below:

X,Y
1,10
1,11
1,9
1,5
1,14
1,10
1,12
1,13
3,5
3,10
3,7
3,1
3,3
3,4
3,8
3,-2

As you can see, X = 2 is missing from the series.

I want to draw a box plot or a violin plot with a placeholder at 2 on the X axis, to show that no data is present for it.

Right now I can plot it by inserting 2 into X and substituting NaN for Y, which gives the plot below:

[image: plot from the NaN workaround]

Is there a better way to do this without manipulating the data, either with text on the X axis or with just an empty placeholder?

1 Answer


You can convert X to a pandas Categorical that lists every expected category, then plot with seaborn.boxplot:

import pandas as pd
import seaborn as sns

df = pd.DataFrame({'X': [1,1,1,1,1,3,3,3,3,3],
                   'Y': [1,2,3,4,5,6,7,8,9,10]
                  })
# declare every expected category, including the missing 2
df['X'] = pd.Categorical(df['X'], categories=[1, 2, 3])

sns.boxplot(data=df, x='X', y='Y')

Output:

[image: box plot with X ticks 1, 2 and 3, and an empty slot at 2]
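To see why the slot for 2 is kept, it may help to inspect the Categorical directly. This is a small sketch that reuses the example frame from the answer; it only counts rows per declared category and does no plotting.

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 1, 1, 3, 3, 3, 3, 3],
                   'Y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
df['X'] = pd.Categorical(df['X'], categories=[1, 2, 3])

# All declared categories, whether or not they occur in the data
declared = list(df['X'].cat.categories)

# Row count per declared category; unused categories are included
counts = df['X'].value_counts(sort=False)

print(declared)
print(counts.to_dict())
```

Category 2 should show up with a count of zero, which is the information the plotting call relies on to reserve a position for it.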

Annotating the missing categories:

import numpy as np

ax = sns.boxplot(data=df, x='X', y='Y')

# positions of the categories on the X-axis
cats = {c: i for i, c in enumerate(df['X'].cat.categories)}
# categories that never occur in the data
missing = set(df['X'].cat.categories) - set(df['X'])
# {2}

# mid-point of the Y-axis
y_pos = np.mean(ax.get_ylim())

for x in missing:
    ax.annotate('N/A', (cats[x], y_pos), ha='center')

Output:

[image: the same box plot with 'N/A' written in the empty slot at 2]
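Since the question also mentions violin plots and leaving the data untouched, here is a rough alternative sketch that uses plain matplotlib instead of seaborn. It is not the approach from the answer above: it leaves df unchanged and places each violin at its numeric X position. The `expected` list is an assumption (every integer between the smallest and largest X); replace it with whatever values you actually expect.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'X': [1, 1, 1, 1, 1, 3, 3, 3, 3, 3],
                   'Y': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Assumption: the expected X values are every integer from min to max
expected = list(range(df['X'].min(), df['X'].max() + 1))

# One array of Y values per X value that is actually present
groups = {x: g['Y'].to_numpy() for x, g in df.groupby('X')}
present = sorted(groups)
missing = [x for x in expected if x not in groups]

fig, ax = plt.subplots()
# Draw each violin at its own X value, which leaves a gap at the missing ones
ax.violinplot([groups[x] for x in present], positions=present)

# Show a tick for every expected value, including the missing ones
ax.set_xticks(expected)
ax.set_xlim(expected[0] - 0.5, expected[-1] + 0.5)

# Label each gap at the vertical mid-point of the axis
y_mid = sum(ax.get_ylim()) / 2
for x in missing:
    ax.annotate('N/A', (x, y_mid), ha='center')
```

Call `fig.savefig(...)` or switch to an interactive backend and use `plt.show()` to view the result. The same positions idea should carry over to `ax.boxplot`, which also accepts a `positions` argument.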


2 Comments

Thanks @mozway, this is helpful. Is there any way to mark the 2 to show that no data is available, for example with a text or label there?
Not automatically, but you can add it manually with annotate.
