I have defined a function to analyse my columns with boxplots.

    fig, ax = plt.subplots((len(list_of_columns)),1,figsize= datafigsize) 
    fig.suptitle(suptitle,fontsize=30)
    ax = ax.ravel() # Ravel turns a matrix into a vector, which is easier to iterate
    plt.tight_layout(h_pad = 3,pad=10);
    
    for i, column in enumerate(list_of_columns): 
        nobs = dataframe[column].value_counts().values
        nobs = [str(y) for y in nobs.tolist()]
        nobs = ["n: " + j for j in nobs]   
        pos = range(len(nobs))
        medians = dataframe.groupby([column])['saleprice'].median().values
        for tick,label in zip(pos,ax[i].get_xticklabels()):                                   
            ax[i].text(pos[tick], medians[tick] + 0.03, nobs[tick],
                    horizontalalignment='center', size='small', color='k', weight='semibold')
            sns.boxplot(data = dataframe, 
                        x= dataframe[column], 
                        y='saleprice',
                        ax=ax[i]) 
            ax[i].set_title(list_of_titles[i],fontdict={'fontsize': 15})
            ax[i].xaxis.set_visible(True);

The subplots work fine, and the number of observations is plotted as well.

However, the number of observations is only shown for 6 categories, even when a column has more. Here is an example:

[Image: only shows n = # for 6 categories]

1 Answer

Most likely some other object in your environment is causing the trouble. Also, you placed the sns.boxplot call inside the wrong for loop: it belongs in the outer loop over the columns, not in the inner loop over the tick labels.
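
One possible explanation for the limit of 6, offered as an assumption rather than something confirmed in the question: zip stops at the shorter of its two inputs, and ax[i].get_xticklabels() is read before the boxplot has been drawn, when the axes only has its default ticks (typically 0.0 to 1.0 in steps of 0.2, as far as I know). A minimal sketch of that effect, with a hypothetical 9-category column standing in for your data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Tick labels of an axes that has nothing plotted on it yet
default_labels = ax.get_xticklabels()

# Hypothetical column with 9 categories, playing the role of `pos` in the question
pos = range(9)

# zip() stops at the shorter input, so any extra categories are silently dropped
pairs = list(zip(pos, default_labels))
print(len(default_labels), len(pairs))

plt.close(fig)
```

Looping over range(len(...)) of your own counts, instead of zipping with the tick labels, avoids depending on how many ticks the axes happens to have.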

If I set up using an example dataset:

import pandas as pd
import seaborn as sns
import numpy as np
import string
import matplotlib.pyplot as plt

Vars = [i for i in string.ascii_letters]
np.random.seed(111)
dataframe = pd.DataFrame({'saleprice':np.random.uniform(0,100,100),
                          'var1':np.random.choice(Vars[0:5],100),
                          'var2':np.random.choice(Vars[5:12],100),
                         'var3':np.random.choice(Vars[12:21],100)})

list_of_columns = ['var1','var2','var3']

Below I modified the script slightly, calculating the median and the number of observations per category in a single DataFrame. Also make sure that the plotted order and the order of your counts are the same (I used the index of the grouped DataFrame as the reference below):

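# One subplot per column
fig, ax = plt.subplots(len(list_of_columns), 1)
ax = ax.ravel()

for i, column in enumerate(list_of_columns):
    # One row per category: median sale price and number of observations
    stats_df = dataframe.groupby(column)['saleprice'].agg(median='median', n='size')
    # Sort ascending by median; this index also sets the plotting order
    stats_df = stats_df.sort_values('median')
    sns.boxplot(data=dataframe, x=column, y='saleprice', ax=ax[i], order=stats_df.index)
    ax[i].set_title(list_of_columns[i], fontdict={'fontsize': 15})

    for xpos in range(len(stats_df)):
        # .iloc is positional; plain [xpos] is a label lookup and can raise KeyError: 0
        label = "n= " + str(stats_df['n'].iloc[xpos])
        ypos = stats_df['median'].iloc[xpos] + 0.03
        ax[i].text(xpos, ypos, label, horizontalalignment='center', size='small')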
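
To illustrate why the order matters: in the question's code the counts come from value_counts() and the medians from groupby(), and those two results are not ordered the same way. A small sketch with made-up data (the frame and its values are hypothetical, chosen only to make the mismatch visible):

```python
import pandas as pd

# Hypothetical toy data: 'b' is the most frequent category, 'a' sorts first
df = pd.DataFrame({
    "var1": ["b", "b", "b", "a", "c", "c"],
    "saleprice": [10.0, 20.0, 30.0, 5.0, 50.0, 70.0],
})

counts = df["var1"].value_counts()                   # ordered by frequency, descending
medians = df.groupby("var1")["saleprice"].median()   # ordered by category label

print(list(counts.index))
print(list(medians.index))

# Reindexing the counts by the medians' index puts both in the same order
aligned = counts.reindex(medians.index)
print(aligned.tolist())
```

Computing both statistics in one groupby, as in the loop above, sidesteps the problem because the median and the count share a single index.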

3 Comments

Hi Stupidwolf, thanks for your answer! I pasted your code into my notebook and got the error 'KeyError: 0'. Strange, because I did exactly the same as you! Also, any idea how to order the boxplots in ascending order?
You can sort the data frame stats_df by the column of interest; see the edited answer.
Not sure why you get the error. Did you use the example dataset?
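
A guess at the KeyError: 0, assuming the categories in the real data are integers rather than letters (the question does not say): indexing a Series with [0] is then a label lookup, and it fails if no category is labelled 0. A minimal sketch with a hypothetical stats table:

```python
import pandas as pd

# Hypothetical stats table whose category labels are integers (e.g. a quality score)
stats = pd.DataFrame(
    {"median": [12.0, 30.0, 45.0], "n": [4, 7, 2]},
    index=pd.Index([3, 5, 8], name="quality"),
)

try:
    stats["n"][0]        # label lookup: there is no category labelled 0
    raised = False
except KeyError:
    raised = True

first_n = stats["n"].iloc[0]   # positional lookup: first row, whatever its label
print(raised, first_n)
```

Using .iloc for the position-based loop works regardless of what the category labels are.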
