I have the following dataframe:

Av_Temp Tot_Precip
278.001 0
274     0.0751864
270.294 0.631634
271.526 0.229285
272.246 0.0652201
273     0.0840059
270.463 0.0602944
269.983 0.103563
268.774 0.0694555
269.529 0.010908
270.062 0.043915
271.982 0.0295718

and I want to plot a boxplot where the x-axis is 'Av_Temp' divided into equal-sized bins (say 2 in this case), and the y-axis shows the corresponding range of values for Tot_Precip. I have the following code (thanks to "Find pandas quartiles based on another column"); however, when I plot the boxplots, they are drawn one on top of another. Any suggestions?

import numpy
import pandas
import matplotlib.pyplot as plt
from matplotlib import cbook

expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)
grp_df = df.groupby(expl_var+'_Deciles').apply(lambda x: numpy.array(x[cname]))

fig, ax = plt.subplots()
for i in range(len(grp_df)):
    box_arr = grp_df[i]
    box_arr = box_arr[~numpy.isnan(box_arr)]
    stats = cbook.boxplot_stats(box_arr, labels = str(i))

    ax.bxp(stats)
    ax.set_yscale('log')
plt.show()


1 Answer

Since you're using pandas already, why not use the boxplot method on dataframes?

expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)

ax = df.boxplot(by='Av_Temp_Deciles', column='Tot_Precip')
ax.set_yscale('log')

That produces this: https://i.sstatic.net/20KPx.png

If you don't like the automatic axis label and titles, clear them with:

plt.xlabel('')
plt.suptitle('')
plt.title('')

If you want a standard boxplot, the above should be fine. As I understand it, boxplot was split into boxplot_stats and bxp so that you can modify or replace the stats that are generated before they are fed to the plotting routine. See https://github.com/matplotlib/matplotlib/pull/2643 for some details.
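As a rough illustration of what modifying the stats between boxplot_stats and bxp could look like, here is a minimal sketch. It rebuilds the dataframe from the question's table, and the choice of 5th/95th percentile whiskers is only an assumed example of a "non-standard" stat, not something the question asked for. The names grouped, groups and labels are made up for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cbook

# Data copied from the table in the question
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})
bins = pd.qcut(df["Av_Temp"], 2)

# One plain array of precipitation values per temperature bin
grouped = df.groupby(bins, observed=True)["Tot_Precip"]
labels = [str(key) for key, _ in grouped]
groups = [values.dropna().to_numpy() for _, values in grouped]

# Standard stats first: a list with one dict per bin
stats = cbook.boxplot_stats(groups, labels=labels)

# Assumed tweak: move the whiskers to the 5th/95th percentiles and
# treat everything outside them as fliers
for s, data in zip(stats, groups):
    lo, hi = np.percentile(data, [5, 95])
    s["whislo"], s["whishi"] = lo, hi
    s["fliers"] = data[(data < lo) | (data > hi)]

fig, ax = plt.subplots()
ax.bxp(stats)
ax.set_yscale("log")
# call plt.show() to display the figure
```

Each dict in stats also carries keys such as 'med', 'q1' and 'q3', so the same pattern should work for overriding those.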

If you need to draw a boxplot with non-standard stats, you can use boxplot_stats on 2D numpy arrays, so you only need to call it once. No loops required.

expl_var = 'Av_Temp'
cname = 'Tot_Precip'
df[expl_var+'_Deciles'] = pandas.qcut(df[expl_var], 2)

# I moved your NaN check into the apply function
grp_df = df.groupby('Av_Temp_Deciles').apply(lambda x: x[cname].dropna().to_numpy())

# boxplot_stats accepts a 2D array or a sequence of 1D arrays, plus one label per group
# stats is now a list of dictionaries of stats, one dictionary per quantile bin
stats = cbook.boxplot_stats(list(grp_df), labels=[str(b) for b in grp_df.index])

# now it's a one-shot plot, no loops
fig, ax = plt.subplots()
ax.bxp(stats)
ax.set_yscale('log')
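
As for why the boxes in the question end up on top of each other: each ax.bxp(stats) call inside the loop receives a single stats dict, and when positions is not given bxp places its boxes at 1..N, so every call draws at x = 1. If you would rather keep the loop, a minimal sketch of that variant is to pass positions explicitly. The dataframe is rebuilt from the question's table, and artists is a name introduced only for this sketch.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import cbook

# Data copied from the table in the question
df = pd.DataFrame({
    "Av_Temp": [278.001, 274, 270.294, 271.526, 272.246, 273,
                270.463, 269.983, 268.774, 269.529, 270.062, 271.982],
    "Tot_Precip": [0, 0.0751864, 0.631634, 0.229285, 0.0652201, 0.0840059,
                   0.0602944, 0.103563, 0.0694555, 0.010908, 0.043915, 0.0295718],
})
df["Av_Temp_Deciles"] = pd.qcut(df["Av_Temp"], 2)

fig, ax = plt.subplots()
grouped = df.groupby("Av_Temp_Deciles", observed=True)["Tot_Precip"]
artists = []
for i, (key, values) in enumerate(grouped, start=1):
    # One group at a time, so labels is a one-element list
    stats = cbook.boxplot_stats(values.dropna().to_numpy(), labels=[str(key)])
    # Give each box its own x position instead of the default of 1
    artists.append(ax.bxp(stats, positions=[i]))
ax.set_yscale("log")
# call plt.show() to display the figure
```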