I've been trying to plot a data frame as a box plot using matplotlib. My data frame looks something like this:

   9-1   9-2   9-3   9-4  9-5
0   23  16.0  18.0  18.0   26
1   27  18.0  20.0  17.0   33
2   10   9.0  15.0   8.0   30
3   23  30.0  19.0   5.0   15
4   10  10.0  23.0  29.0   12
5   50  13.0   8.0  20.0   23
6   12  24.0  31.0  27.0   35
7   10  29.0   NaN   7.0   22
8   34   NaN   NaN  16.0   31
9   28   NaN   NaN   NaN   24

Each column is one day of measurements. Below is what my plot looks like. I'd like to plot every day in the box plot, including the days that contain NaN values. As far as I can tell, matplotlib only draws boxes for the columns that have no NaN values. Is this possible with matplotlib directly, or would I have to convert the data frame into a list of lists or a NumPy array? Any help would be appreciated!

I know that pandas can make a box plot with df.boxplot() or df.plot.box(); however, I strongly prefer to use matplotlib's boxplot() function. Also, I want to plot all of the values for days with complete measurements (like 9-1 and 9-5) rather than truncating them to the rows that are valid in the days with NaN values.

For images of the boxplot, https://i.sstatic.net/pBh6pEdf.png is what I produced and https://imgur.com/a/vvDFNR1 is what I'm trying to create with matplotlib's boxplot() function.

1 Answer

You could convert the DataFrame to a list of lists (or list of arrays) without NaNs (with dropna) and pass the names separately. I'll do this here using a dictionary:

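import matplotlib.pyplot as plt

# df is the DataFrame from the question; drop the NaNs column by column
data = {c: df[c].dropna().tolist() for c in df}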
# {'9-1': [23, 27, 10, 23, 10, 50, 12, 10, 34, 28],
#  '9-2': [16.0, 18.0, 9.0, 30.0, 10.0, 13.0, 24.0, 29.0],
#  '9-3': [18.0, 20.0, 15.0, 19.0, 23.0, 8.0, 31.0],
#  '9-4': [18.0, 17.0, 8.0, 5.0, 29.0, 20.0, 27.0, 7.0, 16.0],
#  '9-5': [26, 33, 30, 15, 12, 23, 35, 22, 31, 24]}

# tick_labels requires Matplotlib 3.9 or later; older versions call this parameter labels
plt.boxplot(data.values(), tick_labels=data.keys())

Or, as a one-liner:

plt.boxplot([df[c].dropna() for c in df], tick_labels=list(df))
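For reference, here is the same idea as a self-contained sketch. It rebuilds the DataFrame from the table in the question, and it sets the labels with set_xticklabels() rather than the tick_labels keyword so that it should also run on Matplotlib versions older than 3.9. The Agg backend and the variable names (columns, result) are my own choices for illustration, not something the question requires.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data copied from the table in the question.
df = pd.DataFrame({
    "9-1": [23, 27, 10, 23, 10, 50, 12, 10, 34, 28],
    "9-2": [16, 18, 9, 30, 10, 13, 24, 29, np.nan, np.nan],
    "9-3": [18, 20, 15, 19, 23, 8, 31, np.nan, np.nan, np.nan],
    "9-4": [18, 17, 8, 5, 29, 20, 27, 7, 16, np.nan],
    "9-5": [26, 33, 30, 15, 12, 23, 35, 22, 31, 24],
})

# One NaN-free array per column; the arrays may have different lengths,
# which boxplot() accepts when given a list of sequences.
columns = [df[c].dropna().to_numpy() for c in df]

fig, ax = plt.subplots()
result = ax.boxplot(columns)   # one box per entry in the list
ax.set_xticklabels(list(df))   # label the boxes with the column names
# Call plt.show() or fig.savefig(...) here to view or save the figure.
```

Because each column is filtered on its own, complete days such as 9-1 and 9-5 keep all ten values, while 9-3 is drawn from its seven valid measurements.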

And for the sake of completeness, this can of course be obtained easily with the pandas API:

df.plot.box()

Output:

[Image: matplotlib boxplot with NaNs from pandas DataFrame]
