So I set up this empty dataframe DF and load data into it according to some conditions. As a result, some of its elements remain empty (nan). I noticed that if I don't specify the datatype as float when I create the empty dataframe, DF.boxplot() gives me an 'Index out of range' error.

As I understand it, pandas' DF.boxplot() uses matplotlib's plt.boxplot() function, so naturally I tried plt.boxplot(DF.iloc[:,0]) to plot the first column. There I noticed the reverse behavior: when the dtype of DF is float, it does not work and just shows an empty plot. In the code below, DF.boxplot() won't work, but plt.boxplot(DF.iloc[:,0]) will draw a boxplot (if I add dtype='float' when first creating the dataframe, plt.boxplot(DF.iloc[:,0]) gives me an empty plot):

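import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# No dtype given, so the columns are created with dtype object
DF = pd.DataFrame(index=range(10), columns=range(4))
for i in range(10):
    for j in range(4):
        if i == j:
            continue  # leave the diagonal entries as nan
        DF.iloc[i, j] = i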

Does this have to do with how plt.boxplot() handles nan for different data types? If so, why doesn't DF.boxplot() work when the dataframe's dtype is 'object', if pandas is just using matplotlib's boxplot function?

  • Added some code that reproduces my problem. Commented May 26, 2017 at 20:08

1 Answer

I think we can agree that neither df.boxplot() nor plt.boxplot() can handle dataframes of dtype "object". Instead, the data needs to be of a numeric dtype.
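As a minimal sketch of what that means for the dataframe from the question, assuming it was created without a dtype: the columns can be inspected with dtypes and converted with astype(float). The names DF and converted are only illustrative.

```python
import pandas as pd

# Dataframe built as in the question, without specifying a dtype
DF = pd.DataFrame(index=range(10), columns=range(4))
for i in range(10):
    for j in range(4):
        if i != j:
            DF.iloc[i, j] = i

# Inspect the column dtypes before converting
print(DF.dtypes)

# Convert every column to float; the untouched diagonal entries stay nan
converted = DF.astype(float)
print(converted.dtypes)
print(converted.isna().sum())
```

After the conversion, converted can be passed to the plotting calls in the same way as a dataframe created with dtype=float.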

If the data is numeric, df.boxplot() will work as expected, even with nan values, because they are removed before plotting.

import pandas as pd
import matplotlib.pyplot as plt

df=pd.DataFrame(index=range(10),columns=range(4), dtype=float)
for i in range(10):
    for j in range(4):
        if i!=j:
            df.iloc[i,j]=i

df.boxplot()
plt.show()

Using plt.boxplot() you would need to remove the nans manually, e.g. by calling dropna() on each column.

import pandas as pd
import matplotlib.pyplot as plt

df=pd.DataFrame(index=range(10),columns=range(4), dtype=float)
for i in range(10):
    for j in range(4):
        if i!=j:
            df.iloc[i,j]=i
data = [df[i].dropna() for i in range(4)]
plt.boxplot(data)
plt.show()
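Note that the list comprehension above drops nans column by column. A small sketch, using the same dataframe, of how that differs from calling dropna() on the whole dataframe, which removes every row containing a nan; the names per_column and rows_dropped are only illustrative.

```python
import pandas as pd

# Same float dataframe as above, with nan left on the diagonal
df = pd.DataFrame(index=range(10), columns=range(4), dtype=float)
for i in range(10):
    for j in range(4):
        if i != j:
            df.iloc[i, j] = i

# Column-wise: each column loses only its own nan entry
per_column = [df[j].dropna() for j in range(4)]
print([len(col) for col in per_column])

# Row-wise: any row containing a nan is removed entirely
rows_dropped = df.dropna()
print(len(rows_dropped))
```

Dropping per column keeps 9 values in each column here, while dropping rows keeps only the 6 rows that have no nan at all, so the two approaches would produce different boxplots.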

1 Comment

Thanks, I just realized that even though plt.boxplot() gives me a plot with dtype='object', it still returns an error.
