I have a Pandas dataframe with 4 columns: id, label, var1 and var2.

For each of var1 and var2, I would like to draw two box plots: one for the rows where label == 0 and one for the rows where label == 1.

Sample dataframe:

+-----+-----+----+----+
| id  |label|var1|var2|
+-----+-----+----+----+
| 1   |  0  | 0.7| 0.6|
| 2   |  0  | 0.3| 0.4|
| 3   |  1  | 0.2| 0.2|
| 4   |  1  | 0.8| 0.1|
| 5   |  0  | 0.0| 0.9|
+-----+-----+----+----+

Code to generate dataframe:

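# Assumes an existing Spark SQLContext named sqlContext
l = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
names = ["id", "label", "var1", "var2"]
db = sqlContext.createDataFrame(l, names)
db.show()
db = db.toPandas()  # convert the Spark DataFrame to a Pandas dataframe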

How can I implement this using matplotlib?

1 Answer

Use df.boxplot() on your dataframe: the column argument selects the variables to plot, and the by argument groups each of them by label.

import pandas as pd
import matplotlib.pyplot as plt

l = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
names = ["id", "label", "var1", "var2"]
df = pd.DataFrame(l, columns=names)

# One panel per column, with one box per value of "label" in each panel
df.boxplot(column=["var1", "var2"], by="label")

plt.show()

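Since the question asks about matplotlib specifically: df.boxplot() draws with matplotlib under the hood, but the same kind of plot can also be built with matplotlib calls directly. The following is only a minimal sketch under a few assumptions: the names rows, label_values and groups are made up for illustration, and the side-by-side layout with a shared y-axis is my choice, not something the question specifies.

```python
import pandas as pd
import matplotlib.pyplot as plt

rows = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2),
        (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
df = pd.DataFrame(rows, columns=["id", "label", "var1", "var2"])

# Distinct label values, sorted so the boxes appear in a stable order.
label_values = sorted(df["label"].unique())

# One subplot per variable; the shared y-axis is an assumed layout choice.
fig, axes = plt.subplots(1, 2, sharey=True)
for ax, col in zip(axes, ["var1", "var2"]):
    # One array of values per label value; each array becomes one box.
    groups = [df.loc[df["label"] == v, col].to_numpy() for v in label_values]
    ax.boxplot(groups)  # by default the boxes sit at x positions 1, 2, ...
    ax.set_xticks(list(range(1, len(label_values) + 1)))
    ax.set_xticklabels([str(v) for v in label_values])
    ax.set_xlabel("label")
    ax.set_title(col)

# Call plt.show() to display the figure, or fig.savefig(...) to write it to a file.
```

This gives more control over each Axes than df.boxplot() does, at the cost of doing the grouping by hand.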

2 Comments

Thanks! Do you know how I can set the y-axis limits in matplotlib?
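One way to do this, as a sketch rather than a definitive answer: df.boxplot() returns the matplotlib Axes it drew on, and each Axes has a set_ylim() method. The limits -0.1 and 1.1 below are assumed values picked to fit the sample data, and np.ravel is used only so the loop works whether pandas returns a single Axes or an array of them.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rows = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2),
        (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
df = pd.DataFrame(rows, columns=["id", "label", "var1", "var2"])

# Flatten the return value so a single Axes and an array of Axes
# can be handled by the same loop.
axes = np.ravel(df.boxplot(column=["var1", "var2"], by="label"))

for ax in axes:
    ax.set_ylim(-0.1, 1.1)  # assumed limits; adjust to your data

# Call plt.show() afterwards to display the figure.
```

For a plot with a single Axes, plt.ylim(lower, upper) on the current Axes is a shorter alternative.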
