I have a Pandas dataframe with 4 columns: id, label, var1 and var2.
For each of var1 and var2, I would like to draw two box plots: one for the rows where label == 0 and one for the rows where label == 1.
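Each pair of box plots just needs the values of one variable, selected once per label value. A minimal pandas sketch of that split, assuming the sample frame is built with `pd.DataFrame` directly rather than through Spark; `split_by_label` is a hypothetical helper name used only for illustration:

```python
import pandas as pd

# Sample data from the question, built with pandas directly (assumption: no Spark needed for this step)
db = pd.DataFrame(
    [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)],
    columns=["id", "label", "var1", "var2"],
)

def split_by_label(df, column):
    """Hypothetical helper: return the values of `column` for label == 0 and for label == 1."""
    group0 = df.loc[df["label"] == 0, column]  # boolean mask keeps only label == 0 rows
    group1 = df.loc[df["label"] == 1, column]  # boolean mask keeps only label == 1 rows
    return group0, group1

v1_0, v1_1 = split_by_label(db, "var1")
print(v1_0.tolist())  # [0.7, 0.3, 0.0]
print(v1_1.tolist())  # [0.2, 0.8]
```

Each returned Series is exactly the data one box summarizes.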
Sample dataframe:
+----+-------+------+------+
| id | label | var1 | var2 |
+----+-------+------+------+
|  1 |     0 |  0.7 |  0.6 |
|  2 |     0 |  0.3 |  0.4 |
|  3 |     1 |  0.2 |  0.2 |
|  4 |     1 |  0.8 |  0.1 |
|  5 |     0 |  0.0 |  0.9 |
+----+-------+------+------+
Code to generate the dataframe:
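from pyspark.sql import SparkSession

# sqlContext only exists if the environment (e.g. a PySpark shell) predefines it; an explicit SparkSession makes this self-contained
spark = SparkSession.builder.getOrCreate()
l = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
names = ["id", "label", "var1", "var2"]
db = spark.createDataFrame(l, names)
db.show()
db = db.toPandas()  # convert the Spark DataFrame to a pandas DataFrame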
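If Spark is not otherwise needed, a pandas frame with the same columns and values can be built directly from the list of tuples, skipping the `createDataFrame`/`toPandas` round trip. A sketch, assuming only pandas is installed:

```python
import pandas as pd

l = [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)]
names = ["id", "label", "var1", "var2"]

# Each tuple becomes one row; `columns` assigns the column names in order
db = pd.DataFrame(l, columns=names)
print(db)
```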
How can I implement this using matplotlib?
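One way to do this with plain matplotlib is to draw one subplot per variable and pass the two label groups to `Axes.boxplot` as a list, so each subplot gets one box per label. This is a sketch rather than the only approach: the helper name `plot_boxes_by_label`, the figure size and the tick label text are illustrative choices, the frame is built with pandas directly instead of Spark, and the `Agg` backend is selected only so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window from plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question (built with pandas here instead of Spark)
db = pd.DataFrame(
    [(1, 0, 0.7, 0.6), (2, 0, 0.3, 0.4), (3, 1, 0.2, 0.2), (4, 1, 0.8, 0.1), (5, 0, 0.0, 0.9)],
    columns=["id", "label", "var1", "var2"],
)

def plot_boxes_by_label(df, variables=("var1", "var2")):
    """Hypothetical helper: one subplot per variable, one box for label == 0 and one for label == 1."""
    # squeeze=False keeps axes 2-D even for a single variable, so indexing is uniform
    fig, axes = plt.subplots(1, len(variables), sharey=True, figsize=(8, 4), squeeze=False)
    axes = axes[0]
    results = []
    for ax, var in zip(axes, variables):
        data = [
            df.loc[df["label"] == 0, var].to_numpy(),
            df.loc[df["label"] == 1, var].to_numpy(),
        ]
        results.append(ax.boxplot(data))  # boxes are drawn at x positions 1 and 2
        ax.set_xticks([1, 2])
        ax.set_xticklabels(["label == 0", "label == 1"])
        ax.set_title(var)
    return fig, axes, results

fig, axes, results = plot_boxes_by_label(db)
plt.show()
```

Setting the tick labels with `set_xticks`/`set_xticklabels` avoids depending on the `boxplot` label keyword, whose name has changed across matplotlib versions. If a pandas-level shortcut is acceptable, `db.boxplot(column=["var1", "var2"], by="label")` produces a similar grouped layout with less code.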
