Boxplot of CSV data with ggplot2

Question

I have a CSV file of weights recorded every day for six months (August 2016 - January 2017). I would like to draw a boxplot for each month that essentially visualizes the summary() of that month's data. I would like to use ggplot2, since its plots look much nicer. I've searched for a solution and found many, but none that does quite what I want.

The head and summary of the data:

> wts <- read.csv('weights.csv', header=T, sep=',')
> head(wts)
  August.2016 September.2016 October.2016 November.2016 December.2016 January.2016
1       254.2          250.0        248.2         245.8         245.6        244.4
2       252.6          249.2        248.6         246.4         246.0        245.0
3       251.8          250.6        249.2         248.0         246.4        244.3
4       253.2          252.4        249.8         247.5         246.0        243.6
5       252.2          250.6        248.8         247.0         246.0        242.6
6       254.0          251.0        247.8         247.6         246.0        242.0
> summary(wts)
  August.2016    September.2016   October.2016   November.2016   December.2016    January.2016  
 Min.   :249.6   Min.   :245.6   Min.   :245.4   Min.   :244.2   Min.   :243.4   Min.   :241.6  
 1st Qu.:252.2   1st Qu.:248.3   1st Qu.:246.7   1st Qu.:246.2   1st Qu.:244.8   1st Qu.:242.9  
 Median :252.8   Median :249.2   Median :247.8   Median :246.6   Median :245.6   Median :243.6  
 Mean   :252.7   Mean   :249.1   Mean   :247.6   Mean   :246.7   Mean   :245.3   Mean   :243.5  
 3rd Qu.:253.6   3rd Qu.:250.0   3rd Qu.:248.2   3rd Qu.:247.2   3rd Qu.:246.0   3rd Qu.:244.3  
 Max.   :255.2   Max.   :252.4   Max.   :249.8   Max.   :248.6   Max.   :247.0   Max.   :245.0  
                 NA's   :1                       NA's   :1                       NA's   :1

From what I've gathered, I need to reshape the data into a form that ggplot2 accepts, but I'm not sure how to do that. I would also like to highlight the mean (with the actual number) on the boxplot, if possible. Could I get an idea of how to do it?

Thanks

mtoto · Accepted Answer · 2017-02-11 19:36:17Z


To stay within the same paradigm, you can use gather() from the tidyr package to reshape your data into long format and pass the result to ggplot(). To add text showing the mean, use stat_summary() with the "text" geom and the mean function applied to the value variable.

library(tidyr)
library(ggplot2)

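# gather() stacks all columns into key/value pairs; factor_key = TRUE keeps month order
ggplot(gather(wts, factor_key = TRUE),
       aes(key, value)) +
  geom_boxplot() +
  # Print each month's mean, rounded to 2 decimals, as red text at that height
  # (newer ggplot2 releases spell these fun = and after_stat(y))
  stat_summary(aes(label = ..y..),
               fun.y = function(x) round(mean(x), 2),
               geom = "text",
               size = 3,
               color = "red")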
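
For readers on ggplot2 3.3.0 or later, the same idea can be sketched with pivot_longer(), fun = and after_stat(y). This is only an illustration under assumptions: the small wts data frame stands in for weights.csv, and the column names month and weight are made up for the example.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for weights.csv: one column per month, one row per day
wts <- data.frame(
  August.2016    = c(254.2, 252.6, 251.8),
  September.2016 = c(250.0, 249.2, NA)
)

# Reshape to long format: one row per (month, weight) pair, dropping NA cells
wts_long <- pivot_longer(wts, cols = everything(),
                         names_to = "month", values_to = "weight",
                         values_drop_na = TRUE)

# Keep the months in their original column order rather than alphabetical order
wts_long$month <- factor(wts_long$month, levels = names(wts))

p <- ggplot(wts_long, aes(month, weight)) +
  geom_boxplot() +
  # Label each box with its mean, rounded to two decimals
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text", size = 3, color = "red")

print(p)
```

Setting the factor levels from names(wts) plays the role that factor_key = TRUE plays in gather().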

edited Feb 11, 2017 at 19:36

answered Feb 11, 2017 at 17:13


6 Comments

shaun Over a year ago

Thank you. I received a warning about removing 3 rows containing non-finite values. Would this correspond to the NA values for months with only 30 days (there is no 31st day)? Also, the timeline is not ordered. In your example also, December 2016 is followed by August. Is there something like an ordered method that I could use?
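
One way to check whether the warning matches the missing cells is to count the NA values before reshaping, and to drop them during the reshape so that ggplot2 has nothing left to remove. A minimal sketch, assuming a tiny made-up table in place of the real CSV:

```r
library(tidyr)

# Hypothetical miniature of the weights table; the NA mimics a missing 31st day
wts <- data.frame(
  August.2016    = c(254.2, 252.6, 251.8),
  September.2016 = c(250.0, 249.2, NA)
)

# Number of missing cells; this should match the row count in the ggplot2 warning
n_missing <- sum(is.na(wts))
print(n_missing)

# na.rm = TRUE drops the NA rows during the reshape;
# factor_key = TRUE keeps the months in column order instead of alphabetical order
wts_long <- gather(wts, na.rm = TRUE, factor_key = TRUE)
print(levels(wts_long$key))
```

With the real data, summary(wts) reports three NA's in total, which is consistent with the three rows mentioned in the warning.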

shaun Over a year ago

I figured out the ordering: it can be preserved with gather(factor_key = TRUE). I also filled in the NA values with the column's mean weight (in the CSV itself, not in R). I still need to figure out how to round the mean to two significant digits.
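
On rounding: base R distinguishes decimal places (round) from significant digits (signif), and for weights in the 240s the two give very different results. A short sketch with made-up numbers:

```r
# Hypothetical sample of three weights
m <- mean(c(246.4, 246.0, 247.6))

two_decimals <- round(m, 2)        # two decimal places
two_sig      <- signif(m, 2)       # two significant digits
as_text      <- sprintf("%.2f", m) # character string that always shows two decimals

print(two_decimals)
print(two_sig)
print(as_text)
```

For a plot label, two decimal places is probably what is wanted; sprintf() has the extra property of keeping trailing zeros, since it returns text rather than a number.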

shaun Over a year ago

Thank you! What does the label=..y.. do?

mtoto Over a year ago

It's an internal variable that here represents the computed y aesthetic, which will be inherited by geom_text() to display the rounded mean.
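
To see the computed value directly, layer_data() returns the data frame a layer draws from. The sketch below uses made-up data and after_stat(y), which is the spelling of ..y.. in ggplot2 3.3.0 and later:

```r
library(ggplot2)

# Hypothetical long-format data: two months, three weights each
d <- data.frame(
  key   = factor(rep(c("Aug", "Sep"), each = 3), levels = c("Aug", "Sep")),
  value = c(254.2, 252.6, 251.8, 250.0, 249.2, 250.6)
)

p <- ggplot(d, aes(key, value)) +
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text")

# The y column holds the per-month means computed by stat_summary();
# label is filled from that same computed column
computed <- layer_data(p, 1)
print(computed[, c("x", "y", "label")])
```

So y here is not the raw value column but the summary that the stat produced from it, one row per month.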

shaun Over a year ago

Is it possible to format the text (as in make it bold or italic)?
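
This comment was left unanswered in the thread. One option that should work is the fontface aesthetic of the text geom, which accepts "plain", "bold", "italic" and "bold.italic". A sketch with made-up data, assuming ggplot2 3.3.0 or later for fun = and after_stat():

```r
library(ggplot2)

# Hypothetical long-format data: two months, three weights each
d <- data.frame(
  key   = factor(rep(c("Aug", "Sep"), each = 3), levels = c("Aug", "Sep")),
  value = c(254.2, 252.6, 251.8, 250.0, 249.2, 250.6)
)

p <- ggplot(d, aes(key, value)) +
  geom_boxplot() +
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text",
               fontface = "bold",   # or "italic", "bold.italic"
               size = 3, color = "red")

print(p)
```

Because fontface is given outside aes(), it is applied as a constant to every label.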
