I have a CSV file of weights recorded every day for six months (August 2016 - January 2017). I would like to draw a boxplot for each month that shows, in effect, the summary() of that month's data. I would like to use ggplot2 because the plots look much nicer. I've searched for a solution and found many, but none that does what I want.

The head and summary of the data:

> wts <- read.csv('weights.csv', header=T, sep=',')
> head(wts)
  August.2016 September.2016 October.2016 November.2016 December.2016 January.2016
1       254.2          250.0        248.2         245.8         245.6        244.4
2       252.6          249.2        248.6         246.4         246.0        245.0
3       251.8          250.6        249.2         248.0         246.4        244.3
4       253.2          252.4        249.8         247.5         246.0        243.6
5       252.2          250.6        248.8         247.0         246.0        242.6
6       254.0          251.0        247.8         247.6         246.0        242.0
> summary(wts)
  August.2016    September.2016   October.2016   November.2016   December.2016    January.2016  
 Min.   :249.6   Min.   :245.6   Min.   :245.4   Min.   :244.2   Min.   :243.4   Min.   :241.6  
 1st Qu.:252.2   1st Qu.:248.3   1st Qu.:246.7   1st Qu.:246.2   1st Qu.:244.8   1st Qu.:242.9  
 Median :252.8   Median :249.2   Median :247.8   Median :246.6   Median :245.6   Median :243.6  
 Mean   :252.7   Mean   :249.1   Mean   :247.6   Mean   :246.7   Mean   :245.3   Mean   :243.5  
 3rd Qu.:253.6   3rd Qu.:250.0   3rd Qu.:248.2   3rd Qu.:247.2   3rd Qu.:246.0   3rd Qu.:244.3  
 Max.   :255.2   Max.   :252.4   Max.   :249.8   Max.   :248.6   Max.   :247.0   Max.   :245.0  
                 NA's   :1                       NA's   :1                       NA's   :1  

From what I've gathered, I need to reshape the data into a form that ggplot likes, but I'm not sure how to do it. I would also like to highlight the mean (with the actual number) on the boxplot, if possible. Could I get an idea of how to do it?

Thanks

1 Answer

To stay in the same paradigm, you can use gather() from the tidyr package to reshape your data into long format and pass the result to ggplot(). To add text showing the mean, you can use stat_summary() with the "text" geom and a mean function applied to the value variable.

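library(tidyr)
library(ggplot2)

# gather() stacks all columns into key/value pairs (long format);
# factor_key = TRUE keeps the months in their original column order.
ggplot(gather(wts, factor_key = TRUE),
       aes(key, value)) +
    geom_boxplot() +
    # Print each month's mean, rounded to 2 decimals, at the height of the mean.
    stat_summary(aes(label = ..y..),
                 fun.y = function(x) round(mean(x), 2),
                 geom = "text",
                 size = 3,
                 color = "red")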
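
For readers on newer package versions, here is a hedged sketch of the same plot with the current spellings: pivot_longer() (tidyr 1.0.0+) in place of gather(), fun in place of fun.y, and after_stat(y) in place of ..y.. (both ggplot2 3.3.0+). The small wts data frame and the column names month and weight are made up for illustration; they are not the asker's file.

```r
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for `wts`: three of the six months, four days each.
wts <- data.frame(
  August.2016    = c(254.2, 252.6, 251.8, 253.2),
  September.2016 = c(250.0, 249.2, 250.6, NA),
  October.2016   = c(248.2, 248.6, 249.2, 249.8)
)

# Reshape to long format: one row per (month, weight) pair, dropping NAs.
long <- pivot_longer(wts, cols = everything(),
                     names_to = "month", values_to = "weight",
                     values_drop_na = TRUE)

# Keep the months in column order rather than alphabetical order.
long$month <- factor(long$month, levels = names(wts))

p <- ggplot(long, aes(month, weight)) +
  geom_boxplot() +
  # Label each box with its mean, rounded to two decimals.
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text", size = 3, color = "red")

print(p)
```

Dropping the NA rows during the reshape should also avoid the "removed rows containing non-finite values" warning, since the missing 31st days never reach ggplot.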


6 Comments

Thank you. I received a warning about removing 3 rows containing non-finite values. Would this correspond to the NA values for months with only 30 days (there is no 31st day)? Also, the timeline is not ordered. In your example also, December 2016 is followed by August. Is there something like an ordered method that I could use?
I figured out the ordering. Order can be preserved with gather(factor_key = TRUE). I also filled in the NA values with the column mean weight (in the CSV itself, not in R). I need to figure out how to round the mean to two significant digits.
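
A small sketch of both points in this comment, using a made-up three-column wts rather than the real file: how factor_key changes the key column, and how round() (decimal places) differs from signif() (significant digits).

```r
library(tidyr)

# Hypothetical data; column order is chronological, not alphabetical.
wts <- data.frame(
  August.2016    = c(254.2, 252.6),
  September.2016 = c(250.0, 249.2),
  December.2016  = c(245.6, 246.0)
)

plain <- gather(wts)                     # key is a character column
keyed <- gather(wts, factor_key = TRUE)  # key is a factor, levels in column order

sort(unique(plain$key))  # alphabetical, which is how a character axis is ordered
levels(keyed$key)        # August, September, December

m <- mean(c(250.0, 249.2, 250.6))
round(m, 2)   # two decimal places: 249.93
signif(m, 2)  # two significant digits: 250
```

For weights around 250, round(x, 2) is probably what is wanted on the plot; signif(x, 2) would collapse every monthly mean to 240 or 250.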
Thank you! What does the label=..y.. do?
It's an internal variable that here represents the computed y aesthetic, which will be inherited by geom_text() to display the rounded mean.
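
To see what that computed y actually holds, one option is layer_data(), which returns the data frame a layer draws from. This is only a sketch on made-up data (the df, month and weight names are illustrative), and it assumes ggplot2 3.3.0+, where after_stat(y) is the newer spelling of ..y.. and fun replaces fun.y.

```r
library(ggplot2)

# Hypothetical data: two months, three days each.
df <- data.frame(
  month  = rep(c("Aug", "Sep"), each = 3),
  weight = c(254.2, 252.6, 251.8, 250.0, 249.2, 250.6)
)

p <- ggplot(df, aes(month, weight)) +
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text")

# One row per month: y is the summary computed by the stat,
# and label has been filled in from that same computed y.
computed <- layer_data(p, 1)
computed[, c("x", "y", "label")]
```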
Is it possible to format the text (as in make it bold or italics)?
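
One possible approach to the formatting question, offered as a sketch rather than a tested answer for the original data: geom_text() has a fontface aesthetic ("plain", "bold", "italic", "bold.italic"), and stat_summary() passes it through to the text geom. The df below is made up, and the code assumes ggplot2 3.3.0+ for fun and after_stat().

```r
library(ggplot2)

# Hypothetical data: two months, three days each.
df <- data.frame(
  month  = rep(c("Aug", "Sep"), each = 3),
  weight = c(254.2, 252.6, 251.8, 250.0, 249.2, 250.6)
)

p <- ggplot(df, aes(month, weight)) +
  geom_boxplot() +
  stat_summary(aes(label = after_stat(y)),
               fun = function(x) round(mean(x), 2),
               geom = "text",
               fontface = "bold",   # or "italic", "bold.italic"
               size = 3, color = "red")

print(p)
```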