
I'm sure this question has been asked before, but I'm having trouble finding a solution that works:

I have a data frame comprising two groups of 5 samples each, where each sample has ten observations spaced equally across time. I would like to plot this dataset as a time series with two lines linking the average of each group at each time point. At each time point I would like to have some measure of variability (e.g. 95% confidence interval).

For example, the data set is:

group_a <- data.frame(runif(50, min=80, max=100), 1:10, rep("a", 10), c(rep("i", 10), rep("ii", 10), rep("iii", 10), rep("iv", 10), rep("v", 10)))

names(group_a) <- c("yvar", "xvar", "group", "sample")

group_b <- data.frame(runif(50, min=60, max=80), 1:10, rep("b", 10), c(rep("vi", 10), rep("vii", 10), rep("viii", 10), rep("ix", 10), rep("x", 10)))

names(group_b) <- c("yvar", "xvar", "group", "sample")

sample_data <- rbind(group_a, group_b)

So each time point (xvar) has 10 observations (yvar), one per sample (sample), split equally between two groups (group). The closest I have come to what I'm looking for is the following:

library(ggplot2)

p <- ggplot(sample_data, aes(x = xvar, y = yvar)) + geom_line(aes(color = group, linetype = group))

print(p)

Which produces something like:

So the line is split by group, but at each time point it follows each individual case vertically, rather than as a mean.

What I'm looking for is something more like what's suggested in this other answer: Plot time series with ggplot with confidence interval, but with multiple lines on the graph, and not necessarily a continuous ribbon plot.

Does anyone have any suggestions? I know this should be really simple, but I'm relatively new to R and ggplot and apparently can't find the right search terms (or am missing something really obvious). Any help is very much appreciated!

3 Answers


Here are two variations. I'd recommend pre-calculating your summary stats and feeding that into ggplot.

library(dplyr)  # provides %>%, group_by(), summarize()

sample_sum <- sample_data %>%
  group_by(xvar, group) %>%
  summarize(mean = mean(yvar),
            sd   = sd(yvar),
            mean_p2sd = mean + 2 * sd,
            mean_m2sd = mean - 2 * sd) %>%
  ungroup()

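This first approach gathers the mean, the mean minus 2 SD, and the mean plus 2 SD into a single column, yvar, with a "stat" column marking which statistic each value is. (I picked those because +/- 2 SD captures ~95% of a normal distribution. Note that this is the spread of the observations, not a confidence interval for the mean, which would use the standard error, sd / sqrt(n).) Then we can plot them together in a single call to geom_line.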

library(tidyr)  # provides gather()

p <- ggplot(sample_sum %>%
              gather(stat, yvar, mean, mean_p2sd:mean_m2sd), 
            aes(x = xvar, y = yvar)) + 
  geom_line(aes(color = group, linetype = stat))
p
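
As a side note, tidyr now marks gather() as superseded in favour of pivot_longer(). A minimal, self-contained sketch of the same reshaping with pivot_longer() is below. The seed, the explicit column construction, the `.groups = "drop"` argument, and the name sample_long are my additions for the sake of a reproducible example, not part of the code above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Rebuild the question's example data (seed added only for reproducibility).
set.seed(1)
sample_data <- data.frame(
  yvar   = c(runif(50, min = 80, max = 100), runif(50, min = 60, max = 80)),
  xvar   = rep(1:10, times = 10),
  group  = rep(c("a", "b"), each = 50),
  sample = rep(c("i", "ii", "iii", "iv", "v",
                 "vi", "vii", "viii", "ix", "x"), each = 10)
)

# Same summary as above: mean and mean +/- 2 SD per time point and group.
sample_sum <- sample_data %>%
  group_by(xvar, group) %>%
  summarize(mean = mean(yvar),
            sd   = sd(yvar),
            mean_p2sd = mean + 2 * sd,
            mean_m2sd = mean - 2 * sd,
            .groups = "drop")

# Long format: one row per (xvar, group, stat), with the value in yvar.
sample_long <- sample_sum %>%
  pivot_longer(cols = c(mean, mean_p2sd, mean_m2sd),
               names_to = "stat", values_to = "yvar")

p <- ggplot(sample_long, aes(x = xvar, y = yvar)) +
  geom_line(aes(color = group, linetype = stat))
p
```

The resulting plot should match the gather() version; only the reshaping call differs.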


Alternatively, we can keep them apart and plot the +/- 2 SD area using geom_ribbon.

p <- ggplot(sample_sum, aes(x = xvar,  color = group, fill = group)) + 
  geom_ribbon(aes(ymin = mean_m2sd, ymax = mean_p2sd), alpha = 0.1) +
  geom_line(aes(y= mean))

p
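
The bands above are mean +/- 2 SD, which describes the spread of the observations. If what you want is a 95% confidence interval for each group mean, a minimal sketch might look like the following. It assumes roughly normal data and uses the t distribution with n - 1 degrees of freedom; the seed and the column names (n, mean_y, se, t_crit, ci_low, ci_high) are mine, chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Rebuild the question's example data (seed added only for reproducibility).
set.seed(1)
sample_data <- data.frame(
  yvar   = c(runif(50, min = 80, max = 100), runif(50, min = 60, max = 80)),
  xvar   = rep(1:10, times = 10),
  group  = rep(c("a", "b"), each = 50),
  sample = rep(c("i", "ii", "iii", "iv", "v",
                 "vi", "vii", "viii", "ix", "x"), each = 10)
)

# Per time point and group: mean, standard error, and t-based 95% CI.
sample_ci <- sample_data %>%
  group_by(xvar, group) %>%
  summarize(n       = n(),
            mean_y  = mean(yvar),
            se      = sd(yvar) / sqrt(n),
            t_crit  = qt(0.975, df = n - 1),
            ci_low  = mean_y - t_crit * se,
            ci_high = mean_y + t_crit * se,
            .groups = "drop")

# One line per group through the means, with an error bar at each time point.
p <- ggplot(sample_ci, aes(x = xvar, y = mean_y, color = group)) +
  geom_line() +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high), width = 0.2)
p
```

With only 5 samples per group the t multiplier is noticeably larger than 1.96, so these intervals are wider than a normal approximation would give. Swapping geom_errorbar for geom_ribbon gives the shaded version instead.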


1 Comment

This was very helpful. Thank you!

Instead of rep(), you can use the gl() function to generate the repeating index factor. I think it simplifies building your columns.

Here, use gl(n = 10, k = 1, length = 50, labels = 1:10). This creates a factor with levels 1 to 10, repeated to length 50:

#> [1] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5 
#> [16] 6  7  8  9  10 1  2  3  4  5  6  7  8  9  10
#> [31] 1  2  3  4  5  6  7  8  9  10 1  2  3  4  5 
#> [46] 6  7  8  9  10
#> Levels: 1 2 3 4 5 6 7 8 9 10
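
As a quick check of what gl() is doing here, the sketch below compares it with the rep()-based equivalent using base R only. The variable names (idx_gl, idx_rep, grp) are mine, and the second call is just an illustration of how the k argument controls the run length.

```r
# gl(n, k, length): n levels, each repeated k times in a row,
# with the whole pattern recycled up to `length`.
idx_gl  <- gl(n = 10, k = 1, length = 50, labels = 1:10)
idx_rep <- factor(rep(1:10, times = 5))

# Both give the same integer codes and the same levels.
identical(as.integer(idx_gl), as.integer(idx_rep))
identical(levels(idx_gl), levels(idx_rep))

# With k > 1 each level forms a block, e.g. a group indicator of 50 "a" then 50 "b".
grp <- gl(n = 2, k = 50, labels = c("a", "b"))
table(grp)
```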

Adding this factor as a column alongside yvar is enough to solve the problem.

library(tidyverse)

set.seed(10)
(group_a <-
  tibble( # data_frame() is deprecated in favour of tibble()
    yvar = runif(50, min = 80, max = 100),
    gl = gl(n = 10, k = 1, length = 50, labels = 1:10)
  ))
#> # A tibble: 50 x 2
#>     yvar gl   
#>    <dbl> <fct>
#>  1  90.1 1    
#>  2  86.1 2    
#>  3  88.5 3    
#>  4  93.9 4    
#>  5  81.7 5    
#>  6  84.5 6    
#>  7  85.5 7    
#>  8  85.4 8    
#>  9  92.3 9    
#> 10  88.6 10   
#> # ... with 40 more rows

(group_a_mean <-
  group_a %>%
  group_by(gl) %>% # for each time point, calculate the mean and an SD-based band
  summarise(sample_mean = mean(yvar),
            lower = sample_mean - 1.96 * sd(yvar), # lower bound: mean - 1.96 SD
            upper = sample_mean + 1.96 * sd(yvar))) # upper bound: mean + 1.96 SD
#> # A tibble: 10 x 4
#>    gl    sample_mean lower upper
#>    <fct>       <dbl> <dbl> <dbl>
#>  1 1            91.3  82.9  99.8
#>  2 2            87.2  78.5  96.0
#>  3 3            86.0  74.0  98.0
#>  4 4            93.1  85.3 101. 
#>  5 5            86.1  80.6  91.6
#>  6 6            89.1  78.5  99.6
#>  7 7            88.0  72.2 104. 
#>  8 8            88.9  77.0 101. 
#>  9 9            90.3  79.8 101. 
#> 10 10           91.7  83.1 100.

The same for group_b:

(group_b <-
  tibble( # data_frame() is deprecated in favour of tibble()
    yvar = runif(50, min = 60, max = 80),
    gl = gl(n = 10, k = 1, length = 50, labels = 1:10)
  ))
#> # A tibble: 50 x 2
#>     yvar gl   
#>    <dbl> <fct>
#>  1  67.1 1    
#>  2  78.7 2    
#>  3  64.9 3    
#>  4  69.5 4    
#>  5  63.8 5    
#>  6  71.7 6    
#>  7  69.2 7    
#>  8  69.3 8    
#>  9  68.0 9    
#> 10  70.1 10   
#> # ... with 40 more rows

group_b_mean <-
  group_b %>%
  group_by(gl) %>%
  summarise(sample_mean = mean(yvar),
            lower = sample_mean - 1.96 * sd(yvar),
            upper = sample_mean + 1.96 * sd(yvar))

After that, if you bind the two data frames together with a group indicator such as "a" and "b", you can draw the plot you want.

group_a_mean %>%
  mutate(gr = "a") %>% # "a" indicator
  bind_rows(group_b_mean %>% mutate(gr = "b")) %>% # "b" indicator and bind row
  ggplot() +
  aes(x = as.numeric(gl), colour = gr) + # gl is a factor, so convert it with as.numeric() for a continuous x axis
  geom_line(aes(y = sample_mean)) +
  geom_line(aes(y = lower), linetype = "dashed") +
  geom_line(aes(y = upper), linetype = "dashed")


You can also use geom_ribbon():

group_a_mean %>%
  mutate(gr = "a") %>%
  bind_rows(group_b_mean %>% mutate(gr = "b")) %>%
  ggplot() +
  aes(x = as.numeric(gl), colour = gr) +
  geom_ribbon(aes(ymin = lower, ymax = upper, fill = gr), alpha = .3) +
  geom_line(aes(y = sample_mean))
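
If you would rather not pre-compute the summaries at all, ggplot2 can aggregate while plotting via stat_summary(). The sketch below is an alternative to the approach above, not a replacement for it. It uses ggplot2's mean_se() helper with mult = 1.96, i.e. mean +/- 1.96 standard errors, which is only a normal-approximation interval and will be somewhat narrow for 5 samples per group. The seed and the explicit data construction are my additions.

```r
library(ggplot2)

# Rebuild the question's example data (seed added only for reproducibility).
set.seed(1)
sample_data <- data.frame(
  yvar   = c(runif(50, min = 80, max = 100), runif(50, min = 60, max = 80)),
  xvar   = rep(1:10, times = 10),
  group  = rep(c("a", "b"), each = 50),
  sample = rep(c("i", "ii", "iii", "iv", "v",
                 "vi", "vii", "viii", "ix", "x"), each = 10)
)

# stat_summary() collapses the 5 samples at each xvar within each group:
# the first layer draws the line through the means, the second layer
# draws error bars at mean +/- 1.96 * standard error.
p <- ggplot(sample_data, aes(x = xvar, y = yvar, colour = group)) +
  stat_summary(fun = mean, geom = "line") +
  stat_summary(fun.data = mean_se, fun.args = list(mult = 1.96),
               geom = "errorbar", width = 0.2)
p
```

Using geom = "ribbon" with an alpha value in the second layer gives a shaded band instead of error bars.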


1 Comment

Thanks for the thorough and helpful reply!

I think you want it like this:

p <- ggplot(sample_data, aes(x = xvar, y = yvar, shape = sample)) +
  geom_line(aes(color = group, linetype = sample))
print(p)
