R - Plot the rolling mean of different time series in a lineplot with ggplot2

Question

I want to plot the rolling mean of data of different time series with ggplot2. My data have the following structure:

library(dplyr)
library(ggplot2)
library(zoo)
library(tidyr)

df <- data.frame(episode = 1:1000,
                 t_0 = runif(1000), 
                 t_1 = 1 + runif(1000), 
                 t_2 = 2 + runif(1000))
df.tidy <- gather(df, "time", "value", -episode) %>% 
  separate("time", c("t", "time"), sep = "_") %>%
  subset(select = -t)

> head(df.tidy)
#  episode time     value
#1       1    0 0.7466480
#2       2    0 0.7238865
#3       3    0 0.9024454
#4       4    0 0.7274303
#5       5    0 0.1932375
#6       6    0 0.1826925

The code below produces a plot in which the lines for time = 1 and time = 2 do not represent the data near the beginning of the episodes: rollmean fills only the first entries of value with NAs, and those entries belong to time = 0.

ggplot(df.tidy, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) + 
  geom_line(aes(y = rollmean(value, 10, align = "right", fill = NA)))

How do I adapt my code so that the rolling-mean lines are representative of my data?

From the rollmean documentation: Currently, there are methods for "zoo" and "ts" series and default methods. The default method of rollmedian is an interface to runmed. The default methods of rollmean and rollsum do not handle inputs that contain NAs. In such cases, use rollapply instead.
– tjebo, Commented Jul 29, 2018 at 11:52
You need tidyr in your example: Error in gather(df, "time", "value", -episode) : could not find function "gather"
– Jack Brookes, Commented Jul 29, 2018 at 11:53
@JackBrookes True, I forgot. I have edited the question accordingly.
– apitsch, Commented Jul 29, 2018 at 11:55

Jack Brookes · Accepted Answer · 2018-07-29 11:57:47Z

5

The issue is that you are applying the moving average over the whole value column, so each window can straddle the boundary between two values of time and data "leak" from one series into the next.
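To make the leak concrete, here is a minimal sketch on toy data. The vectors value and time and the window width of 3 are made up for illustration and are not taken from the question; ave() is used only to keep the example free of other packages.

```r
library(zoo)

# Toy data (hypothetical): two series stacked in one column, like df.tidy
value <- c(rep(0, 5), rep(10, 5))     # rows 1-5: series "A", rows 6-10: series "B"
time  <- rep(c("A", "B"), each = 5)

# Rolling mean over the whole column: windows straddle the A/B boundary
whole <- rollmean(value, 3, align = "right", fill = NA)

# Rolling mean within each series: no window crosses the boundary
grouped <- ave(value, time,
               FUN = function(v) rollmean(v, 3, align = "right", fill = NA))

print(data.frame(time, value, whole, grouped))
```

In the whole column, row 6 (the first row of series "B") is the mean of 0, 0 and 10, so it still contains values from series "A". In the grouped column, the same row is NA because series "B" does not yet have three observations of its own.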

You could group_by first to apply the rollmean to each time separately:

ggplot(df.tidy, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) + 
  geom_line(data = df.tidy %>%
              group_by(time) %>%
              mutate(value = rollmean(value, 10, align = "right", fill = NA)))
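A variation on the same idea, as a sketch rather than a definitive recipe: compute the rolling mean into its own column before plotting, so the raw values stay available for the points. It uses rollapplyr with mean and na.rm = TRUE, following the comment above that rollmean's default method does not handle inputs containing NAs. The column name roll_mean and the fixed seed are assumptions for illustration, and df.tidy is rebuilt here in the same shape as in the question so the block runs on its own.

```r
library(dplyr)
library(ggplot2)
library(zoo)

# Same shape as df.tidy in the question (episode, time, value)
set.seed(1)  # assumption: fixed seed, only for reproducibility
df.tidy <- data.frame(episode = rep(1:1000, times = 3),
                      time    = rep(c("0", "1", "2"), each = 1000),
                      value   = rep(0:2, each = 1000) + runif(3000))

# roll_mean is a hypothetical column name
df.roll <- df.tidy %>%
  group_by(time) %>%
  arrange(episode, .by_group = TRUE) %>%   # keep each series in episode order
  mutate(roll_mean = rollapplyr(value, 10, mean, na.rm = TRUE, fill = NA)) %>%
  ungroup()

p <- ggplot(df.roll, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) +
  geom_line(aes(y = roll_mean), na.rm = TRUE)  # drop the leading NAs silently
p
```

With a right-aligned window of 10, the first 9 episodes of each series have no rolling mean, so each line starts at episode 10.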


1 Comment

apitsch Over a year ago

Thanks. That's exactly what I was looking for. I had the same reasoning for why I get the undesired result, but couldn't come up with a solution. Yours is great.
