I want to plot the rolling mean of data of different time series with ggplot2. My data have the following structure:

library(dplyr)
library(ggplot2)
library(zoo)
library(tidyr)

df <- data.frame(episode = 1:1000,
                 t_0 = runif(1000), 
                 t_1 = 1 + runif(1000), 
                 t_2 = 2 + runif(1000))
df.tidy <- gather(df, "time", "value", -episode) %>% 
  separate("time", c("t", "time"), sep = "_") %>%
  subset(select = -t)

head(df.tidy)
#  episode time     value
#1       1    0 0.7466480
#2       2    0 0.7238865
#3       3    0 0.9024454
#4       4    0 0.7274303
#5       5    0 0.1932375
#6       6    0 0.1826925

Now, the code below creates a plot in which the rolling-mean lines for time = 1 and time = 2 do not represent the data for the first few episodes: the rolling mean is padded with NAs only at the very start of the value column, and those first rows belong to time = 0.

ggplot(df.tidy, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) + 
  geom_line(aes(y = rollmean(value, 10, align = "right", fill = NA)))

Plot result

How do I have to adapt my code such that the rolling-mean lines are representative of my data?

  • From the rollmean documentation: Currently, there are methods for "zoo" and "ts" series and default methods. The default method of rollmedian is an interface to runmed. The default methods of rollmean and rollsum do not handle inputs that contain NAs. In such cases, use rollapply instead. Commented Jul 29, 2018 at 11:52
  • You need tidyr in your example. Without it: Error in gather(df, "time", "value", -episode) : could not find function "gather". Commented Jul 29, 2018 at 11:53
  • @JackBrookes True, I forgot. I have edited the question accordingly. Commented Jul 29, 2018 at 11:55

1 Answer

The problem is that rollmean is applied over the whole value column, so windows straddle the boundaries between groups and data "leaks" from one value of time into the next.
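To make the leak concrete, here is a minimal sketch on toy data (the vectors and the window width of 3 are made up for illustration and are not the question's data). Two series are stacked in one column, as in df.tidy, and rollmean is run over the whole column:

```r
library(zoo)

# Toy data (hypothetical): two series stacked in one column, long format
value <- c(rep(0, 5), rep(10, 5))     # rows 1-5: series "0"; rows 6-10: series "1"
time  <- rep(c("0", "1"), each = 5)

# Right-aligned rolling mean over the whole column, ignoring the grouping
leaky <- rollmean(value, 3, align = "right", fill = NA)

# The first two windows of series "1" still contain rows of series "0",
# so rows 6 and 7 give 10/3 and 20/3 instead of NA
leaky[time == "1"]
```

Only the first two rows of the column are padded with NA, and they belong to series "0". That is the same effect the question's plot shows for time = 1 and time = 2.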

You could group_by time first, so that rollmean is applied to each series separately:

ggplot(df.tidy, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) + 
  geom_line(data = df.tidy %>%
              group_by(time) %>%
              mutate(value = rollmean(value, 10, align = "right", fill = NA)))
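A possible variant, sketched under a few assumptions: it stores the rolling mean in a separate column (the name roll is made up here), sorts by episode within each group so the window order is explicit, and passes na.rm = TRUE to geom_line to silence the warning about the padded rows. The data frame is rebuilt directly in long format so that the block runs on its own; it is meant to have the same columns as the question's df.tidy.

```r
library(dplyr)
library(ggplot2)
library(zoo)

# Stand-in for the question's df.tidy (same columns: episode, time, value)
set.seed(1)
df.tidy <- data.frame(
  episode = rep(1:1000, times = 3),
  time    = rep(c("0", "1", "2"), each = 1000),
  value   = rep(0:2, each = 1000) + runif(3000)
)

# Compute the rolling mean once, per group, in its own column
df.roll <- df.tidy %>%
  group_by(time) %>%
  arrange(episode, .by_group = TRUE) %>%         # make the window order explicit
  mutate(roll = rollmeanr(value, 10, fill = NA)) %>%  # rollmeanr = right-aligned rollmean
  ungroup()

p <- ggplot(df.roll, aes(x = episode, y = value, col = time)) +
  geom_point(alpha = 0.2) +
  geom_line(aes(y = roll), na.rm = TRUE)         # drop the NA-padded rows silently
p
```

Keeping the raw value column untouched means the points and the line come from the same data frame, and each group's first nine episodes are NA rather than borrowed from the previous group.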

1 Comment

Thanks. That's exactly what I was looking for. I had the same reasoning for why I get the undesired result, but couldn't come up with a solution. Yours is great.
