I want to find the most efficient way to create a new variable. Suppose I have this data frame:

set.seed(1234)
df <- data.frame(group = c(rep(1, 4), rep(2, 4)), X = rep(1:4, 2), G = sample(1:10, 8, replace = TRUE))

I want to make a new variable that is the mean of G within each group, conditional on X being 1 or 2. In the example df, then, the new variable would have the following values:

df$newvar <- c(rep(4.5, 4), rep(8, 4))

Is there a way to do this without re-sorting the data frame and then filling down? That seems really cumbersome. Thanks!

1 Answer

After grouping by 'group', subset the 'G' values using the logical condition on 'X', take the mean of those values, and assign it to a new column with mutate:

library(dplyr)
df %>%
    group_by(group) %>% 
    mutate(newvar = mean(G[X %in% 1:2]))
# A tibble: 8 x 4
# Groups:   group [2]
#  group     X     G newvar
#  <dbl> <int> <int>  <dbl>
#1     1     1     2    4.5
#2     1     2     7    4.5
#3     1     3     7    4.5
#4     1     4     7    4.5
#5     2     1     9    8  
#6     2     2     7    8  
#7     2     3     1    8  
#8     2     4     3    8  
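
For intuition about what the grouped mutate computes, here is a rough base R sketch of the same idea: take the mean of G over the rows where X is 1 or 2, separately for each group, then look each row's group up in that result. No sorting or filling down is involved. The G values are typed in from the printed output above instead of being regenerated with sample(), and the names sub and group_means are only illustrative.

```r
# Example data, with G copied from the printed output above
df <- data.frame(group = rep(1:2, each = 4),
                 X = rep(1:4, 2),
                 G = c(2, 7, 7, 7, 9, 7, 1, 3))

# Keep only the rows that should contribute to the mean
sub <- df[df$X %in% 1:2, ]

# One mean per group, named by the group value ("1", "2")
group_means <- tapply(sub$G, sub$group, mean)

# Look up each row's group mean by name; as.vector() drops the names
df$newvar <- as.vector(group_means[as.character(df$group)])
```

This is a different technique from the mutate call, but it should yield the same newvar column for this example.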

Or using ave from base R:

df$newvar <- with(df, ave(G * NA^(!X %in% 1:2), group, 
                FUN = function(x) mean(x, na.rm = TRUE)))
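
The NA^(...) part may look cryptic, so here is a step-by-step sketch of what it appears to be doing. It relies on the fact that in R NA^0 is 1 while NA^1 is NA, and that FALSE and TRUE are coerced to 0 and 1. The intermediate names keep, mask and masked are made up for illustration, and the G values are copied from the dplyr output above.

```r
# Example data, with G copied from the printed output above
df <- data.frame(group = rep(1:2, each = 4),
                 X = rep(1:4, 2),
                 G = c(2, 7, 7, 7, 9, 7, 1, 3))

keep <- df$X %in% 1:2     # TRUE for rows that should count toward the mean

# !keep is FALSE for kept rows: NA^FALSE is NA^0 = 1
# !keep is TRUE for dropped rows: NA^TRUE is NA^1 = NA
mask <- NA^(!keep)

# Multiplying leaves kept values unchanged and turns the rest into NA
masked <- df$G * mask

# replace() builds the same vector in a more explicit way
masked2 <- replace(df$G, !keep, NA)

# ave() returns a vector as long as its input, so it can be assigned directly
df$newvar <- ave(masked, df$group, FUN = function(x) mean(x, na.rm = TRUE))
```

Because masked has the same length as group, ave can fill every row of a group with that group's mean, ignoring the NA entries.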

3 Comments

Wow! That's a really nice trick, NA^(!X %in% 1:2). Didn't know that one.
@RuiBarradas With ave, we need the arguments to have the same length, so have to resort to something unusual
Also ave(replace(df$G, !df$X %in% 1:2, NA), df$group, FUN = function(x) mean(x, na.rm = TRUE))
