I have a dataframe in which I would like to suppress certain values when they are based on a limited number of observations.

My dataset looks something like this:

> GROUP <- c("A", "B", "C", "D", "E", "F")
> AVERAGE <- c(100, 5, 10, 10, 5, 5)
> N_AVERAGE <- c(53, 5, 12, 20, 50, 2)
> df_average <- data.frame(GROUP , AVERAGE, N_AVERAGE)
> df_average
  GROUP AVERAGE N_AVERAGE
1     A     100        53
2     B       5         5
3     C      10        12
4     D      10        20
5     E       5        50
6     F       5         2

I would like to create a new variable, AVERAGE_new, which takes the value of "AVERAGE" when "N_AVERAGE" is >= 10. When "N_AVERAGE" is < 10, I would like the new variable to be NA.

This was my first attempt:

funct_suppress <- function(dataset #input dataset
                           , var_goal # variable to suppress based on other variable
                           , var_N # variable used to determine whether to suppress
                           , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
{
  dataset <- dataset %>% 
    mutate(paste0(var_goal,"_new") = ifelse((var_N < lower_bound),NA, var_goal))
}
df_average <- funct_suppress(df_average, AVERAGE, AVERAGE_nw,N_AVERAGE,10) # suppress all AVERAGE when N_AVERAGE  < 10

Obviously, this does not work. I understand that R cannot tell that var_goal and var_N refer to variables in the dataset. So I tried the following:

> funct_suppress <- function(dataset #input dataset
+                            , var_goal # variable to suppress based on other variable
+                            , var_goal_nw # suppresses value of var_goal
+                            , var_N # variable used to determine whether to suppress
+                            , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
+ {
+   
+   var_goal= enquo(var_goal) 
+   var_goal_nw= enquo(var_goal_nw) 
+   var_N = enquo(var_N)
+   
+   dataset <- dataset %>% 
+     mutate(var_goal = !!var_goal,
+            var_goal_nw = var_goal,
+            var_N = !!var_N,) %>% 
+     mutate(var_goal_nw = ifelse((var_N < lower_bound),NA, var_goal)) %>% 
+     select(-var_goal, -var_N)
+ }
> df_average <- funct_suppress(df_average, AVERAGE, AVERAGE_nw, N_AVERAGE,10) # suppress all AVERAGE when N_AVERAGE  < 10
> df_average
  GROUP AVERAGE N_AVERAGE var_goal_nw
1     A     100        53         100
2     B       5         5          NA
3     C      10        12          10
4     D      10        20          10
5     E       5        50           5
6     F       5         2          NA

This does work, but my new variable does not have the name that I want it to have.

How would I do this? If a function is not the most efficient way to go about this I'm open to other suggestions. However, the input variables do need to be able to change, since I need to perform this task on a number of dataframes with differing variable names.

Thank you!

2 Answers

You can copy all the values into the new column and then set the ones where N_AVERAGE is < 10 to NA:

df_average$AVERAGE_new <- df_average$AVERAGE
df_average$AVERAGE_new[df_average$N_AVERAGE < 10] <- NA


 df_average
  GROUP AVERAGE N_AVERAGE AVERAGE_new
1     A     100        53         100
2     B       5         5          NA
3     C      10        12          10
4     D      10        20          10
5     E       5        50           5
6     F       5         2          NA

3 Comments

Thank you, but I would like to have a function or something similar to make this change. The example I have included here is very simple, but in reality I need to make a number of adjustments to my data, and I need to do this for a large number of datasets and variables. So I do not want to make the changes by hand each time.
If the only problem with your code is that your new column does not have the correct name, you can add something like this at the end of your function: colnames(dataset)[ colnames(dataset) == "var_goal_nw"] <- paste0(var_goal_nw, "_new")
That would be the easiest way, but it does not work. I get the error that var_goal_nw does not exist, and that the "number of items to replace is not a multiple of replacement length". I've also tried using rename_ (from stackoverflow.com/questions/35023375/…), but that does not work either.

If your dplyr version is at least 0.7, you could modify your function like this:

library(dplyr)

funct_suppress <- function(dataset # input dataset
                           , var_goal # variable to suppress based on other variable
                           , var_goal_nw # suppresses value of var_goal
                           , var_N # variable used to determine whether to suppress
                           , lower_bound) # lower_bound for var_N, when value is below lower_bound, suppress var_goal
{
  var_goal <- enquo(var_goal)
  var_goal_nw <- enquo(var_goal_nw)
  var_N <- enquo(var_N)
  varname <- quo_name(var_goal_nw) # name of the new column as a string

  # parentheses around !!var_N keep the unquoting separate from the comparison
  dataset %>%
    mutate(!!varname := ifelse((!!var_N) < lower_bound, NA, !!var_goal))
}

The important parts are varname <- quo_name(var_goal_nw) and !!varname :=. The other differences compared to your original function are just some minor changes to be more concise.
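
As a possible alternative, more recent versions of dplyr and rlang (assumed here: dplyr 1.0 or later) support the embrace operator `{{ }}`, which can also be used inside a quoted name on the left-hand side of `:=`. The sketch below is not the function from the question: the helper name `suppress_small_n` is hypothetical, and the new column name is derived by appending `_new` to the name of `var_goal` rather than being passed as a separate argument.

```r
library(dplyr)

# Hypothetical helper; var_goal and var_N are passed as bare column names.
suppress_small_n <- function(dataset, var_goal, var_N, lower_bound) {
  dataset %>%
    mutate(
      # "{{ var_goal }}_new" builds the new column name from the argument,
      # e.g. AVERAGE becomes AVERAGE_new
      "{{ var_goal }}_new" := ifelse({{ var_N }} < lower_bound, NA, {{ var_goal }})
    )
}

# Example data from the question
df_average <- data.frame(
  GROUP = c("A", "B", "C", "D", "E", "F"),
  AVERAGE = c(100, 5, 10, 10, 5, 5),
  N_AVERAGE = c(53, 5, 12, 20, 50, 2)
)

# Suppress AVERAGE wherever N_AVERAGE is below 10
result <- suppress_small_n(df_average, AVERAGE, N_AVERAGE, 10)
```

Because the column names are arguments, the same call pattern should carry over to other data frames with different variable names.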
