I have a large dataset whose first two columns serve as identifiers: one is an ID and the other is a year variable. I would like to compute a count by group, looping over every variable that is not an identifier. The code below shows what I want to achieve for one variable:

library(tidyverse)

df <- tibble(
  ID1 = c(rep("a", 10), rep("b", 10)),
  year = c(2001:2020),
  var1 = rnorm(20),
  var2 = rnorm(20))

df %>%
  select(ID1, year, var1) %>%
  filter(if_any(starts_with("var"), ~!is.na(.))) %>%
  group_by(year) %>%
  count() %>%
  print(n = Inf)

I cannot use a loop that starts with for(i in names(df)), since it would also iterate over "ID1" and "year", which I want to keep as identifiers. How can I run this piece of code for every column that starts with "var"? I tried using quosures, but that failed with the error select() doesn't handle lists. I also tried select(starts_with("var")), with no success. Many thanks!

2 Answers

Another possible solution:

library(tidyverse)

df %>% 
  group_by(ID1) %>% 
  summarise(across(starts_with("var"), ~ length(na.omit(.x))))

#> # A tibble: 2 × 3
#>   ID1    var1  var2
#>   <chr> <int> <int>
#> 1 a        10    10
#> 2 b        10    10

3 Comments

This is exactly what I need! Do you know how to exclude the NA observations from the count? I tried to add na.rm = TRUE but I get an error
You could remove them before, e.g. filter(if_any(everything(), ~ !is.na(.)))
You can use na.omit, @Djoustaine; it drops the NA values before they are counted. Please see my updated solution.
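
As an aside on the NA question in these comments, a non-NA count per group can also be written with sum(!is.na(.x)), which should give the same numbers as length(na.omit(.x)). The sketch below is only an illustration: the NA placed in var1 and the name `res` are assumptions added here, not part of the original data.

```r
library(dplyr)

# Toy data shaped like the question's, with one NA added to var1
# (the NA is an assumption made for illustration)
df <- tibble(
  ID1 = c(rep("a", 10), rep("b", 10)),
  year = 2001:2020,
  var1 = c(NA, rnorm(19)),
  var2 = rnorm(20))

# Count the non-missing values of every "var" column within each ID1 group
res <- df %>%
  group_by(ID1) %>%
  summarise(across(starts_with("var"), ~ sum(!is.na(.x))))

print(res)
# Group "a" has 9 non-missing var1 values; every other count is 10
```

Replacing group_by(ID1) with group_by(year) would give the per-year counts asked for in the question.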

Restrict the loop to the columns whose names contain "var", so that "ID1" and "year" are skipped:

for (i in names(df)[grepl("var", names(df))])
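
The loop header above has no body, so here is one way it could be completed to reproduce the question's pipeline for each column. This is a minimal sketch rather than the answerer's own code: the `counts` list, the anchored pattern "^var", and the NA placed in var1 are assumptions introduced for illustration.

```r
library(dplyr)

# Toy data shaped like the question's, with one NA added to var1
# (the NA is an assumption made for illustration)
df <- tibble(
  ID1 = c(rep("a", 10), rep("b", 10)),
  year = 2001:2020,
  var1 = c(NA, rnorm(19)),
  var2 = rnorm(20))

# Names of the non-identifier columns
var_cols <- names(df)[grepl("^var", names(df))]

# One table of per-year counts for each "var" column
counts <- list()
for (i in var_cols) {
  counts[[i]] <- df %>%
    select(ID1, year, all_of(i)) %>%   # keep the identifiers plus the current column
    filter(!is.na(.data[[i]])) %>%     # drop rows where the current column is NA
    count(year)                        # number of remaining rows per year
}

print(counts$var1, n = Inf)
```

Storing each result in a named list keeps the tables available after the loop; calling print() inside the loop instead would match the question's original output style.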
