Using a loop to repeat the same function for different datasets

Question

I created 4 datasets and stored them in a list. Now I want to list all potential ID variables in each dataset. My criteria are: 1) the variable has over 80% unique observations; 2) the variable has no more than 30% missing values.

To get those statistics, I first use skim() from the skimr package in R to get a tibble containing all the information, then I use filter() to keep the variables that meet the two criteria above. Here is my code:

dfa <- dflist[[1]] %>%
      mutate_if(is.numeric, as.character) %>%
      skim() %>%
      as_tibble() %>%
      filter(character.n_unique >= nrow(dflist[[1]]) * 0.01) %>%
      filter(n_missing <= nrow(dflist[[1]]) * 0.30)

This code works fine and returns the expected variables for dataset 1. However, I have 4 datasets of different sizes, so I would like to put this into a loop. Here is my attempt: first, I create a list dfid to hold the new results, since I do not want dflist to be modified. Then I changed the 1 in dflist[[1]] from the previous code to i. But this code does not work; R reports: "Error in filter(., dflist[[i]][, character.n_unique] >= nrow(dflist[[1]]) * : Caused by error in [.data.frame: ! undefined columns selected".

Here is my code:

dfid<-list()
for (i in 1:4){
    dfid[[i]]<-dflist[[i]]%>%
            mutate_if(is.numeric,as.character)%>%
            skim()%>%
            as_tibble()%>%
            filter(dflist[[i]][,character.n_unique] >=nrow(dflist[[i]])*0.01)%>%
            filter(dflist[[i]][,n_missing]<=nrow(dflist[[i]])*0.30)
}

So my questions are:

1) How can I fix this error so the loop works?
2) Once dfid[[i]] holds the desired variables from the 4 datasets, what code should I add to combine the 4 results, de-duplicate the variable names, and get a single vector of variable names?

Thanks a lot for your help in advance~~!

akrun · Accepted Answer · 2022-11-29 17:25:39Z

Column names should be quoted when indexing with [, unless the index is an object that holds the name. It may be easier to loop with map/lapply:

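library(purrr)
library(dplyr)
library(skimr)  # provides skim()
dfid <- map(dflist, ~ .x %>%
      mutate(across(where(is.numeric), as.character)) %>%
      skim() %>%
      as_tibble() %>%
      # nrow(.x) is the number of observations; after skim(), n() would count variables
      filter(character.n_unique >= nrow(.x) * 0.01) %>%
      filter(n_missing <= nrow(.x) * 0.30))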
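
To illustrate the quoting rule for [ mentioned above, here is a minimal base-R sketch on a toy data frame (the names df, id, score and col are made up for illustration). The last call indexes with numbers that are not valid column positions, which is likely what happened in the question: inside filter(), the unquoted character.n_unique resolved to the values of that skim column, and those values were then used as column indices.

```r
# Toy data frame; the column names are illustrative only.
df <- data.frame(id = 1:3, score = c(10, 20, 30))

# A quoted name selects the column.
df[, "id"]

# An object holding the name also works, because `[` evaluates the object first.
col <- "score"
df[, col]

# An unquoted name that is not an object in scope raises an error.
res_unquoted <- tryCatch(df[, score], error = function(e) conditionMessage(e))

# Numeric indices beyond the number of columns raise the
# "undefined columns selected" error seen in the question.
res_bad_index <- tryCatch(df[, c(5, 6)], error = function(e) conditionMessage(e))
```
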

The [ is not needed inside the pipe chain. The same with a for loop:

dfid <- vector('list', length(dflist))
for (i in seq_along(dflist)) {
    tmp <- dflist[[i]]
    dfid[[i]] <- tmp %>%
        mutate_if(is.numeric, as.character) %>%
        skim() %>%
        as_tibble() %>%
        # nrow(tmp) is the number of observations; n() here would count skim rows
        filter(character.n_unique >= nrow(tmp) * 0.01) %>%
        filter(n_missing <= nrow(tmp) * 0.30)
}
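
For comparison, the two criteria can also be checked without skimr. The following is only a base-R sketch under some assumptions: find_id_candidates and dflist_demo are hypothetical names, "unique" is taken to mean distinct non-missing values, and the default thresholds follow the criteria stated in the question (80% and 30%) rather than the 0.01 used in the posted code.

```r
# Hypothetical helper: names of columns that look like ID candidates.
# min_unique and max_missing are fractions of the number of rows.
find_id_candidates <- function(df, min_unique = 0.80, max_missing = 0.30) {
  n_rows <- nrow(df)
  # Count distinct non-missing values per column (assumed definition of "unique").
  n_unique <- vapply(df, function(x) length(unique(x[!is.na(x)])), integer(1))
  # Count missing values per column.
  n_missing <- vapply(df, function(x) sum(is.na(x)), integer(1))
  names(df)[n_unique >= n_rows * min_unique & n_missing <= n_rows * max_missing]
}

# Toy stand-in for dflist; the real datasets are not shown in the question.
dflist_demo <- list(
  data.frame(id = 1:10, group = rep(c("a", "b"), 5), stringsAsFactors = FALSE),
  data.frame(key = letters[1:5], flag = c(NA, NA, NA, "y", "n"),
             stringsAsFactors = FALSE)
)

# One character vector of candidate names per dataset.
candidates <- lapply(dflist_demo, find_id_candidates)

# All candidate names across datasets, de-duplicated.
all_ids <- unique(unlist(candidates))
```

Because the helper receives the original data frame, the row count it compares against is always the number of observations in that dataset.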

edited Nov 29, 2022 at 17:25

answered Nov 29, 2022 at 17:19

akrun

5 Comments

Rstudyer Over a year ago

Thanks a lot for your answer~~~ Now that I have the dfid list, I use namelist<-unique(c(dfid[[1]]$skim_variable,dfid[[2]]$skim_variable,dfid[[3]]$skim_variable,dfid[[4]]$skim_variable)) to create the unique ID name list. Could you show me how to simplify this code? Right now I have to type out dfid[[1]], dfid[[2]], and so on. Thanks so much~~!

akrun Over a year ago

@Rstudyer if you want to extract the column, unique(sapply(dfid, "[[", "skim_variable")) should do it

Rstudyer Over a year ago

Thanks so much~~! Just one more question about that sapply code: when I use it, it returns a list with the unique variable names of each dataset instead of one vector containing all unique variable names. Is there a way to combine the results from all 4 dfid elements and then apply unique to the whole vector? Thanks a lot~~

akrun Over a year ago

@Rstudyer you may do unique(unlist(sapply(...))). Probably because of the length difference, it still returned a list

Rstudyer Over a year ago

it works~~~!!! Thanks so much~~!
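
The flattening step from the comments can be sketched on toy data as follows. dfid_demo is a hypothetical stand-in for dfid, where each element has a skim_variable column and the elements have different numbers of rows; lapply is used instead of sapply here only so the result is always a list before unlist.

```r
# Toy stand-in for dfid: each element has a skim_variable column.
dfid_demo <- list(
  data.frame(skim_variable = c("id", "key"), stringsAsFactors = FALSE),
  data.frame(skim_variable = c("id", "code", "serial"), stringsAsFactors = FALSE),
  data.frame(skim_variable = "key", stringsAsFactors = FALSE)
)

# The extracted vectors differ in length, so sapply cannot simplify
# them to a vector or matrix and returns a list instead.
per_dataset <- sapply(dfid_demo, "[[", "skim_variable")
is.list(per_dataset)

# Flatten first, then de-duplicate across all datasets.
namelist <- unique(unlist(lapply(dfid_demo, "[[", "skim_variable")))
namelist
```
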
