Looping dplyr and creating multiple dataframes

Question

I have a large dataset that I would like to filter and select with dplyr to create 12 separate dataframes.

Essentially, I only use a few columns of a larger dataset: I filter by plot number in the "plot" column plus a second condition on a third column ("pos_ID"), and keep the "germ_bin" column. I want to create a loop that filters by plot number (I tried plot==[i]) and the pos_ID condition, and then creates a new dataframe. The loop would repeat 12 times (because plot spans from 1 to 12).

Here is the code I used without a loop (based on the sample data):

library(dplyr)

p1_Germ <- dataset %>%  # p1 stands for plot 1
  filter(plot == 1, pos_ID < 21) %>%
  select(germ_bin)

Here is the code I tried with a loop (based on the sample data):

for (i in seq_along(plot)) {
  dataset %>%
    group_by(plot[[i]], pos_ID < 21) %>%
    select(germ_bin)
}

Here is some sample data:

plot <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
germ_bin <- c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0)
pos_ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
dataset <- data.frame(plot, germ_bin, pos_ID)
dataset

My guess is to use a list, but I'm not familiar with loops and lists and could not find a solution online. I need to create 12 dataframes because I then want to convert each of them into a matrix for another function. Any help would be much appreciated!

Ronak Shah · Accepted Answer · 2019-05-29 03:05:38Z


We can use group_split to split the data by plot and map to filter each piece on the criteria, which gives a list of dataframes.

library(dplyr)
library(purrr)

dataset %>%
 group_split(plot) %>%
 map(. %>% filter(pos_ID < 21) %>% select(germ_bin))

#[[1]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        0
#2        0

#[[2]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        1
#2        0

#[[3]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        1
#2        0
#....
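group_split() returns an unnamed list, so element [[1]] is only implicitly plot 1. The dplyr documentation for group_split() points to group_keys() for recovering which group each element came from. A minimal sketch, assuming the sample data from the question and using the hypothetical names grouped and plot_list, that labels each element with its plot number:

```r
library(dplyr)
library(purrr)

# Sample data from the question (plot 1..12, two rows each)
dataset <- data.frame(
  plot     = rep(1:12, each = 2),
  germ_bin = c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0),
  pos_ID   = 1:24
)

# Group once so group_split() and group_keys() use the same group order
grouped <- dataset %>% group_by(plot)

plot_list <- grouped %>%
  group_split() %>%
  map(. %>% filter(pos_ID < 21) %>% select(germ_bin))

# Label each element with the plot it came from: "plot_1", ..., "plot_12"
names(plot_list) <- paste0("plot_", group_keys(grouped)$plot)

plot_list[["plot_2"]]
```

Naming the list this way means a result can be looked up by plot rather than by position, which stays correct even if some plots are later dropped.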

In the sample data, plots 11 and 12 only have pos_ID values of 21 and above, so they come out as empty tibbles. If you want to drop such empty groups, you can filter first:

dataset %>%
  filter(pos_ID < 21) %>%
  group_split(plot) %>%
  map(. %>% select(germ_bin))

As far as your attempt with a for loop is concerned, you can correct it like this:

unique_plot <- unique(dataset$plot)
plot_list <- vector("list", length(unique_plot))  # one empty slot per plot

for (i in seq_along(unique_plot)) {
  plot_list[[i]] <- dataset %>%
    filter(plot == unique_plot[i], pos_ID < 21) %>%
    select(germ_bin)
}
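When pre-allocating the result list for a loop like this, note that list(length = n) does not create n slots: it creates a one-element list whose single element is named "length" and holds the number n. vector("list", n) is the usual base R way to get n empty (NULL) slots. A small check of the difference, with hypothetical variable names wrong and right:

```r
n <- 12

# A list with ONE element, named "length", whose value is 12
wrong <- list(length = n)
length(wrong)   # 1
names(wrong)    # "length"

# A list with 12 NULL elements, ready to be filled inside a loop
right <- vector("list", n)
length(right)   # 12
all(vapply(right, is.null, logical(1)))  # TRUE
```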

Or, keeping it entirely in base R:

lapply(split(dataset, dataset$plot), function(x)
  subset(x, pos_ID < 21, select = germ_bin, drop = FALSE))
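Unlike group_split(), base R's split() names the list elements after the grouping values, so a result can be pulled out by plot number given as a string. A minimal sketch, assuming the sample data from the question and a hypothetical variable name base_list, that also drops the plots left empty by pos_ID < 21 using Filter():

```r
# Sample data from the question (plot 1..12, two rows each)
dataset <- data.frame(
  plot     = rep(1:12, each = 2),
  germ_bin = c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0),
  pos_ID   = 1:24
)

base_list <- lapply(split(dataset, dataset$plot), function(x)
  subset(x, pos_ID < 21, select = germ_bin, drop = FALSE))

names(base_list)   # "1" "2" ... "12"
base_list[["3"]]   # plot 3 as a one-column data frame

# Keep only the plots that still have rows after the filter
non_empty <- Filter(function(x) nrow(x) > 0, base_list)
names(non_empty)   # "1" ... "10"
```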

edited May 29, 2019 at 3:05

answered May 29, 2019 at 2:14


4 Comments

Cameron So Over a year ago

Hi! Thank you for your response. I tried each method, and the first two work. I'm curious why the for loop only keeps two observations in each element of the list... A second follow-up question: how would I access each of the dataframes in the list? (In other words, I want to apply a matrix function to each of the 12 new dataframes.)

Ronak Shah Over a year ago

@Cam.S I think that's because after the filter there are only 2 rows remaining in each subset. Also, to access the individual dataframes, you can assign the output of the first two options to a variable, say plot_list, and then use plot_list[[1]], plot_list[[2]] to access them.
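Since the stated goal is to turn each dataframe into a matrix for another function, the list does not have to be unpacked first: purrr's map() (or base lapply()) can apply as.matrix() to every element. A minimal sketch, assuming the first option from the answer and the hypothetical names plot_list and matrix_list:

```r
library(dplyr)
library(purrr)

# Sample data from the question (plot 1..12, two rows each)
dataset <- data.frame(
  plot     = rep(1:12, each = 2),
  germ_bin = c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0),
  pos_ID   = 1:24
)

plot_list <- dataset %>%
  group_split(plot) %>%
  map(. %>% filter(pos_ID < 21) %>% select(germ_bin))

# Access a single tibble by position
plot_list[[1]]

# Or convert every element to a matrix in one step
matrix_list <- map(plot_list, as.matrix)
dim(matrix_list[[1]])  # 2 1
```

The same map() call can wrap whatever downstream function needs the matrices, instead of as.matrix alone.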

Luis Over a year ago

Amazing. How can I turn all the results into separate dataframes, such as ds_1, ds_2, ds_3?

Ronak Shah Over a year ago

@Luis Name the list output. If you save the output in a variable called plot_list, you can do names(plot_list) <- paste0('ds_', seq_along(plot_list)) and then use list2env(plot_list, .GlobalEnv).
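That suggestion spelled out as a small sketch, assuming the plot_list produced by the first option in the answer. Note that the ds_* names simply follow list order, so ds_1 corresponds to plot 1 only because group_split() sorts the groups:

```r
library(dplyr)
library(purrr)

# Sample data from the question (plot 1..12, two rows each)
dataset <- data.frame(
  plot     = rep(1:12, each = 2),
  germ_bin = c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0),
  pos_ID   = 1:24
)

plot_list <- dataset %>%
  group_split(plot) %>%
  map(. %>% filter(pos_ID < 21) %>% select(germ_bin))

# Name the elements ds_1, ds_2, ..., ds_12
names(plot_list) <- paste0("ds_", seq_along(plot_list))

# Copy every named element into the global environment as its own object
invisible(list2env(plot_list, envir = .GlobalEnv))

ds_2
```

Keeping the results in the list is usually easier to work with afterwards; list2env() is mainly useful when separate objects are really required.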
