I have a large dataset that I would like to filter and subset with dplyr to create 12 separate dataframes.

Essentially, I am using only two columns of data from a larger dataset. I filter by plot number (the "plot" column) and by a second condition on a third column ("pos_ID"). I want to create a loop that filters by plot number (I tried plot==[i]) and by the pos_ID condition, and then creates a new dataframe. The loop would repeat 12 times (because plot ranges from 1 to 12).

Here is the code that I used without a loop (based on sample data)

 p1_Germ <- data %>% #p1 stands for plot 1
   filter(plot==1, pos_ID<21) %>% 
   select(germ_bin)

Here is the code that I tried to incorporate a loop (based on sample data)

for(i in seq_along(plot)) {
   data %>%
     group_by(plot[[i]], pos_ID<21) %>%
     select(germ_bin)
 }

Here is some sample data

plot <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
germ_bin <- c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0)
pos_ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
dataset <- data.frame(plot, germ_bin, pos_ID)
dataset

My guess is to use a list, but I'm not familiar with loops and lists and could not find a solution online. I need 12 dataframes because I then want to convert each of them into a matrix for another function. Any help would be much appreciated!

1 Answer

We can use group_split to split the data by plot and map to apply the filter to each piece, which gives a list of dataframes.

library(dplyr)
library(purrr)

dataset %>%
 group_split(plot) %>%
 map(. %>% filter(pos_ID < 21) %>% select(germ_bin))

#[[1]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        0
#2        0

#[[2]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        1
#2        0

#[[3]]
# A tibble: 2 x 1
#  germ_bin
#     <dbl>
#1        1
#2        0
#....
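The `. %>% filter(...) %>% select(...)` expression passed to map may look unusual: a pipeline that starts with a bare dot builds a function rather than running immediately. As a rough sketch (the name `keep_germ` is only illustrative and not part of the original answer), the same step can be written with a stored function or with a purrr lambda:

```r
library(dplyr)
library(purrr)

# Sample data from the question
plot <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
germ_bin <- c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0)
pos_ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
dataset <- data.frame(plot, germ_bin, pos_ID)

# A pipeline starting with `.` is a reusable function (hypothetical name)
keep_germ <- . %>% filter(pos_ID < 21) %>% select(germ_bin)

# Variant 1: pass the stored function to map()
res_fun <- dataset %>%
  group_split(plot) %>%
  map(keep_germ)

# Variant 2: the same step written as a purrr lambda
res_lambda <- dataset %>%
  group_split(plot) %>%
  map(~ .x %>% filter(pos_ID < 21) %>% select(germ_bin))

# Both variants should return one data frame per plot
length(res_fun)
length(res_lambda)
```

Which form to use is mostly a matter of taste; the lambda is probably easier to read for anyone who has not seen the dot-first pipeline before.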

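In the shared example, plots 11 and 12 have no rows with pos_ID < 21, so the code above returns empty dataframes for them. If you want to drop those empty groups, you can filter first: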

dataset %>%
  filter(pos_ID < 21) %>%
  group_split(plot) %>%
  map(. %>% select(germ_bin))
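To see how the order of filtering and splitting changes the result, here is a small check on the sample data. It is only a sketch; the names `split_then_filter` and `filter_then_split` are made up for illustration.

```r
library(dplyr)
library(purrr)

# Sample data from the question
plot <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
germ_bin <- c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0)
pos_ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
dataset <- data.frame(plot, germ_bin, pos_ID)

# Split first, then filter inside each group: all 12 groups are kept,
# including the ones that end up with zero rows
split_then_filter <- dataset %>%
  group_split(plot) %>%
  map(~ filter(.x, pos_ID < 21))

# Filter first, then split: plots with no matching rows never form a group
filter_then_split <- dataset %>%
  filter(pos_ID < 21) %>%
  group_split(plot)

# Number of rows in each element of the first list
map_int(split_then_filter, nrow)

# Number of groups in each list
length(split_then_filter)
length(filter_then_split)
```

If a later step expects exactly 12 elements (one per plot), the split-then-filter order is probably the safer choice; if empty dataframes would break that step, filter first.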

As for your attempt with the for loop, you can correct it like this:

unique_plot <- unique(dataset$plot)
plot_list <- vector("list", length(unique_plot)) # pre-allocate one empty slot per plot

for(i in seq_along(unique_plot)) {
   plot_list[[i]] <- dataset %>%
        filter(plot == unique_plot[i], pos_ID<21) %>%
        select(germ_bin)
}

Or keeping it completely in base R

lapply(split(dataset, dataset$plot), function(x) 
             subset(x, pos_ID < 21, select = germ_bin, drop = FALSE))
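Since the end goal in the question is one matrix per plot, the list can be converted in a single step. The following is a minimal base R sketch; using `as.matrix` is an assumption about what kind of matrix the downstream function needs, and `plot_list` and `matrix_list` are illustrative names.

```r
# Sample data from the question
plot <- c(1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12)
germ_bin <- c(0,0,1,0,1,0,0,1,1,0,1,1,0,1,0,1,0,1,1,0,1,0,1,0)
pos_ID <- c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24)
dataset <- data.frame(plot, germ_bin, pos_ID)

# One data frame per plot, keeping only germ_bin for rows with pos_ID < 21
plot_list <- lapply(split(dataset, dataset$plot), function(x)
  subset(x, pos_ID < 21, select = germ_bin))

# Convert every data frame in the list to a matrix
matrix_list <- lapply(plot_list, as.matrix)

# split() names the elements after the plot values, so they can be
# accessed by name ("1", "2", ...) as well as by position
matrix_list[["1"]]
dim(matrix_list[["3"]])
```

Keeping the matrices in a list means the other function can be applied to all of them with one more `lapply` call instead of 12 separate statements.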

4 Comments

Hi! Thank you for your response. I tried each method and the first two methods work. I'm curious why the for loop only keeps two observations in each of the lists... A second follow-up question: how would I access each of the dataframes in the list? (In other words, I want to apply a matrix function to each of the 12 new dataframes.)
@Cam.S I think that is because only 2 rows remain in each subset after the filter. To access the individual dataframes, assign the output of either of the first two options to a variable, say plot_list, and then use plot_list[[1]], plot_list[[2]] and so on.
Amazing. How can I turn all the results into separate dataframes, such as ds_1, ds_2, ds_3?
@Luis Name the list output. If you save the output in a variable called plot_list, you can do names(plot_list) <- paste0('ds_', seq_along(plot_list)) and then use list2env(plot_list, .GlobalEnv).
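A small sketch of that last suggestion, using a stand-in list of three dataframes (the contents of `plot_list` here are made up for illustration):

```r
# Stand-in for the list produced by any of the approaches in the answer
plot_list <- list(
  data.frame(germ_bin = c(0, 0)),
  data.frame(germ_bin = c(1, 0)),
  data.frame(germ_bin = c(1, 0))
)

# Name the elements ds_1, ds_2, ds_3
names(plot_list) <- paste0("ds_", seq_along(plot_list))

# Copy each named element into the global environment as its own object
invisible(list2env(plot_list, envir = .GlobalEnv))

# The dataframes are now available as separate variables
ds_2
```

Note that this overwrites any existing objects with the same names, so keeping the dataframes in the named list and using plot_list$ds_2 is often the tidier option.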
