
I have the following data frame.

Data_Frame <- data.frame(Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
                         Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
                         Factor_3 = rep(1:2, each = 2, length.out = 48),
                         Response = rnorm(48, 25, 1))

I want to create a nested list by splitting the data frame by each of the study's factors in succession. I'll start with a vector of the names of the columns that hold the factors to split by; the order of this vector is the order in which the resulting list should be nested.

Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

The resulting list should look like the following Output object.

Output <- lapply(lapply(split(Data_Frame, Data_Frame[, which(colnames(Data_Frame) == Factors_to_Split_by[1])]), function (x) {
  split(x, x[, which(colnames(x) == Factors_to_Split_by[2])])
}), function (x) {
  lapply(x, function (y) {
    split(y, y[, which(colnames(y) == Factors_to_Split_by[3])])
  })
})
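To make the target structure concrete, here is a small sketch that rebuilds the question's data and Output object and inspects a few levels. The set.seed() call is an addition for reproducibility only; the list names come from the factor levels, not from the random Response values.

```r
# Rebuild the question's data; set.seed() is added only for reproducibility
set.seed(42)
Data_Frame <- data.frame(Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
                         Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
                         Factor_3 = rep(1:2, each = 2, length.out = 48),
                         Response = rnorm(48, 25, 1))
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

# The question's hand-written three-level split
Output <- lapply(lapply(split(Data_Frame, Data_Frame[, which(colnames(Data_Frame) == Factors_to_Split_by[1])]), function (x) {
  split(x, x[, which(colnames(x) == Factors_to_Split_by[2])])
}), function (x) {
  lapply(x, function (y) {
    split(y, y[, which(colnames(y) == Factors_to_Split_by[3])])
  })
})

# One level of nesting per factor, named by that factor's levels
names(Output)        # "A" "B" "C" "D"
names(Output$A)      # "a" "b" "c"
names(Output$A$a)    # "1" "2"

# The leaves are data frames; this one holds rows 1 and 2 of Data_Frame
Output$A$a$`1`
```

Each additional factor would add one more level of `$` indexing.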

How can I write a recursive function that takes Factors_to_Split_by as its input and returns the desired Output list? I may have more than three factors to split the data by, so I'd like a solution that is modular, efficient, and programmatic.

Thanks!

  • What is the nested list supposed to look like? Can you include the desired output in your question? Commented Aug 20, 2024 at 21:36
  • @IfeanyiIdiaye - the code which generates the desired Output object is already included by the author. Commented Aug 20, 2024 at 21:46

2 Answers


Here is one possible approach using Reduce and a custom function:

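split_df <- function(x, split) {
  if (is.data.frame(x)) {
    # A data frame: split it by the column named in `split`
    # (R still finds the split() function even though an argument shares its name)
    split(x, x[split])
  } else {
    # Already a list of (lists of) data frames: recurse into each element
    lapply(x, split_df, split = split)
  }
}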

Output2 <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)

identical(Output, Output2)
#> [1] TRUE
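Reduce() acts as a left fold here: starting from init, it passes the running result and the next factor name to split_df(), so each factor adds one level of nesting. A minimal sketch (set.seed() added only for reproducibility) showing that the Reduce() call unrolls into nested split_df() calls, one per factor:

```r
set.seed(42)
Data_Frame <- data.frame(Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
                         Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
                         Factor_3 = rep(1:2, each = 2, length.out = 48),
                         Response = rnorm(48, 25, 1))
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3")

split_df <- function(x, split) {
  if (is.data.frame(x)) {
    split(x, x[split])                   # split a data frame by one column
  } else {
    lapply(x, split_df, split = split)   # descend into an existing list
  }
}

# Fold over the factor names, starting from the whole data frame
via_reduce <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)

# The same computation written out by hand, innermost call first
by_hand <- split_df(
  split_df(
    split_df(Data_Frame, "Factor_1"),
    "Factor_2"),
  "Factor_3")

identical(via_reduce, by_hand)   # TRUE
```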

1 Comment

Beautiful code with Reduce, cheers!

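You can define a recursive function like this:

f <- function(data, fct) {
  # Base case: one factor left, so split and return the data frames
  if (length(fct) == 1) return(split(data, data[fct]))
  # Otherwise split by the first factor and recurse on each piece with the rest
  lapply(split(data, data[fct[1]]), f, fct = fct[-1])
}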

such that

> f(Data_Frame, Factors_to_Split_by)
$A
$A$a
$A$a$`1`
  Factor_1 Factor_2 Factor_3 Response
1        A        a        1 25.91996
2        A        a        1 25.12079

$A$a$`2`
  Factor_1 Factor_2 Factor_3 Response
3        A        a        2 24.88218
4        A        a        2 24.77660


$A$b
$A$b$`1`
  Factor_1 Factor_2 Factor_3 Response
5        A        b        1 25.63426
6        A        b        1 24.64074

$A$b$`2`
  Factor_1 Factor_2 Factor_3 Response
7        A        b        2 26.60224
8        A        b        2 25.17982


$A$c
$A$c$`1`
   Factor_1 Factor_2 Factor_3 Response
9         A        c        1 24.90249
10        A        c        1 26.12602

$A$c$`2`
   Factor_1 Factor_2 Factor_3 Response
11        A        c        2 25.87801
12        A        c        2 24.82886



$B
$B$a
$B$a$`1`
   Factor_1 Factor_2 Factor_3 Response
13        B        a        1 25.29955
14        B        a        1 24.74579

$B$a$`2`
   Factor_1 Factor_2 Factor_3 Response
15        B        a        2 25.06018
16        B        a        2 27.33450


$B$b
$B$b$`1`
   Factor_1 Factor_2 Factor_3 Response
17        B        b        1 25.78050
18        B        b        1 24.96464

$B$b$`2`
   Factor_1 Factor_2 Factor_3 Response
19        B        b        2 24.04945
20        B        b        2 23.52038


$B$c
$B$c$`1`
   Factor_1 Factor_2 Factor_3 Response
21        B        c        1 25.68414
22        B        c        1 25.25209

$B$c$`2`
   Factor_1 Factor_2 Factor_3 Response
23        B        c        2 24.32218
24        B        c        2 25.81953



$C
$C$a
$C$a$`1`
   Factor_1 Factor_2 Factor_3 Response
25        C        a        1 23.61297
26        C        a        1 25.52444

$C$a$`2`
   Factor_1 Factor_2 Factor_3 Response
27        C        a        2 27.80018
28        C        a        2 24.85324


$C$b
$C$b$`1`
   Factor_1 Factor_2 Factor_3 Response
29        C        b        1 24.63975
30        C        b        1 23.95888

$C$b$`2`
   Factor_1 Factor_2 Factor_3 Response
31        C        b        2 24.93261
32        C        b        2 23.85798


$C$c
$C$c$`1`
   Factor_1 Factor_2 Factor_3 Response
33        C        c        1 25.29823
34        C        c        1 25.16727

$C$c$`2`
   Factor_1 Factor_2 Factor_3 Response
35        C        c        2 25.36553
36        C        c        2 24.99169



$D
$D$a
$D$a$`1`
   Factor_1 Factor_2 Factor_3 Response
37        D        a        1 24.53971
38        D        a        1 24.72733

$D$a$`2`
   Factor_1 Factor_2 Factor_3 Response
39        D        a        2 25.74960
40        D        a        2 24.56601


$D$b
$D$b$`1`
   Factor_1 Factor_2 Factor_3 Response
41        D        b        1 22.72847
42        D        b        1 24.29836

$D$b$`2`
   Factor_1 Factor_2 Factor_3 Response
43        D        b        2 24.69552
44        D        b        2 23.77094


$D$c
$D$c$`1`
   Factor_1 Factor_2 Factor_3 Response
45        D        c        1 24.07517
46        D        c        1 26.21868

$D$c$`2`
   Factor_1 Factor_2 Factor_3 Response
47        D        c        2 26.46018
48        D        c        2 24.44250
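The question mentions that there may be more than three factors. As a sketch, the snippet below adds a hypothetical fourth column, Factor_4 (not part of the original data), and checks that the recursive f() and the Reduce()-based split_df() from the other answer still produce the same nested list. count_leaves() is a hypothetical helper written only for this check, and set.seed() is added only for reproducibility.

```r
set.seed(42)
Data_Frame <- data.frame(Factor_1 = rep(LETTERS[1:4], each = 12, length.out = 48),
                         Factor_2 = rep(letters[1:3], each = 4, length.out = 48),
                         Factor_3 = rep(1:2, each = 2, length.out = 48),
                         Response = rnorm(48, 25, 1))
# Hypothetical extra factor that alternates within every Factor_1/2/3 cell
Data_Frame$Factor_4 <- rep(c("x", "y"), length.out = 48)
Factors_to_Split_by <- c("Factor_1", "Factor_2", "Factor_3", "Factor_4")

# Recursive version, with return() in the base case
f <- function(data, fct) {
  if (length(fct) == 1) return(split(data, data[fct]))
  lapply(split(data, data[fct[1]]), f, fct = fct[-1])
}

# Reduce() version from the other answer
split_df <- function(x, split) {
  if (is.data.frame(x)) split(x, x[split]) else lapply(x, split_df, split = split)
}

# Hypothetical helper: count the data frames at the bottom of the nesting
count_leaves <- function(x) {
  if (is.data.frame(x)) 1L else sum(vapply(x, count_leaves, integer(1)))
}

out_recursive <- f(Data_Frame, Factors_to_Split_by)
out_reduce    <- Reduce(split_df, Factors_to_Split_by, init = Data_Frame)

identical(out_recursive, out_reduce)   # TRUE
count_leaves(out_recursive)            # 48 (4 x 3 x 2 x 2 cells, one row each)
```

Neither function hard-codes the number of factors, so adding a column name to Factors_to_Split_by is enough to add a level of nesting.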
