
I have multiple data frames (different numbers of rows, but the same columns). I want to convert a column to a factor in each of them using a loop.

list.dfs <- list(d1, d2, d3, d4, d5, d6, d7, d8)

for (i in 1:length(list.dfs)){
  d[i]$Gender <- factor(d[i]$Gender,
  levels = c(1, 2, 3),
  labels = c("female", "male", "divers")
)
}

This is not working

  • It should be list.dfs[[i]]$Gender. Commented May 24, 2022 at 11:12
  • Yeah, that is working now, however, the variables are not saved in d1 through d8. Do you maybe have an idea? Commented May 24, 2022 at 11:30
  • Can you post sample data? Please edit the question with the output of dput(d1) or, if that is too big, with the output of dput(head(d1, 20)). Also say in the question whether d1 through d8 all have the same structure. Commented May 24, 2022 at 16:21

2 Answers


Call your factor instruction in a lapply loop and assign the result back to list.dfs.

list.dfs <- list(structure(list(Gender = c(1, 2, 3)), 
                           class = "data.frame", row.names = c(NA, -3L)), 
                 structure(list(Gender = c(1, 2, 3)), 
                           class = "data.frame", row.names = c(NA, -3L)))

list.dfs <- lapply(list.dfs, \(x) {
  x$Gender <- factor(x$Gender, levels = c(1, 2, 3), labels = c("female", "male", "divers"))
  x
})

list.dfs
#> [[1]]
#>   Gender
#> 1 female
#> 2   male
#> 3 divers
#> 
#> [[2]]
#>   Gender
#> 1 female
#> 2   male
#> 3 divers

Created on 2022-05-24 by the reprex package (v2.0.1)


3 Comments

Thanks a lot for your help. Maybe I was wrong to use a list in the first place; I read about it in another post. See my script below. Data frames 1 to 7 have the same columns but different numbers of rows. I was just looking for a more efficient way to run a function over multiple data frames: d1$Sex <- factor(d1$Sex, levels = c(1, 2, 3), labels = c("female", "male", "inter") ) d2$Sex <- factor(d2$Sex, levels = c(1, 2, 3), labels = c("female", "male", "inter") ) .... d7$Sex <- factor(d7$Sex, levels = c(1, 2, 3), labels = c("female", "male", "inter") )
@Joshua No, you were not wrong in putting the df's in a list, that's where they should be. 1) Don't clutter the globalenv; 2) lists are easy to process; 3) apply the same sequence of instructions to all list members. That's the R way.
Perfect, thanks a lot. Appreciate all your help :-)

R does not read d[i] as the object d1 when i = 1; d[i] subsets an object named d. You can access items of a list using [[i]]. Note that class(list.dfs[1]) is list while class(list.dfs[[1]]) is data.frame.

As an example

#example data
list.dfs <- list(structure(list(gender = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L)), structure(list(gender = c(1, 2, 3)), class = "data.frame", row.names = c(NA, 
-3L)))

#check first item
list.dfs[[1]]

  gender
1      1
2      2
3      3

#use for loop to access all items of the list, apply function
for(i in 1:length(list.dfs)){
  list.dfs[[i]]$gender <- factor(list.dfs[[i]]$gender, levels = c(1, 2, 3),
                             labels = c("female", "male", "diverse"))

}
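
As a side note, the loop only changes the copies stored in list.dfs; the objects the list was built from stay as they were. A minimal sketch of that behaviour, assuming a made-up data frame d1 (not the asker's data):

```r
# hypothetical stand-in for one of the original data frames
d1 <- data.frame(gender = c(1, 2, 3))

# list() stores a copy of d1, not a reference to it
list.dfs <- list(d1)

# modify the copy inside the list
list.dfs[[1]]$gender <- factor(list.dfs[[1]]$gender, levels = c(1, 2, 3),
                               labels = c("female", "male", "diverse"))

class(list.dfs[[1]]$gender)  # "factor"
class(d1$gender)             # "numeric", d1 itself is unchanged
```

So the converted data frames have to be read from the list, or written back to d1 and the others explicitly.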

You also might want to read up on lapply, which applies a function to every element of a list.

#example using lapply (run on the numeric data, instead of the for loop)
#return x so each element stays a data.frame, and assign the result back
list.dfs <- lapply(list.dfs, FUN = function(x) {
  x$gender <- factor(x$gender, levels = c(1,2,3), labels = c("female", "male", "diverse"))
  x
})

Regarding your comment (assigning the objects in the list to the global environment):

#say we used that lapply function before and end up with this list
list.dfs <- list(data.frame(gender = factor(1:3, levels = 1:3, labels = c("female", "male", "diverse"))), 
    data.frame(gender = factor(1:3, levels = 1:3, labels = c("female", "male", "diverse"))))

Then we can add names to the list according to the order the items were put in the list and assign list objects to global environment.

#add names (will be the object names later)
names(list.dfs) <- c("d1", "d2")

#assign to global environment
for(i in 1:length(list.dfs)){
  assign(names(list.dfs)[i], list.dfs[[i]])
}
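
A possible alternative to the loop is base R's list2env, which copies every element of a named list into an environment in one call. This is only a sketch with made-up data; the names d1 and d2 are the list names from above, and any existing objects with those names are overwritten.

```r
# hypothetical named list of already converted data frames
list.dfs <- list(
  d1 = data.frame(gender = factor(c("female", "male"))),
  d2 = data.frame(gender = factor(c("diverse", "male")))
)

# copy each list element into the global environment under its list name
# invisible() only suppresses printing of the returned environment
invisible(list2env(list.dfs, envir = globalenv()))

# d1 and d2 now exist as separate objects
class(d1$gender)  # "factor"
```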

Note, I am not a fan of assigning objects like that. Personally I would probably write a function and use that function 8 times. E.g.

#create function
gender_fun <- function(x){
  #read data.frame
  dt1 <- x
  
  #set gender
  dt1$gender <- factor(dt1$gender, levels = c(1,2,3), labels = c("female", "male", "diverse"))

  #return
  return(dt1)
}

#apply function on the data.frames
dt1 <- gender_fun(dt1)
dt2 <- gender_fun(dt2)
#etc...
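
If the data frames already exist as separate objects, the repeated calls can probably be avoided by collecting them into a named list with mget and applying the function with lapply. A minimal sketch, assuming two made-up data frames d1 and d2 and the same gender_fun logic as above:

```r
# same idea as gender_fun above: convert the gender column and return the data frame
gender_fun <- function(x) {
  x$gender <- factor(x$gender, levels = c(1, 2, 3),
                     labels = c("female", "male", "diverse"))
  x
}

# hypothetical data frames with different numbers of rows
d1 <- data.frame(gender = c(1, 2, 3))
d2 <- data.frame(gender = c(3, 1))

# mget looks the objects up by name and returns them as a named list
list.dfs <- mget(c("d1", "d2"))

# apply the function to every data frame; lapply keeps the list names
list.dfs <- lapply(list.dfs, gender_fun)

names(list.dfs)  # "d1" "d2"
```

With eight data frames, paste0("d", 1:8) would generate the vector of names.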

4 Comments

Perfect, thanks for the advice with lapply. This is working. How can I save the new factor in all of the dataframes? So I want to overwrite the numeric gender in all data frames with the factor gender.
You can use dt1 <- list.dfs[[1]], dt2 <- list.dfs[[2]], etc. But wouldn't going back to the individual data.frame objects defeat the purpose of making the list?
Oh, so I actually do need the list? What would the command look like for the lapply function? I just wanted to change Gender in all 8 data frames to a factor without having to write the same command 8 times. Sorry for the confusion. Do you have an idea?
Adjusted the answer accordingly. You can use assign, but consider writing a function and using it for each data.frame individually. Or keep working with the list so you can use lapply-like functions. I personally avoid using assign.
