I have a huge file in which I want to create a column based on other columns. My file looks like this:

person = c(1,2,3,4,5,6,7,8)
father = c(0,0,1,1,4,5,5,7)
mother = c(0,0,2,3,2,2,6,6)
ped = data.frame(person,father,mother)

I want to create a column (a gender column) indicating whether each person appears as a father or as a mother. I got it working with a for loop on a small example, but when I apply it to the whole file it takes hours to finish. How can I use an apply-style function to solve this? Thanks.

for(i in 1:nrow(ped)){
  ped$test[i] = ifelse(ped[i,1] %in% ped[,2], "M", ifelse(ped[i,1] %in% ped[,3], "F", NA)) 
}

3 Answers

Try this:

ped <- transform(ped, gender = ifelse(person %in% father,
                                      'M',
                                      ifelse(person %in% mother, 'F', NA)
                                     ))

Instead of looping over the rows one at a time, this relies on vectorization: %in% and ifelse() each operate on a whole column in a single call.
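
To see what the vectorized version is doing, it can help to look at the intermediate values. This is only a minimal sketch on the example data from the question; the names is_father and is_mother are illustrative and not part of the answer above.

```r
# Example data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# %in% checks every element of `person` against the whole column at once
# and returns one TRUE/FALSE per row, so no explicit loop is needed.
is_father <- ped$person %in% ped$father
is_mother <- ped$person %in% ped$mother

# ifelse() picks a label element-wise; NA_character_ keeps the result a
# character vector. Fathers take priority, as in the original loop.
gender <- ifelse(is_father, "M", ifelse(is_mother, "F", NA_character_))
print(gender)
```

Each %in% call scans the lookup column once for the whole vector, whereas the loop repeats a lookup (plus a data frame assignment) for every row, which is likely where most of the hours go.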

You could try

# %in% binds tighter than * and +, so the parentheses only make the grouping explicit
ped$gender <- c(NA, 'M', 'F')[as.numeric(factor(with(ped, 
                  1 + 2*(person %in% father) + 4*(person %in% mother))))]

Or a faster option would be to assign by reference using := with data.table

library(data.table)
setDT(ped)[person %in% father, gender:='M'][person %in% mother, gender:='F']
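
For readers who do not have data.table installed, the same two-step idea can be sketched in base R with logical subsetting. This is an assumed equivalent for illustration, not part of the answer above, and it will not have the by-reference speed advantage of :=.

```r
# Example data from the question
person <- c(1, 2, 3, 4, 5, 6, 7, 8)
father <- c(0, 0, 1, 1, 4, 5, 5, 7)
mother <- c(0, 0, 2, 3, 2, 2, 6, 6)
ped <- data.frame(person, father, mother)

# Start with NA everywhere, then fill in the rows that match.
ped$gender <- NA_character_
ped$gender[ped$person %in% ped$father] <- "M"
# This second assignment runs last, so a person listed as both father and
# mother would end up "F" here (the nested ifelse() versions give "M").
ped$gender[ped$person %in% ped$mother] <- "F"
print(ped)
```

On the example data no person appears in both columns, so the ordering of the two assignments makes no difference.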

To avoid hard-coding each "father" / "mother" / etc. option, you could do:

vars <- c("father","mother")
factor(
  do.call(pmax, Map(function(x,y) (ped$person %in% x) * y, ped[vars], seq_along(vars) )),
  labels=c(NA,"M","F")
)
#[1] M    F    F    M    M    F    M    <NA>
#Levels: <NA> M F
