I have some variables that take values between 1 and 5. I would like to recode them as 0 if the value is between 1 and 3 (inclusive) and as 1 if the value is 4 or 5.

My dataset looks like this:

var1    var2        var3
1       1            NA
4       3            4
3       4            5
2       5            3

So I would like it to be like this:

var1    var2        var3
0       0            NA
1       0            1
0       1            1
0       1            0

I tried to write a function and call it:

making_binary <- function (var){
  var <- factor(var >= 4, labels = c(0, 1))
  return(var)
}


df <- lapply(df, making_binary)

But I got an error: incorrect labels: length 2 must be 1 or 1

Where did I go wrong? Thank you very much for your answers!

3 Answers

You can use:

df[] <- +(df == 4 | df == 5)
df
#  var1 var2 var3
#1    0    0   NA
#2    1    0    1
#3    0    1    1
#4    0    1    0

The comparison df == 4 | df == 5 returns logical values (TRUE/FALSE). The leading + then converts those logical values to integers (1 for TRUE, 0 for FALSE), while NA stays NA.
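To make the role of the leading + concrete, here is a minimal sketch in base R. The names flags, ints and m are purely illustrative and not part of the answer above. It also shows one reason + is convenient here: unlike as.integer(), it keeps the dimensions of a matrix, which is what the comparison on a whole data frame returns.

```r
# Unary plus coerces logical values to integers; NA stays NA
flags <- c(TRUE, FALSE, NA)
ints <- +flags
ints
class(ints)

# For a plain vector, as.integer() gives the same result
identical(ints, as.integer(flags))

# On a logical matrix, + keeps the dimensions, as.integer() drops them
m <- matrix(c(TRUE, FALSE, TRUE, FALSE), nrow = 2)
dim(+m)
dim(as.integer(m))
```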

To apply this to selected columns only, subset them by position or by name.

cols <- 1:3 #Position
#cols <- grep('var', names(df)) #Name
df[cols] <- +(df[cols] == 4 | df[cols] == 5)

As for your function, you can write:

making_binary <- function (var){
  var <- as.integer(var >= 4)
  #which is a faster version of
  #var <- ifelse(var >= 4, 1, 0)
  return(var)
}

df[] <- lapply(df, making_binary)
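As for where the original function went wrong: a likely cause, assuming some column in the real data never reaches 4 (or is always 4 or 5), is that var >= 4 then has only one distinct value, so factor() finds one level but receives two labels. The sketch below reproduces this with a hypothetical column and shows that declaring both levels explicitly avoids the mismatch.

```r
# Hypothetical column where no value reaches 4, so var >= 4 is all FALSE
var <- c(1L, 2L, 3L)

# Only one level (FALSE) is present but two labels are supplied, so factor() fails;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(factor(var >= 4, labels = c(0, 1)),
                error = function(e) conditionMessage(e))
res

# Declaring both levels up front keeps levels and labels aligned
fixed <- factor(var >= 4, levels = c(FALSE, TRUE), labels = c(0, 1))
fixed
levels(fixed)
```

Note that this still returns a factor with levels "0" and "1", not numbers; as.integer(var >= 4) avoids the detour entirely.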

Data:

df <- structure(list(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 
5L), var3 = c(NA, 4L, 5L, 3L)), class = "data.frame", row.names = c(NA, -4L))

5 Comments

I cannot really do that because I have lots of other variables which I do not want to change
Interesting. Could you please explain what this leading + means?
@Emeline If you only want to change the first and second columns, change df[] to df[, c(1:2)].
Thank you for answering many of my questions and always making it simple for a beginner to understand! I am really improving thanks to you (and others from Stack Overflow!)
@Emeline There are ways to apply the function to selected columns only. See the edit to the answer, which shows a couple of them.

I think ifelse would fit the problem well:

df[] <- lapply(df, function(x) ifelse(x >=1 & x <=3, 0, x))
df
  var1 var2 var3
1    0    0   NA
2    4    0    4
3    0    4    5
4    0    5    0
df[] <- lapply(df, function(x) ifelse(x >=4 & x <=5, 1, x))

df
  var1 var2 var3
1    0    0   NA
2    1    0    1
3    0    1    1
4    0    1    0

If you need to do the two steps at once, you can look at dplyr::case_when() or data.table::fcase().
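As a rough illustration of doing both steps in one pass, here is a sketch that sticks to base R with a nested ifelse() rather than case_when() or fcase(). The helper name recode_binary is hypothetical, and the choice to map anything outside 1 to 5 to NA is an assumption, not something stated in the question.

```r
df <- data.frame(var1 = c(1L, 4L, 3L, 2L),
                 var2 = c(1L, 3L, 4L, 5L),
                 var3 = c(NA, 4L, 5L, 3L))

# Hypothetical helper: 1-3 become 0, 4-5 become 1,
# anything else (including NA) becomes NA
recode_binary <- function(x) {
  ifelse(x >= 1 & x <= 3, 0,
         ifelse(x >= 4 & x <= 5, 1, NA))
}

# df[] keeps the data frame structure while replacing every column
df[] <- lapply(df, recode_binary)
df
```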

1 Comment

Thank you! This is a nice easy way to do it!
You can simply test whether the value is larger than 3, which returns TRUE or FALSE, and cast the result to a number:

+(x>3)
#     var1 var2 var3
#[1,]    0    0   NA
#[2,]    1    0    1
#[3,]    0    1    1
#[4,]    0    1    0

If you want this only for some columns, subset them, e.g. for columns 1 and 2:

+(x[1:2]>3)
#+(x[c("var1","var2")]>3)  #Alternative
#     var1 var2
#[1,]    0    0
#[2,]    1    0
#[3,]    0    1
#[4,]    0    1
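The expressions above return a matrix and leave x itself unchanged. If the goal is to update only those columns in place, one possible sketch is to assign the result back into the same subset; the intermediate name res is illustrative only.

```r
x <- data.frame(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 5L),
                var3 = c(NA, 4L, 5L, 3L))

# The comparison on a data frame yields a matrix, and + keeps it a matrix
res <- +(x[1:2] > 3)
is.matrix(res)

# Assigning back into the same columns keeps x a data frame
# and leaves var3 untouched
x[1:2] <- res
x
```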

Data:

x <- data.frame(var1 = c(1L, 4L, 3L, 2L), var2 = c(1L, 3L, 4L, 5L)
              , var3 = c(NA, 4L, 5L, 3L))
