5

I am trying to create a variable that is a logical value when comparing one character string to more than two other character strings in a data.table and I need to ignore NA's.

Sample data for D2:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"), .internal.selfref = <pointer: 0x0000015eb1261ef0>)

Attempted a proposed solution below:

D2[, Match := apply(sapply(.SD, `==`, D2[, "var1"]), 1, any), .SDcols = 
c("var2", "var3")]

Result for a003 is TRUE whereas it should be FALSE because var1 and var3 don't match:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, TRUE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)

Desired Result:

structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
"char1", "char1"), Match = c(TRUE, TRUE, FALSE)), row.names = c(NA, 
-3L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 
0x0000015eb1261ef0>)
4
  • Your desired result is identical to your sample data. Did you forget to include the proper expected output? Commented Sep 17, 2019 at 22:05
  • @Maurits Evers Never fails. I just edited the desired result. Commented Sep 17, 2019 at 22:26
  • Thanks for updating. Do you only want to compare var1 with var2? What about var3? I'm still trying to understand the rules/logic behind your expected output. For row 3, is Match = FALSE because var1 != var2? Why is Match = TRUE for row 2? Commented Sep 17, 2019 at 22:55
  • @MauritsEvers Yes exactly, I have more than two columns that I need to match with var1. I can not find one example of using mapply with more than two variables. ID a003 is not a match because var1 does not match one of either var2 or var3. Commented Sep 17, 2019 at 23:08

4 Answers 4

4

How about the following

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, any), .SDcols = c("var2", "var3")]
D1
#ID  var1  var2  var3 Match
#1: a001 char1 char1 char1  TRUE
#2: a002 char1  <NA> char1  TRUE
#3: a003 char2 char1 char1 FALSE

Explanation: We compare entries in the sub-data.table defined through .SDcols with entries in D1[, "var1"]; if there is any match, return TRUE, else FALSE.


Update

In response to your comment, you can do

setDT(D1)
D1[, Match := apply(sapply(.SD, `==`, D1[, "var1"]), 1, all, na.rm = T), .SDcols = c("var2", "var3")]
Sign up to request clarification or add additional context in comments.

3 Comments

If I change the value of var2 for ID a003 from char1 to char2 the result is TRUE whereas it should be FALSE because var3 is not a match with var1 even though var2 is.
@user3594490 That was not clear to me; I thought you wanted to have TRUE if at least one entry matched. I've made an edit, please take a look.
Yes! This works and my initial explanation needed to be more clear.
1

Another option is:

D2[, m := Reduce(`&`, data.table(
        sweep(as.matrix(.SD[, -"var1"]), 1L, .SD[["var1"]], function(x, y) is.na(x) | x==y))), 
    .SDcols=var1:var3]

Or using melt:

D2[, m := melt(D2, id.vars=c("ID", "var1"))[, any(var1==value, na.rm=TRUE), ID]$V1]

data:

library(data.table)
D2 <- structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1", 
    "char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", 
        "char1", "char1")), row.names = c(NA, -3L), class = c("data.table", 
            "data.frame"))
setDT(D2)

Comments

1

We can do this with only apply by checking if the first value in the row is same as all the other elements in the row.

D2$match <- apply(D2[, -1], 1, function(x) all(x[1] == na.omit(x)))
D2
#     ID  var1  var2  var3 match
#1: a001 char1 char1 char1  TRUE
#2: a002 char1  <NA> char1  TRUE
#3: a003 char2 char2 char1 FALSE

1 Comment

could you please explain this logic
0

Should be as simple as defaulting NA values:

else.na <- function(x, y) ifelse(is.na(x), y, x)
dt[,Match:= mapply('%in%', var1, else.na(var2, var1)) ,by = ID]

1 Comment

This works for var1 and var2 and produces desired output given my example. How can I apply it to var3 also? For example,only if the value for var1 matches the values for var2 and var3 would the result be TRUE.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.