Filtering data.table based on values in group of variables

Question

I'm looking for the best syntax to filter a data.table so that it keeps only the rows where at least one of a group of variables is not NA.

An example is below:

> dat <- data.table(a=1:5, b=c(1:3, NA, NA), c=c(NA, 1:3, NA))

> cols <- c('b', 'c')

> dat[!all(is.na(cols)), .SD, with=FALSE]
Null data.table (0 rows and 0 cols)

> dat[!is.na(b)|!is.na(c), .SD]
   a  b  c
1: 1  1 NA
2: 2  2  1
3: 3  3  2
4: 4 NA  3

As you can see, it works if I name each variable explicitly, as in !is.na(variable1) | !is.na(variable2). However, I can't find a way to pass a group of variables so that I can write a single condition instead of chaining every column together with "or".

talat · Accepted Answer · 2017-11-29 11:13:28Z

3

You can use the following syntax with rowSums and .SD:

dat[dat[, rowSums(!is.na(.SD)) > 0, .SDcols  = cols]]
#   a  b  c
#1: 1  1 NA
#2: 2  2  1
#3: 3  3  2
#4: 4 NA  3

The inner part creates a logical vector with one element per row, which looks like this:

dat[, rowSums(!is.na(.SD)) > 0, .SDcols  = cols]
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
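
To make the rowSums step easier to follow, here is a minimal sketch that splits it into its intermediate pieces, using the dat and cols from the question. The names present, n_present and keep are only illustrative and do not appear in the original answer.

```r
library(data.table)

dat  <- data.table(a = 1:5, b = c(1:3, NA, NA), c = c(NA, 1:3, NA))
cols <- c("b", "c")

# Step 1: logical matrix, TRUE where a value is present (not NA)
present <- !is.na(dat[, cols, with = FALSE])

# Step 2: number of non-NA values per row across the selected columns
n_present <- rowSums(present)

# Step 3: keep the rows that have at least one non-NA value in `cols`
keep <- n_present > 0
dat[keep]
```

Passing the logical vector as i is what selects the rows; the nested dat[dat[...]] form in the answer does the same thing in one expression.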

Re the comment by Michael, you can also use Reduce + lapply:

dat[dat[, Reduce("+", lapply(.SD, function(x) !is.na(x))) > 0, .SDcols = cols]]

But for most of my use cases, the rowSums approach is ok and easier to read, imo.
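
As a possible variation on the Reduce idea (not part of the original answer), the per-column checks can be combined with a logical OR instead of being summed and compared to zero. This is only a sketch under the same dat and cols as in the question; like the Reduce("+") version, it works column by column and so avoids building a matrix.

```r
library(data.table)

dat  <- data.table(a = 1:5, b = c(1:3, NA, NA), c = c(NA, 1:3, NA))
cols <- c("b", "c")

# One logical vector per column in `cols`, combined element-wise with OR:
# a row is kept as soon as any of the selected columns is non-NA
keep <- dat[, Reduce(`|`, lapply(.SD, function(x) !is.na(x))), .SDcols = cols]

dat[keep]
```

Both forms should select the same rows here; which one reads better is largely a matter of taste.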

edited Nov 29, 2017 at 11:13

answered Nov 29, 2017 at 11:08

talat


1 Comment

MichaelChirico Over a year ago

works, but matrix conversion is expensive on large data. there's a Reduce approach but I'm AFK to test. Something like do.call(`+`, lapply(.SD, is.na)) could also work

Santosh M. · Accepted Answer · 2017-11-29 13:00:56Z

0

You could also do this.

dat[rowSums(!is.na(dat[, cols, with=FALSE])) > 0,]
#   a  b  c
#1: 1  1 NA
#2: 2  2  1
#3: 3  3  2
#4: 4 NA  3
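
For what it's worth, reasonably recent versions of data.table also accept the .. prefix in j to look up a variable holding column names, which can stand in for with=FALSE. The following is a sketch of the same filter written that way, assuming a data.table version that supports the prefix:

```r
library(data.table)

dat  <- data.table(a = 1:5, b = c(1:3, NA, NA), c = c(NA, 1:3, NA))
cols <- c("b", "c")

# `..cols` tells data.table to treat `cols` as a variable in the calling
# scope that holds column names, rather than as a column called "cols"
res <- dat[rowSums(!is.na(dat[, ..cols])) > 0]
res
```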

edited Nov 29, 2017 at 13:00

answered Nov 29, 2017 at 12:42

Santosh M.


1 Comment

talat Over a year ago

Well, this doesn't fulfill OP's requirements of using a vector of relevant column names ("cols").
