Filter Data Frame by Matching Multiple Strings in Multiple Columns

Question

I have been trying, without success, to filter my data frame with dplyr and grep, using a list of strings matched across multiple columns. I assumed this would be a simple task, but either nobody has asked this specific question or it is not as easy as I originally thought.

For the following data frame...

foo <- data.frame(var.1 = c('a', 'b', 'c'),
                  var.2 = c('b', 'd', 'e'),
                  var.3 = c('c', 'f', 'g'),
                  var.4 = c('z', 'a', 'b'))

... I would like to filter row-wise to find the rows that contain all three values a, b, and c. The result I want is only row 1, because it contains a, b, and c. Rows 2 and 3 should not be returned: each contains two of the three values, but not all three in the same row.

I'm running into issues because grep only accepts a vector, or one column at a time, when what I really care about is finding the strings across many columns in the same row.

I've also used dplyr to filter with %in%, but that returns a row whenever any one of the values is present:

foo %>% 
  filter(var.1 %in% c('a', 'b', 'c') |
           var.2 %in% c('a', 'b', 'c') |
           var.3 %in% c('a', 'b', 'c'))

Thanks for any and all help and please, let me know if you need any clarification!

foo[apply(foo, 1, function(x) all(c('a', 'b', 'c') %in% x)), ]
– Ronak Shah, Commented Jul 10, 2017 at 1:31
apply( foo, 2, function(x) all( grepl(x = x, pattern = "[abc]" ) ))
– Sathish, Commented Jul 10, 2017 at 1:36
apply( foo, 2, function(x) sum( grepl(x = x, pattern = "[abc]" ) ) == 3)
– Sathish, Commented Jul 10, 2017 at 1:38

d.b · Accepted Answer · 2017-07-10 22:29:19Z

4

Here's an approach in base R: for each of "a", "b", and "c" in turn, check whether any element in each row of foo equals it, add the resulting Booleans, and check whether the sum for each row is greater than or equal to 3.

Reduce("+", lapply(c("a", "b", "c"), function(x) rowSums(foo == x) > 0)) >=3
#[1]  TRUE FALSE FALSE
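
To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps. The names targets, per_target, n_found and keep are introduced only for illustration, and the data frame is rebuilt with stringsAsFactors = FALSE (an assumption, so that the columns are plain character vectors on any R version).

```r
# Same data as in the question; stringsAsFactors = FALSE is an assumption
# made here so the columns are character vectors regardless of R version.
foo <- data.frame(var.1 = c("a", "b", "c"),
                  var.2 = c("b", "d", "e"),
                  var.3 = c("c", "f", "g"),
                  var.4 = c("z", "a", "b"),
                  stringsAsFactors = FALSE)

targets <- c("a", "b", "c")

# One logical vector per target: does the row contain that value anywhere?
per_target <- lapply(targets, function(x) rowSums(foo == x) > 0)
names(per_target) <- targets

# Adding the logical vectors counts how many distinct targets each row holds.
n_found <- Reduce(`+`, per_target)

# Keep the rows in which every target was found at least once.
keep <- n_found >= length(targets)
foo[keep, ]
```

Because each target contributes at most 1 to a row's count, a value that appears twice in the same row is not counted twice.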

Timings

foo = matrix(sample(letters[1:26], 1e7, replace = TRUE), ncol = 5)
system.time(Reduce("+", lapply(letters[1:20], function(x) rowSums(foo == x) > 0)) >=20)
#   user  system elapsed 
#   3.26    0.48    3.79 

system.time(apply(foo, 1, function(x) all(letters[1:20] %in% x)))
#   user  system elapsed 
#  18.86    0.00   19.19 


identical(Reduce("+", lapply(letters[1:20], function(x) rowSums(foo == x) > 0)) >=20, 
          apply(foo, 1, function(x) all(letters[1:20] %in% x)))
#[1] TRUE
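
If the check is needed in more than one place, the same idea could be wrapped in a small function. This is only a sketch: rows_with_all is a hypothetical helper name, not a function from base R or dplyr, and na.rm = TRUE is an added assumption so that missing values count as non-matches.

```r
# rows_with_all() is a hypothetical helper, not part of base R or dplyr.
# It returns TRUE for each row of `data` that contains every value in `targets`.
rows_with_all <- function(data, targets) {
  # na.rm = TRUE is an assumption: NA cells are treated as non-matches.
  hits <- lapply(targets, function(x) rowSums(data == x, na.rm = TRUE) > 0)
  unname(Reduce(`+`, hits) == length(targets))
}

# Small reproducible check against the row-wise apply() version.
set.seed(1)
m <- matrix(sample(letters[1:6], 5000, replace = TRUE), ncol = 5)
targets <- c("a", "b", "c")

fast <- rows_with_all(m, targets)
slow <- apply(m, 1, function(x) all(targets %in% x))
identical(fast, slow)
```

The function works on a matrix or a data frame, so it can be used for subsetting as foo[rows_with_all(foo, c("a", "b", "c")), ].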

edited Jul 10, 2017 at 22:29

answered Jul 10, 2017 at 1:36


Spacedman · Answer · 2017-07-10 07:09:13Z

2

Your problem arises from trying to apply "tidyverse" solutions to data that isn't tidy. Here's the tidy solution, which uses melt to make your data tidy. See how much tidier this solution is?

> library(dplyr)
> library(reshape2)
> rows = foo %>%
      mutate(id=1:nrow(foo)) %>% 
      melt(id="id") %>% 
      filter(value=="a" | value=="b" | value=="c") %>%
      group_by(id) %>% 
      summarize(N=n_distinct(value)) %>%  # distinct matches, so a repeated value is not counted twice
      filter(N==3) %>%
      select(id) %>%
      unlist
Warning message:
attributes are not identical across measure variables; they will be dropped

That gives you a vector of matching row indexes, which you can then subset your original data frame with:

> foo[rows,]
  var.1 var.2 var.3 var.4
1     a     b     c     z
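
The same long-format idea can be sketched without any packages. This is an illustration under stated assumptions rather than the answer's own method: the names long, hits, n_distinct_hits and targets are made up for the example, and the data frame is rebuilt with stringsAsFactors = FALSE so the values are plain character strings.

```r
# Same data as in the question, with character (not factor) columns.
foo <- data.frame(var.1 = c("a", "b", "c"),
                  var.2 = c("b", "d", "e"),
                  var.3 = c("c", "f", "g"),
                  var.4 = c("z", "a", "b"),
                  stringsAsFactors = FALSE)

targets <- c("a", "b", "c")

# Long format: one (row id, value) pair per cell. unlist() walks the data
# frame column by column, so the row ids repeat once per column.
long <- data.frame(id    = rep(seq_len(nrow(foo)), times = ncol(foo)),
                   value = unlist(foo, use.names = FALSE),
                   stringsAsFactors = FALSE)

# Keep only cells holding one of the targets, then count the distinct
# targets found in each row.
hits <- long[long$value %in% targets, ]
n_distinct_hits <- vapply(split(hits$value, hits$id),
                          function(v) length(unique(v)),
                          integer(1))

# Row ids in which every target was found.
rows <- as.integer(names(n_distinct_hits)[n_distinct_hits == length(targets)])
foo[rows, ]
```

Counting distinct values per row, rather than the number of matching cells, keeps a row such as a, a, b from being mistaken for a full match.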

answered Jul 10, 2017 at 7:09


1 Comment

thelatemail Over a year ago

value=="a" | value=="b" | value=="c" could be value %in% c("a","b","c") surely.
