3

I don't think this exact question has been asked. There is lots of material on subsetting based on one value (e.g., x[grepl("some string", x[["column1"]]),]), but not on multiple values/strings.

Here is an example of my data:

#create sample data frame
data = data.frame(id = c(1,2,3,4), phrase = c("dog, frog, cat, moose", "horse, bunny, mouse", "armadillo, cat, bird,", "monkey, chimp, cow"))

#convert the `phrase` column to character string (the dataset I'm working on requires this)
data$phrase = as.character(data$phrase)

#list of strings to remove rows by
remove_if = c("dog", "cat")

This will give a dataset that looks like:

  id                phrase
1  1 dog, frog, cat, moose
2  2   horse, bunny, mouse
3  3 armadillo, cat, bird,
4  4    monkey, chimp, cow

I want to remove row 1 and row 3 (because row 1 contains "dog" and row 3 contains "cat"), but keep row 2 and row 4.

  id                phrase
1  2   horse, bunny, mouse
2  4    monkey, chimp, cow

In other words, I want to subset data so that it is only (the headers and) row 2 and row 4 (because they contain neither "dog" nor "cat").

Thanks!

5 Answers

2

In case you want to mix it with dplyr and stringr:

library(stringr)
library(dplyr)

data %>%
  filter(str_detect(phrase, paste(remove_if, collapse = "|"), negate = TRUE))
#   id              phrase
# 1  2 horse, bunny, mouse
# 2  4  monkey, chimp, cow

1

We can use grepl with subset after pasting the 'remove_if' strings into a single pattern separated by "|":

subset(data, !grepl(paste(remove_if, collapse="|"), phrase))
#    id              phrase
#2  2 horse, bunny, mouse
#4  4  monkey, chimp, cow
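
Note that pasting with "|" treats every entry of remove_if as a regular expression. If the strings might contain regex metacharacters (such as "." or "("), a literal-matching variant could look roughly like the sketch below. It assumes the question's `data` and `remove_if`; the names `hit` and `result` are only illustrative.

```r
# sample data from the question
data <- data.frame(
  id = c(1, 2, 3, 4),
  phrase = c("dog, frog, cat, moose", "horse, bunny, mouse",
             "armadillo, cat, bird,", "monkey, chimp, cow"),
  stringsAsFactors = FALSE
)
remove_if <- c("dog", "cat")

# one logical vector per string, matched literally (fixed = TRUE),
# then combined element-wise with OR
hit <- Reduce(`|`, lapply(remove_if, grepl, x = data$phrase, fixed = TRUE))

# keep only the rows where none of the strings matched
result <- data[!hit, ]
```

For plain words like "dog" and "cat" this should give the same rows as the subset() call above; the difference only matters when the strings contain special characters.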

1 Comment

Worked as expected on the real dataset - thank you!
1

Use grep with invert = TRUE:

> data[grep(paste0(remove_if, collapse = "|"), data$phrase, invert = TRUE), ]
  id              phrase
2  2 horse, bunny, mouse
4  4  monkey, chimp, cow

1
data[!grepl(paste0("(^|, )(", paste0(remove_if, collapse = "|"), ")(,|$)"), data$phrase),]

# id                    phrase
#  2 caterpillar, bunny, mouse
#  4        monkey, chimp, cow

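The regex constructed in this example is "(^|, )(dog|cat)(,|$)". It matches 'dog' or 'cat' only as a complete comma-separated entry, so words that contain 'cat' or 'dog' but aren't the exact words, e.g. 'caterpillar', are not matched. (The output above appears to assume that row 2 of the sample data was changed to "caterpillar, bunny, mouse" to demonstrate this.)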
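
If the entries are not always separated by exactly ", ", a word-boundary pattern may be a simpler way to get whole-word matching. The sketch below is an alternative to the anchored pattern above, not the same regex; the `phrases` vector is an assumed variant of the question's data with "caterpillar" in row 2, and `pattern` and `keep` are illustrative names.

```r
# assumed variant of the question's phrases, with "caterpillar" in row 2
phrases <- c("dog, frog, cat, moose", "caterpillar, bunny, mouse",
             "armadillo, cat, bird,", "monkey, chimp, cow")
remove_if <- c("dog", "cat")

# \\b marks a word boundary, so "cat" inside "caterpillar" does not match
pattern <- paste0("\\b(", paste(remove_if, collapse = "|"), ")\\b")

# TRUE for the rows to keep
keep <- !grepl(pattern, phrases, perl = TRUE)
```

Keep in mind that \\b also treats hyphens and other punctuation as boundaries, so an entry like "cat-fish" would still count as containing "cat".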

0

Another way (maybe not the best one):

data[-unique(unlist(sapply(c(remove_if),function(x){grep(x,data$phrase)}))),]
  id              phrase
2  2 horse, bunny, mouse
4  4  monkey, chimp, cow
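
One caveat with negative indexing: if none of the strings match, the index vector is empty and data[-integer(0), ] returns zero rows rather than the full data frame. A minimal sketch of a guard, assuming the question's `data` and a hypothetical `remove_if` with no matches (`hits`, `naive` and `safe` are illustrative names):

```r
# sample data from the question
data <- data.frame(
  id = c(1, 2, 3, 4),
  phrase = c("dog, frog, cat, moose", "horse, bunny, mouse",
             "armadillo, cat, bird,", "monkey, chimp, cow"),
  stringsAsFactors = FALSE
)
remove_if <- c("zebra")  # hypothetical: matches nothing in the data

# row positions that match any of the strings (empty here)
hits <- unique(unlist(lapply(remove_if, grep, x = data$phrase)))

# unguarded negative indexing drops every row when hits is empty
naive <- data[-hits, ]

# guarded version: only drop rows when there is something to drop
safe <- if (length(hits) > 0) data[-hits, ] else data
```

The grepl-based answers avoid this issue, because a logical index of all TRUE keeps every row.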
