
I don't think this exact question has been asked yet (for R, anyway).

I want to retain any columns in my dataset (there are actually hundreds) that contain a certain string, and drop the rest. I have found plenty of examples of searching column names for a string, but nothing for searching the contents of the columns themselves.

As an example, say I have this dataset:

df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))

For this example, say I want to retain any columns containing the string 'No', so that the resulting dataset is:

         v1      v3
1         1 Nothing
2         8       4
3         7       2
4 No number       9

How can I do this in R? I am happy with any sort of solution (e.g., base R, dplyr, etc.)!

Thanks in advance!

4 Answers


Base R:

df[colSums(sapply(df, grepl, pattern = 'No')) > 0]

#         v1      v3
#1         1 Nothing
#2         8       4
#3         7       2
#4 No number       9
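To see what the base R one-liner does, it can help to look at the intermediate steps: sapply() applies grepl() to every column and returns a logical matrix (one row per cell, one column per variable), and colSums() then counts the matches in each column. A small sketch on the same example data:

```r
df <- data.frame(v1 = c(1, 8, 7, 'No number'),
                 v2 = c(5, 3, 5, 1),
                 v3 = c('Nothing', 4, 2, 9),
                 v4 = c(3, 8, 'Something', 6))

# Logical matrix: one row per cell, one column per variable,
# TRUE where that cell contains "No"
hits <- sapply(df, grepl, pattern = 'No')

# Matches per column: c(v1 = 1, v2 = 0, v3 = 1, v4 = 0)
counts <- colSums(hits)

# Keep the columns with at least one match (list-style indexing,
# so the result is always a data frame)
result <- df[counts > 0]
result
```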

Using dplyr (1.0.0 or later, which added where()):

library(dplyr)
df %>% select(where(~any(grepl('No', .))))
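If you'd rather not depend on dplyr, base R's Filter() can apply the same kind of per-column predicate as where(): a data frame is a list of columns, so Filter() keeps the columns for which the predicate returns TRUE. A sketch, assuming you want a literal (non-regex) match, hence fixed = TRUE:

```r
df <- data.frame(v1 = c(1, 8, 7, 'No number'),
                 v2 = c(5, 3, 5, 1),
                 v3 = c('Nothing', 4, 2, 9),
                 v4 = c(3, 8, 'Something', 6))

# Predicate applied to each column: does any cell contain the literal text "No"?
# fixed = TRUE treats the pattern as plain text rather than a regular expression
has_no <- function(col) any(grepl("No", col, fixed = TRUE))

# Filter() keeps the list elements (here: columns) where has_no() is TRUE
result <- Filter(has_no, df)
result
```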

Simply

df[grep("No", df)]
#          v1      v3
# 1         1 Nothing
# 2         8       4
# 3         7       2
# 4 No number       9

This works because grep() internally checks if (!is.character(x)), and if that's true it essentially does:

s <- structure(as.character(df), names = names(df))
s
# v1 
# "c(\"1\", \"8\", \"7\", \"No number\")" 
# v2 
# "c(5, 3, 5, 1)" 
# v3 
# "c(\"Nothing\", \"4\", \"2\", \"9\")" 
# v4 
# "c(\"3\", \"8\", \"Something\", \"6\")" 
grep("No", s)
# [1] 1 3

Note:

R.version.string
# [1] "R version 4.0.3 (2020-10-10)"
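Because grep() here is matching against the deparsed text of each whole column rather than the individual cells, it can report matches that aren't in the data. For example, in R 4.x every multi-element column in this example deparses to a string starting with "c(", so searching for "c(" matches all four columns even though no cell contains it. A quick comparison with the cell-wise grepl() approach:

```r
df <- data.frame(v1 = c(1, 8, 7, 'No number'),
                 v2 = c(5, 3, 5, 1),
                 v3 = c('Nothing', 4, 2, 9),
                 v4 = c(3, 8, 'Something', 6))

# Column-level: grep() coerces the data frame and searches deparsed strings
# such as "c(5, 3, 5, 1)", so "c(" is found in every column
whole_col <- grep("c(", df, fixed = TRUE)
whole_col  # 1 2 3 4

# Cell-level: grepl() looks at the actual values, none of which contain "c("
cellwise <- sapply(df, function(col) any(grepl("c(", col, fixed = TRUE)))
cellwise   # all FALSE
```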

2 Comments

This doesn't work for me. What version of R are you on?
R 4.0.2 works for me too, but I've never seen this before; data.frame isn't mentioned in the grep() documentation.
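A guess at the "doesn't work for me" comment, not confirmed in the thread: before R 4.0.0, data.frame() defaulted to stringsAsFactors = TRUE, so the example columns would be factors rather than character vectors, and the text grep() sees after coercing the whole data frame can differ. The cell-wise grepl() approach doesn't rely on that coercion, because grepl() converts a factor to its labels. A sketch that forces factor columns to check this in a current R:

```r
# Build the example with factor columns, as data.frame() did by default before R 4.0.0
df_f <- data.frame(v1 = c(1, 8, 7, 'No number'),
                   v2 = c(5, 3, 5, 1),
                   v3 = c('Nothing', 4, 2, 9),
                   v4 = c(3, 8, 'Something', 6),
                   stringsAsFactors = TRUE)
sapply(df_f, class)  # v1, v3, v4 are factors; v2 stays numeric

# grepl() matches against the factor labels, so the cell-wise filter is unaffected
keep <- sapply(df_f, function(col) any(grepl("No", col)))
result <- df_f[keep]
names(result)
```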

Use the dplyr::select_if() function (superseded by select(where(...)) in dplyr 1.0.0, though it still works):

library(dplyr)
df <- df %>% select_if(function(col) any(grepl("No", col)))
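The predicate used in these answers can be wrapped in a small reusable function. keep_cols_matching() below is a hypothetical helper, not part of base R or dplyr; it is a sketch that passes grepl()'s fixed and ignore.case arguments through and uses vapply() so each column is guaranteed to yield a single TRUE/FALSE:

```r
df <- data.frame(v1 = c(1, 8, 7, 'No number'),
                 v2 = c(5, 3, 5, 1),
                 v3 = c('Nothing', 4, 2, 9),
                 v4 = c(3, 8, 'Something', 6))

# Hypothetical helper: keep the columns of `data` in which any cell matches `pattern`
keep_cols_matching <- function(data, pattern, fixed = FALSE, ignore.case = FALSE) {
  keep <- vapply(data,
                 function(col) any(grepl(pattern, col,
                                         fixed = fixed, ignore.case = ignore.case)),
                 logical(1))
  data[keep]  # list-style indexing always returns a data frame
}

keep_cols_matching(df, "No")                           # v1, v3
keep_cols_matching(df, "thing")                        # v3, v4
keep_cols_matching(df, "NOTHING", ignore.case = TRUE)  # v3
```

Note that grepl() warns if fixed = TRUE and ignore.case = TRUE are combined, since ignore.case is then ignored.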


You can run grepl() on each column and keep the columns where any value matches.

df = data.frame(v1 = c(1, 8, 7, 'No number'),
                v2 = c(5, 3, 5, 1),
                v3 = c('Nothing', 4, 2, 9),
                v4 = c(3, 8, 'Something', 6))

find.no <- sapply(X = df, FUN = function(x) {
  # grepl() gives one TRUE/FALSE per cell; any() collapses that to one per column
  any(grepl("No", x = x))
})

> df[, find.no]
         v1      v3
1         1 Nothing
2         8       4
3         7       2
4 No number       9
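One caveat with matrix-style indexing such as df[, find.no]: when exactly one column matches, [.data.frame drops the result to a plain vector. Passing drop = FALSE, or indexing list-style as df[find.no], keeps a data frame. A sketch using 'Some', which only matches v4:

```r
df <- data.frame(v1 = c(1, 8, 7, 'No number'),
                 v2 = c(5, 3, 5, 1),
                 v3 = c('Nothing', 4, 2, 9),
                 v4 = c(3, 8, 'Something', 6))

find.some <- sapply(df, function(x) any(grepl("Some", x)))

class(df[, find.some])                # "character": dropped to a vector
class(df[, find.some, drop = FALSE])  # "data.frame"
class(df[find.some])                  # "data.frame"
```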
