
I want to select multiple columns by name using a regular expression, and I am trying to do it with the pipe syntax of the dplyr package. I checked other questions, but only found answers about matching a single string.

With base R:

library(dplyr)    
mtcars[grepl('m|ar', names(mtcars))]
###                      mpg am gear carb
### Mazda RX4           21.0  1    4    4
### Mazda RX4 Wag       21.0  1    4    4

However, it doesn't work with select() and contains():

mtcars %>% select(contains('m|ar'))
### data frame with 0 columns and 32 rows

What's wrong?

4 Answers


You can use matches():

mtcars %>%
  select(matches('m|ar')) %>%
  head(2)
#               mpg am gear carb
# Mazda RX4      21  1    4    4
# Mazda RX4 Wag  21  1    4    4

According to the ?select documentation:

‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’

contains(), on the other hand, works with a single literal string:

mtcars %>% 
       select(contains('m'))
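
To make the difference concrete, here is a minimal sketch comparing the two helpers on the same input. It assumes only that dplyr is installed; the variable names `literal` and `regex` are illustrative and not from the original answer.

```r
library(dplyr)

# contains() treats its argument as a literal substring, so "m|ar" is
# searched for as the four characters m, |, a, r and no column matches.
literal <- mtcars %>% select(contains("m|ar")) %>% names()

# matches() treats the same argument as a regular expression, so any
# column whose name contains "m" or "ar" is kept.
regex <- mtcars %>% select(matches("m|ar")) %>% names()

print(literal)
print(regex)
```

This is why the question's contains('m|ar') call returned a data frame with 0 columns.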

9 Comments

Thank you @akrun, I feel stupid now :-). One question, though: why should we even use contains() if matches() does the same thing and more?
@agenis Because you might want to match "." and not have to think about how to escape it in a regular expression
@MichaelBellhouse In that case, use paste(), i.e. paste(yourvec, collapse="|"), and pass the result to matches()
akrun, thank you so much. I've been doing a lot of digging and experimenting for this. All the best.
equivalent_for_filter <- df %>% filter(!grepl(paste(exclude_filter, collapse="|"),variable))
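
The paste() suggestion from the comments can be sketched as follows. The vector `patterns` is a hypothetical stand-in for `yourvec`; note that any regex metacharacters in the fragments (such as ".") would still need escaping before being collapsed.

```r
library(dplyr)

# Hypothetical vector of name fragments (stands in for `yourvec`).
patterns <- c("m", "ar")

# Collapse the fragments into a single alternation pattern.
regex <- paste(patterns, collapse = "|")

# Select every column whose name matches any of the fragments.
selected <- mtcars %>% select(matches(regex)) %>% names()

print(regex)
print(selected)
```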

You can use contains() from the dplyr package if you give it a character vector of options, like this:

mtcars %>% 
       select(contains(c("m", "ar")))

2 Comments

contains() with a vector of as many elements as you want works just fine. Actually, matches() should be reserved for cases where you need complex matching with a regex

You could still use grepl() from base R.

df <- mtcars[ , grepl('m|ar', names(mtcars))]

...which returns a subset data frame, df, containing the columns whose names contain m or ar.
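
One caveat with the matrix-style `[ , cols]` form: when the pattern matches exactly one column, base R drops the result to a plain vector unless `drop = FALSE` is given. A minimal sketch, using the illustrative pattern "^mp" (not from the original answer) to force a single match:

```r
# A pattern that matches exactly one column name ("mpg").
one_match <- grepl("^mp", names(mtcars))

# Matrix-style indexing drops a single column to a numeric vector.
as_vector <- mtcars[, one_match]

# drop = FALSE keeps the result as a one-column data frame.
as_frame <- mtcars[, one_match, drop = FALSE]

# List-style indexing, as used in the question, also keeps a data frame.
as_list_index <- mtcars[one_match]

print(class(as_vector))
print(class(as_frame))
print(class(as_list_index))
```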



Here's an alternative:

mtcars %>% 
    select(contains('m') | contains('ar')) %>% 
    head(2)

#             mpg am gear carb
# Mazda RX4      21  1    4    4
# Mazda RX4 Wag  21  1    4    4
