
My regex knowledge is weak, and I don't know how to select specific columns in R using a regex.

Here is a short example. I have a data frame df that has lots of variables.

library(dplyr)  # provides %>%, as_tibble(), select() and matches()

a = c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
      '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65', '9.agelo65', '10.PM2_5')

df = matrix(ncol = 10, nrow = 1) %>% as_tibble()
colnames(df) = a
df

I want to select specific variables using select() and matches() from the dplyr package. The regex should satisfy the following condition:

Variable names should not contain both age and _ at the same time.

My idea was to first match the variable names that contain both age and _, and then negate the selection, but this failed. For example:

df %>% select(!matches('age&_')) 

The final result should look like this:

df_expected = df %>% select(`8.ageupwith65`, `9.agelo65`, `10.PM2_5`)

Any help will be highly appreciated!

2 Answers

We may use

library(dplyr)
df %>% 
   select(-contains('age'), matches('age(?!.*_)', perl = TRUE))
# A tibble: 1 × 3
  `10.PM2_5` `8.ageupwith65` `9.agelo65`
  <lgl>      <lgl>           <lgl>      
1 NA         NA              NA    
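
To see why this works, it may help to test the pattern on the names alone. The sketch below uses only base R and is an illustration, not part of the answer's code: age(?!.*_) is a negative lookahead that matches age only when no underscore appears anywhere after it, and the helper names has_age, age_without_underscore and keep are made up for this example.

```r
# Column names from the question
a <- c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
       '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65',
       '9.agelo65', '10.PM2_5')

# TRUE where the name contains the literal text 'age'
has_age <- grepl('age', a, fixed = TRUE)

# TRUE where 'age' is NOT followed later in the name by an underscore
# (lookaheads need perl = TRUE in base R)
age_without_underscore <- grepl('age(?!.*_)', a, perl = TRUE)

# Keep names with no 'age' at all, plus names with 'age' but no later '_'
keep <- a[!has_age | age_without_underscore]
keep
```

This mirrors the two parts of the select() call: -contains('age') drops every age column, and the matches() term adds back the ones without a trailing underscore. Note that the lookahead only checks for an underscore after age, which is enough for the names in the question.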

You may use

> df %>% select(!matches('age[0-9]+_')) 
# A tibble: 1 x 3
  `8.ageupwith65` `9.agelo65` `10.PM2_5`
  <lgl>           <lgl>       <lgl>     
1 NA              NA          NA        

This pattern matches age, followed by one or more digits, and then an underscore. The ! operator then negates the selection, so every column that does not match is kept.
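
As for why the original attempt fails: a regex has no "and" operator, so 'age&_' simply looks for the literal text age&_, which no column name contains. If the condition really is "contains both age and _, in any order", one option is to combine selection helpers with & instead of packing everything into one pattern. This is a sketch that assumes a reasonably recent dplyr/tidyselect, where selection helpers can be combined with & and negated with !; the data frame is rebuilt here only so the snippet runs on its own.

```r
library(dplyr)

a <- c('1.age41_50', '2.age51_60', '3.age61_70', '4.age71_80',
       '5.age1_20', '6.age21_30', '7.age31_40', '8.ageupwith65',
       '9.agelo65', '10.PM2_5')

# One-row tibble of NAs with the question's column names
df <- as_tibble(matrix(NA, nrow = 1, ncol = length(a),
                       dimnames = list(NULL, a)))

# Drop columns whose name contains both 'age' and '_'
result <- df %>% select(!(contains('age') & contains('_')))

# Same condition as a single regex: 'age' before '_' or '_' before 'age'
result_regex <- df %>% select(!matches('age.*_|_.*age'))

names(result)
names(result_regex)
```

Both calls keep `8.ageupwith65`, `9.agelo65` and `10.PM2_5` for this data. The helper version is easier to read, while the single-regex version works anywhere a plain pattern is required.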
