I have variables with names such as r1a, r3c, r5e, r7g, r9i, r11k, r13g, r15i, etc. I am trying to select the variables whose names start with r5 through r12 and create a data frame from them in R.

The best code that I could write to get this done is,

data %>% select(grep("r[5-9][^0-9]" , names(data), value = TRUE ),
grep("r1[0-2]", names(data), value = TRUE))

Given that my experience with regular expressions spans a single day, I was wondering if anyone could help me write better, more compact code for this!

  • select is the dplyr function presumably. Commented Feb 13, 2018 at 22:30

3 Answers

Here's a regex that gets all the columns at once:

data %>% select(grep("r([5-9]|1[0-2])", names(data), value = TRUE))

The vertical bar represents an 'or'.

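As the comments have pointed out, this also matches names such as r51, because nothing in the pattern stops further digits from following the 5, and the call can be shortened with dplyr's matches(). To exclude those names, require a non-digit or the end of the name after the number: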

data %>% select(matches("r([5-9]|1[0-2])([^0-9]|$)"))
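
A quick way to check a pattern like this is to run it over a few sample names with base R's grepl. The sketch below is only an illustration: the names r12, r50, r51 and r120 are made up for testing, and the leading ^ anchor is an added assumption that every wanted name begins with r.

```r
# Sample names; r12, r50, r51 and r120 are hypothetical edge cases
nms <- c("r1a", "r5e", "r9i", "r11k", "r12", "r13g", "r50", "r51", "r120")

loose  <- "r([5-9]|1[0-2])"              # more digits may follow the number
strict <- "^r([5-9]|1[0-2])([^0-9]|$)"   # the number must end the run of digits

# Names picked up by each pattern
loose_hits  <- nms[grepl(loose, nms)]
strict_hits <- nms[grepl(strict, nms)]

loose_hits
# [1] "r5e"  "r9i"  "r11k" "r12"  "r50"  "r51"  "r120"
strict_hits
# [1] "r5e"  "r9i"  "r11k" "r12"
```

The loose pattern lets r50, r51 and r120 through, while the stricter one keeps only names whose number lies between 5 and 12.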

4 Comments

You can shorten this using matches: data %>% select(matches("r([5-9]|1[0-2])"))
Good call @nielfws, I always forget about dplyr's helper functions.
Thank you. I should have thought of that.
This code will fail if we have names such as r50, r51, r52, r60, r61, r62, etc.

Suppose that in the code below x represents your names(data). Then the following will do what you want.

# The names of 'data'
x <- scan(what = character(), text = "r1a r3c r5e r7g r9i r11k r13g r15i")

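# Split each name on letters, which leaves the digits plus some empty strings
y <- unlist(strsplit(x, "[[:alpha:]]"))
# Drop the empty strings and convert the rest to numbers
y <- as.numeric(y[sapply(y, `!=`, "")])
# Keep the names whose number is greater than 4
x[y > 4]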
#[1] "r5e"  "r7g"  "r9i"  "r11k" "r13g" "r15i"

EDIT.

You can generalize the above code into a function. It takes three arguments: the vector of variable names, followed by the lower and upper limits of the numbers you want to keep.

var_names <- function(x, from = 1, to = Inf){
    y <- unlist(strsplit(x, "[[:alpha:]]"))
    y <- as.integer(y[sapply(y, `!=`, "")])
    x[from <= y & y <= to]
}

var_names(x, 5)
#[1] "r5e"  "r7g"  "r9i"  "r11k" "r13g" "r15i"
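
To get the range asked for in the question (r5 through r12), pass both limits. The sketch below repeats a lightly simplified copy of the function so that it runs on its own. It assumes each name contains exactly one number; a name with two separate digit groups would put the numbers out of step with the names.

```r
# The names of 'data', as in the example above
x <- c("r1a", "r3c", "r5e", "r7g", "r9i", "r11k", "r13g", "r15i")

var_names <- function(x, from = 1, to = Inf){
    # Split on letters, drop the empty pieces, keep the numbers
    y <- unlist(strsplit(x, "[[:alpha:]]"))
    y <- as.integer(y[y != ""])
    x[from <= y & y <= to]
}

# Both limits supplied: keep numbers from 5 to 12 inclusive
keep <- var_names(x, 5, 12)
keep
#[1] "r5e"  "r7g"  "r9i"  "r11k"
```

The resulting character vector could then be used to index the data frame, for example data[keep], assuming data has those column names.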

1 Comment

Thank you. I will give this a try. I have 1279 variables but with the form rNi where N = 1(1) 15 and i = alphabets and I need to select with the same criteria. Do you think it would make your code a bit cumbersome? Sorry for question within a comment.

Remove the non-digits, scan the remainder in, and check whether each number is in 5:12:

DF <- data.frame(r1a=1, r3c=2, r5e=3, r7g=4, r9i=5, r11k=6, r13g=7, r15i=8) # test data

DF[scan(text = gsub("\\D", "", names(DF)), quiet = TRUE) %in% 5:12]
##   r5e r7g r9i r11k
## 1   3   4   5    6

Using magrittr it could also be written like this:

library(magrittr)

DF %>% .[scan(text = gsub("\\D", "", names(.)), quiet = TRUE) %in% 5:12]
##   r5e r7g r9i r11k
## 1   3   4   5    6
