I have a data frame from which I want to select the important columns and then filter the rows to keep only those whose values have a specific ending.

A regular expression makes it simple to define a single ending with the $ anchor (e.g. xx$). But how can I match any of several possible endings (xx$, yy$)?

Dummy example:

require(dplyr)

x <- c("aa", "aa", "aa", "bb", "cc", "cc", "cc")
y <- c(101, 102, 113, 201, 202, 344, 407)
type = rep("zz", 7)
df = data.frame(x, y, type)    

# Select all rows where y ends with "7"
df %>%
  select(x, y) %>%
  filter(grepl("7$", y))

# This seems to work when I write the endings out explicitly, but I need to pass them as a vector instead
df %>%
  select(x, y) %>%
  filter(grepl("[2|7]$", y))  # need to modify this using multiple endings


# How to modify this expression, to use vector of endings (ids) instead?
ids = c(7,2)     # define vector of my values

df %>%
     select(x, y) %>%
     filter(grepl("ids$", y))  # how to change "grepl(ids, y)??"

Expected output:

   x   y
1 aa 102
2 cc 202
3 cc 407

Example based on this question: Regular expressions (RegEx) and dplyr::filter()

Comments:
  • Thank you, this works well if I specify grepl("[2|7]$", y). But since this is only a dummy example, I need to rewrite it to use a vector of values instead, i.e. ids = c(2,7). How do I put this into the grepl statement? grepl("ids$", y) obviously does not work... Commented Jun 18, 2019 at 12:51
  • OK, this seems to work: df %>% select(x, y) %>% filter(grepl(paste(ids, collapse="|"), y)). But I don't understand why I did not have to specify the $ at the end of the regex this time. Can you please post your comment as an answer? I understand that there are many examples, but I could not work out how to put them together... Thank you again for your help! :) Commented Jun 18, 2019 at 14:25
  • Use paste0 to add what remains: df %>% select(x, y) %>% filter(grepl(paste0("(?:", paste(ids, collapse="|"), ")$"), y)) Commented Jun 18, 2019 at 14:25
  • Thank you. Could you please post your comment as an answer so that I can accept it? Commented Jun 19, 2019 at 15:56
  • Oh yes, sorry, I did not notice this answer before! Thank you for sharing it. I would still be happy to have an answer to my own question, as other beginners may ask the same thing in different words and find this one first. Moreover, my question combines selecting columns with select and then filtering rows with filter, which is missing from the suggested answer. Commented Jun 20, 2019 at 8:28

1 Answer

You may use

df %>% 
  select(x, y) %>%
  filter(grepl(paste0("(?:", paste(ids, collapse="|"), ")$"), y))

The paste0("(?:", paste(ids, collapse="|"), ")$") part builds an alternation pattern that matches only at the end of the string, because of the $ anchor placed after the group.

NOTE: If the values can contain special regex metacharacters, you need to escape the values in the character vector first:

regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}
df %>% 
  select(x, y) %>%
  filter(grepl(paste0("(?:", paste(regex.escape(ids), collapse="|"), ")$"), y))
                                   ^^^^^^^^^^^^^^^^^
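
To see why the escaping step matters, here is a minimal base-R sketch. The endings and values vectors are hypothetical and not part of the question; regex.escape is the helper defined above, repeated so the snippet runs on its own.

```r
# Escape helper from the answer, repeated here so the snippet is self-contained
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}

# Hypothetical endings that contain regex metacharacters
endings <- c(".5", "(x)")
values  <- c("1.5", "125", "f(x)", "max")

# Without escaping, "." matches any character and "(x)" is a capturing group
raw_pattern <- paste0("(?:", paste(endings, collapse = "|"), ")$")
grepl(raw_pattern, values)
# [1]  TRUE  TRUE FALSE  TRUE

# With escaping, the endings are matched literally
escaped_pattern <- paste0("(?:", paste(regex.escape(endings), collapse = "|"), ")$")
grepl(escaped_pattern, values)
# [1]  TRUE FALSE  TRUE FALSE
```

With purely numeric endings such as ids = c(7, 2) the escaping changes nothing, so it is only needed when the endings come from arbitrary text.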

For example, paste0("(?:", paste(c("7", "8", "ids"), collapse="|"), ")$") returns (?:7|8|ids)$:

  • (?: - start of a non-capturing group that acts as a container for the alternatives, so that the $ anchor applies to all of them and not just to the last one; it matches any of
  • 7 - a 7 char
  • | - or
  • 8 - an 8 char
  • | - or
  • ids - an ids substring
  • ) - end of the group
  • $ - end of the string.
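
The effect of the group can be checked with a short base-R sketch on the question's dummy data. The extra value 371 is a hypothetical addition, chosen because it contains a 7 but does not end in one.

```r
# y values from the question plus one hypothetical extra value (371)
y_extra <- c(101, 102, 113, 201, 202, 344, 407, 371)
ids <- c(7, 2)

# Ungrouped: "7|2$" reads as "a 7 anywhere" OR "a 2 at the end"
ungrouped <- paste0(paste(ids, collapse = "|"), "$")
# Grouped: "(?:7|2)$" reads as "a 7 or a 2, at the end"
grouped <- paste0("(?:", paste(ids, collapse = "|"), ")$")

grepl(ungrouped, y_extra)
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE  TRUE

grepl(grouped, y_extra)
# [1] FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE FALSE
```

On the original seven values both patterns happen to select the same rows, which is why a missing group can go unnoticed; only the grouped pattern rejects 371.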