I spent about 20 minutes looking through previous questions but could not find what I am looking for. I have a large data frame that I want to subset based on a list of names, but the names in the data frame can also carry a postfix that does not appear in the list.

In other words, is there a simpler, generic way (one that handles an arbitrary number of postfixes) to do the following:

data <- data.frame("name"=c("name1","name1_post1","name2","name2_post1",
                            "name2_post2","name3","name4"),
                   "data"=rnorm(7,0,1),
                   stringsAsFactors=FALSE)

names <- c("name2","name3")

subset <- data[ data$name %in% names | data$name %in% paste0(names,"_post1") | data$name %in% paste0(names,"_post2") , ]

In response to @Arun's answer: the names in my data actually include more than one underscore, which makes the problem more complicated.

data <- data.frame("name"=c("name1_target_time","name1_target_time_post1","name2_target_time","name2_target_time_post1",
                            "name2_target_time_post2","name3_target_time","name4_target_time"),
                   "data"=rnorm(7,0,1),
                   stringsAsFactors=FALSE)

names <- c("name2_target_time","name3_target_time")

subset <- data[ data$name %in% names | data$name %in% paste0(names,"_post1") | data$name %in% paste0(names,"_post2") , ]
  • 20 minutes? Is that too much for you? Commented Apr 16, 2013 at 20:02
  • Not at all. I am just saying I spent time looking through previous questions before posting. Commented Apr 16, 2013 at 20:03
  • You didn't spend enough time. And you are looking for a very specific solution (using grep). Isn't that too much? Commented Apr 16, 2013 at 20:06
  • @agstudy sorry if I offended you. I am just trying to learn. Commented Apr 16, 2013 at 20:09
  • I am not offended. I am just trying to tell you that spending 20 minutes to find a solution is not the right way to learn. Commented Apr 16, 2013 at 20:11

2 Answers

Edit: solution using regular expressions (following OP's follow-up in comment):

data[grepl(paste(names, collapse="|"), data$name), ]
#          name       data
# 3       name2  1.4934931
# 4 name2_post1 -1.6070809
# 5 name2_post2 -0.4157518
# 6       name3  0.4220084

On your new data:

#                      name      data
# 3       name2_target_time 0.6295361
# 4 name2_target_time_post1 0.8951720
# 5 name2_target_time_post2 0.6602126
# 6       name3_target_time 2.2734835
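
One caveat with the `grepl` call above: the pattern is not anchored, so it matches any string that merely contains one of the names. A minimal sketch of an anchored variant follows. Here `df`, `wanted`, and the extra rows are hypothetical illustration data, and the sketch assumes the names contain no regex metacharacters and that every postfix starts with an underscore.

```r
# Hypothetical data: two rows contain a wanted name without being a true match.
df <- data.frame(
  name = c("name1_target_time", "name2_target_time",
           "name2_target_time_post1", "name2_target_timeout",
           "old_name3_target_time", "name3_target_time"),
  value = 1:6,
  stringsAsFactors = FALSE
)
wanted <- c("name2_target_time", "name3_target_time")

# Unanchored alternation, as in the answer: matches anywhere in the string.
loose <- paste(wanted, collapse = "|")

# Anchored alternation: the string must start with a wanted name and then
# either end or continue with "_<postfix>".
strict <- paste0("^(", paste(wanted, collapse = "|"), ")(_.*)?$")

loose_hits  <- df$name[grepl(loose, df$name)]   # also picks up the two look-alikes
strict_hits <- df$name[grepl(strict, df$name)]  # only the names and their postfixed forms
```

If the names could contain characters such as `.` or `+`, they would need escaping before being pasted into a pattern; the `sub()` plus `%in%` approach sidesteps that because it compares whole strings.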

As @flodel shows in the comments, this works as well:

subset(data, sub("_post\\d+$", "", name) %in% names)
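
To see why this works, it can help to look at the intermediate values. The following is a small sketch in which `x` and `wanted` are hypothetical stand-ins for `data$name` and `names`; it assumes every postfix has the form `_post` followed by digits at the end of the string.

```r
# Hypothetical stand-ins for data$name and names.
x <- c("name2_target_time", "name2_target_time_post1",
       "name2_target_time_post12", "name4_target_time_post1")
wanted <- c("name2_target_time", "name3_target_time")

# Strip a trailing "_post<digits>" if present; other strings pass through unchanged.
base <- sub("_post\\d+$", "", x)

# Exact comparison against the wanted names, so no partial matches slip through.
keep <- base %in% wanted

x[keep]
```

Because the comparison is exact after the postfix is removed, underscores inside the base name do not matter.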

Old solution:

data[sapply(strsplit(data$name, "_"), "[[", 1) %in% names, ]

#          name       data
# 3       name2  1.4934931
# 4 name2_post1 -1.6070809
# 5 name2_post2 -0.4157518
# 6       name3  0.4220084

The idea: first split each string at _ using strsplit, which returns a list with one character vector per input string. For example, name2 yields just name2, while name2_post1 yields name2 and post1. Wrapping this in sapply and using [[ with 1 selects the first piece of each vector. Those first pieces can then be checked against names with %in%.
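
The steps can be sketched with a few hypothetical strings (`x` is illustration data, not the question's data frame). The last string also shows why this approach stops working once the base name itself contains underscores, as in the follow-up data.

```r
# Hypothetical input strings; the third has underscores inside the base name.
x <- c("name2", "name2_post1", "name2_target_time_post1")

# One character vector of pieces per input string.
parts <- strsplit(x, "_")

# Take the first piece of each vector.
first <- sapply(parts, "[[", 1)

# Works when the wanted name has no underscore...
hits_simple <- first %in% "name2"

# ...but can never match a wanted name that contains one,
# because everything after the first "_" was discarded.
hits_multi <- first %in% "name2_target_time"
```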

8 Comments

That's really close (upvote). The problem is the real names I am working with have multiple underscores before the postfix. For example "name1_target1_time_1_postfix". I am really looking for some kind of grep function that will check one list for partial matches of another list.
There are a lot of ways to do it; I really asked the question to learn more about coding in R. It seems strange to me that there isn't a grep function that will look for multiple patterns.
It would be nice if you could edit your post accordingly, showing the input and the expected output.
Like that? subset(data, sub("_post\\d+$", "", name) %in% names)
@Arun +1 because you are patient!

A grep solution would probably look something like the following:

subset <- data[grep("(name2)|(name3)", data$name), ]
