1

I have a data frame containing the following columns:

sample.data

  a_b_c d_b_e r_f_g c_b_a
1     1     1     1     1
2     2     2     2     2
3     3     3     3     3
4     4     4     4     4

How do I select only the columns whose names contain both of two given strings, say "a" and "c"?

3
  • Could you please show us your expected output? Commented Mar 24, 2018 at 2:31
  • The output data.frame should contain only the columns a_b_c and c_b_a, because both of these column names contain the string "a" and the string "c". Commented Mar 24, 2018 at 2:32
  • yes, it contains only the columns. Commented Mar 24, 2018 at 2:33

4 Answers

4

To select variables that contain a and c we could do:

library(dplyr)

df %>% 
  select(matches("(a.*c)|(c.*a)"))
  a_b_c c_b_a
1     1     1
2     2     2
3     3     3
4     4     4

Note that a_a_e is not selected because it doesn't contain c, and c_f_g is not selected because it doesn't contain a. Repeating one of the letters does not help: a_a_e has two a's but is still dropped because it has no c.
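To see why the alternation behaves this way, the same pattern can be tried on a plain character vector in base R. This is only a minimal sketch: nms, both and selected are names made up for the illustration, and grepl() stands in for the matching that matches() performs on column names.

```r
# Column names from the example data in this answer
nms <- c("a_b_c", "a_a_e", "c_f_g", "c_b_a")

# "(a.*c)" needs an "a" followed somewhere later by a "c";
# "(c.*a)" needs the reverse order, so either order passes
both <- grepl("(a.*c)|(c.*a)", nms)

# Keep only the names that contain both letters
selected <- nms[both]
print(selected)
```

A name with only one of the two letters fails both branches of the alternation, however often that letter is repeated.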

We could also use str_subset:

library(dplyr)
library(stringr)

df %>% 
  select(str_subset(names(df), "(a.*c)|(c.*a)"))

Data:

df <- data.frame(
  a_b_c = 1:4,
  a_a_e = 1:4,
  c_f_g = 1:4,
  c_b_a = 1:4
)

2 Comments

This is precisely what I was looking for. Thanks.
Np, happy to help
2

Try df %>% dplyr::select(matches("(a|c)"))

library(dplyr)
df <- data.frame(
  a_b_c=1:4,
  d_b_e=1:4,
  r_f_g=1:4,
  c_b_a=1:4
)

Results

> df %>% dplyr::select(matches("(a|c)"))
  a_b_c c_b_a
1     1     1
2     2     2
3     3     3
4     4     4
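The pattern "(a|c)" matches any name that contains either letter, which happens to give the wanted columns for this particular data frame. A small base R sketch, assuming a data frame that also has columns with only one of the letters, shows where "or" and "and" diverge. The variable names either, both, either_cols and both_cols are illustrative only.

```r
# Example data with two columns that contain only one of the letters
df <- data.frame(a_b_c = 1:4, a_a_e = 1:4, c_f_g = 1:4, c_b_a = 1:4)
nms <- names(df)

either <- grepl("a|c", nms)                  # "a" OR "c"
both   <- grepl("a", nms) & grepl("c", nms)  # "a" AND "c"

# drop = FALSE keeps a data frame even if a single column is left
either_cols <- names(df[, either, drop = FALSE])
both_cols   <- names(df[, both, drop = FALSE])

print(either_cols)
print(both_cols)
```

Here the "or" pattern keeps all four columns, while combining two grepl() calls with & keeps only a_b_c and c_b_a.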

3 Comments

Seems like the OP might require that both letters appear, since they say "a and c", not "a or c". They haven't clarified yet, though...
It's "a" and "c" that I am trying to get.
@MadhukarJha I guess df %>% select(intersect(contains("a"), contains("c"))) in that case.
0

If you want to see how it works under the hood, use the following function:

contain_both <- function(data_frame, letter_a, letter_b) {
  keep_columns <- integer(0)
  for (i in seq_len(ncol(data_frame))) {
    # Split the column name on "_" and compare the whole pieces
    has_letters <- unlist(strsplit(names(data_frame)[i], "_"))
    if (is.element(letter_a, has_letters) && is.element(letter_b, has_letters)) {
      keep_columns <- c(keep_columns, i)
    }
  }
  # drop = FALSE keeps a data frame even when only one column matches
  data_frame[, keep_columns, drop = FALSE]
}

Data:

df <- data.frame(1:4, 1:4, 1:4, 1:4)
names(df) <- c('a_b_c', 'd_b_e', 'r_f_g', 'c_b_a')

Just pass in your data frame, along with your 2 letter choices:

Usage:

contain_both(df, 'b', 'c') 
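The same idea can be written without an explicit loop. The following is a sketch only, not a replacement for the function above: contain_both_tokens is a hypothetical name, and like the loop version it compares whole underscore-separated pieces of each column name rather than substrings.

```r
# Hypothetical loop-free variant: keep columns whose names, split on "_",
# contain both tokens as whole pieces
contain_both_tokens <- function(data_frame, token_a, token_b) {
  tokens <- strsplit(names(data_frame), "_", fixed = TRUE)
  keep <- vapply(
    tokens,
    function(tk) all(c(token_a, token_b) %in% tk),
    logical(1)
  )
  data_frame[, keep, drop = FALSE]
}

df <- data.frame(a_b_c = 1:4, d_b_e = 1:4, r_f_g = 1:4, c_b_a = 1:4)
result <- contain_both_tokens(df, "a", "c")
print(names(result))
```

Because whole pieces are compared, a column named ab_c would not count as containing "a". Use grepl() on the full name if substring matching is wanted instead.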


1 Comment

It would have been great to see something simpler, i.e. using a native function, but this does the trick.
0

Hope this is what you are looking for:

> a_b_c <- c(1, 2, 3, 4)
> d_b_e <- c(1, 2, 3, 4)
> yy <- cbind(a_b_c, d_b_e)
> yy
     a_b_c d_b_e
[1,]     1     1
[2,]     2     2
[3,]     3     3
[4,]     4     4
> yy <- as.data.frame(yy)
> yy
  a_b_c d_b_e
1     1     1
2     2     2
3     3     3
4     4     4
> y <- yy[which(names(yy) %in% "a_b_c")]
> y
  a_b_c
1     1
2     2
3     3
4     4

In your example, you can use this:

 y <- sample.data[which(names(sample.data) %in% c("a_b_c", "c_b_a"))]
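Listing the names by hand only works when they are known in advance. A more general base R sketch could compute the matching names from the required substrings instead. select_containing_all is a hypothetical helper written for this illustration, and the example data frame uses made-up values, with two column names taken from the asker's comment.

```r
# Hypothetical helper: keep columns whose names contain every given substring
select_containing_all <- function(data, substrings) {
  nms <- names(data)
  # One logical vector per substring; fixed = TRUE treats it literally
  hits <- lapply(substrings, function(s) grepl(s, nms, fixed = TRUE))
  # A column is kept only if every substring was found in its name
  keep <- Reduce(`&`, hits)
  data[, keep, drop = FALSE]
}

# Illustrative data only
sample.data <- data.frame(a_b_c = 1:4, d_b_e = 1:4, a_c_b_d = 1:4, u_a_d_c = 1:4)
y <- select_containing_all(sample.data, c("a", "c"))
print(names(y))
```

The helper assumes at least one substring is supplied. More than two can be passed in the same vector.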

4 Comments

No, that is not what I am looking for.
OK, could you please write out your expected output?
"In your example, you can use this: y <- sample.data[which(names(sample.data) %in% c("a_b_c","c_b_a" )]" This is a specific example. I want to select all the columns that contain "a" and "c", which means I want "a_c_b_d" as well as "u_a_d_c" and so on.
You mean you want a general solution. OK.
