I have a data frame with many columns that share the prefix "_B", e.g. '_B1', '_B2', ..., '_Bn', so I can grab the column names with:

allB <- grep("_B", names(my.df), value = TRUE)

I want to select the rows in which every one of these _B* columns satisfies a single condition, such as value >= some_cutoff.

Can someone tell me how to do that? My attempts with all() and any() failed.

set.seed(12345)
my.df <- data.frame(a = round(rnorm(10, 5), 1), m_b1 = round(rnorm(10, 4), 1), m_b2 = round(rnorm(10, 4), 1))
allB <- grep("_b", names(my.df), value = TRUE)
> my.df
     a m_b1 m_b2
1  5.6  3.9  4.8
2  5.7  5.8  5.5
3  4.9  4.4  3.4
4  4.5  4.5  2.4
5  5.6  3.2  2.4
6  3.2  4.8  5.8
7  5.6  3.1  3.5
8  4.7  3.7  4.6
9  4.7  5.1  4.6
10 4.1  4.3  3.8

I want to select the rows in which both m_b1 and m_b2 are >= 4.0.

Try library(dplyr); my.df %>% filter_at(allB, all_vars(. >= cutoff)). Please provide a small reproducible example with expected output. Commented Mar 15, 2018 at 15:23

2 Answers

We can use filter_at from dplyr together with all_vars, which keeps a row only if every selected column meets the condition. To keep a row when any one of the selected columns meets it, use any_vars instead.

library(dplyr)
my.df %>%
   filter_at(allB, all_vars(. >= some_cutoff))

data

some_cutoff <- 3
my.df <- structure(list(`_B1` = c(1, 1, 9, 4, 10), `_B2` = c(2, 3, 12, 
 6, 12), V3 = c(3, 6, 13, 10, 13), V4 = c(4, 5, 16, 13, 18)), .Names = c("_B1", 
 "_B2", "V3", "V4"), row.names = c(NA, -5L), class = "data.frame")

allB <- grep( "_B" , names( my.df ),value = TRUE ) 
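
Recent dplyr documentation marks filter_at as superseded, so the same filter can also be written with filter() plus if_all() or if_any(). The following is a minimal sketch, assuming dplyr 1.0.4 or later (where, as far as I know, if_all() and if_any() became available) and reusing the data above; the names all_pass and any_pass are only illustrative.

```r
library(dplyr)

some_cutoff <- 3
# Same data as above; check.names = FALSE keeps the leading underscore
my.df <- data.frame(`_B1` = c(1, 1, 9, 4, 10),
                    `_B2` = c(2, 3, 12, 6, 12),
                    V3 = c(3, 6, 13, 10, 13),
                    V4 = c(4, 5, 16, 13, 18),
                    check.names = FALSE)
allB <- grep("_B", names(my.df), value = TRUE)

# Rows in which every _B* column is >= some_cutoff
all_pass <- my.df %>% filter(if_all(all_of(allB), ~ .x >= some_cutoff))

# Rows in which at least one _B* column is >= some_cutoff
any_pass <- my.df %>% filter(if_any(all_of(allB), ~ .x >= some_cutoff))
```

With this data, all_pass keeps the three rows whose `_B1` values are 9, 4 and 10, while any_pass also keeps the row where only `_B2` reaches the cutoff.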

1 Comment

I get an error: Error in function_list[[k]](value) : could not find function "filter_at". I'm using dplyr_0.5.0 with R version 3.3.0 on Win 7 64-bit.
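
That error suggests the installed dplyr is too old to export filter_at (I believe the function arrived in a later release than 0.5.0, but treat that as an assumption and check the package news). A minimal sketch of a guard that falls back to base R when filter_at is missing, reusing the answer's data; has_filter_at, keep and res are illustrative names:

```r
# Check whether the installed dplyr (if any) exports filter_at
has_filter_at <- requireNamespace("dplyr", quietly = TRUE) &&
  "filter_at" %in% getNamespaceExports("dplyr")

some_cutoff <- 3
my.df <- data.frame(`_B1` = c(1, 1, 9, 4, 10),
                    `_B2` = c(2, 3, 12, 6, 12),
                    V3 = c(3, 6, 13, 10, 13),
                    V4 = c(4, 5, 16, 13, 18),
                    check.names = FALSE)
allB <- grep("_B", names(my.df), value = TRUE)

if (has_filter_at) {
  res <- dplyr::filter_at(my.df, allB, dplyr::all_vars(. >= some_cutoff))
} else {
  # Base R fallback: AND the per-column comparisons together
  keep <- Reduce(`&`, lapply(my.df[allB], function(col) col >= some_cutoff))
  res <- my.df[keep, , drop = FALSE]
}
```

Both branches keep the same three rows; only the row names differ, because the base R subset preserves the original ones.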
In base R:

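some_cutoff <- 4
# Keep only the columns whose names contain "_b"
selectedCols <- my.df[grep("_b", names(my.df), fixed = TRUE)]
# Keep the rows in which every selected column meets the cutoff
selectedRows <- selectedCols[apply(selectedCols, 1, 
                                   function(x) all(x >= some_cutoff)), ]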

selectedRows
#   m_b1 m_b2
# 2  5.8  5.5
# 6  4.8  5.8
# 9  5.1  4.6

grep() returns the indices of the columns whose names contain the pattern, and those indices are used to subset my.df. apply() iterates over rows when its second argument, MARGIN, is 1. The anonymous function returns TRUE for a row only if all() of its entries meet the condition. The resulting logical vector is then used to subset selectedCols, so the output contains only the matched columns.
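
If the other columns (such as a) should stay in the result, the same logical vector can index my.df itself. A minimal sketch of that variant, assuming the data printed in the question (typed in by hand here so it does not depend on the random seed) and using rowSums() as a vectorised stand-in for apply(); bCols, keep and selected are illustrative names:

```r
# Data as printed in the question
my.df <- data.frame(
  a    = c(5.6, 5.7, 4.9, 4.5, 5.6, 3.2, 5.6, 4.7, 4.7, 4.1),
  m_b1 = c(3.9, 5.8, 4.4, 4.5, 3.2, 4.8, 3.1, 3.7, 5.1, 4.3),
  m_b2 = c(4.8, 5.5, 3.4, 2.4, 2.4, 5.8, 3.5, 4.6, 4.6, 3.8)
)

some_cutoff <- 4
bCols <- grep("_b", names(my.df), fixed = TRUE)

# Count how many of the selected columns pass in each row;
# a row is kept only when that count equals the number of selected columns
keep <- rowSums(my.df[bCols] >= some_cutoff) == length(bCols)

# Index the full data frame so that column a is retained
selected <- my.df[keep, ]
```

This keeps rows 2, 6 and 9 with all three columns. If the selected columns can contain NA, rowSums() returns NA for those rows, so they would need separate handling.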

2 Comments

Thanks. I tried something similar with apply() but had some issues. Possibly the column indices were what I was missing.
@TheAugust That's most likely it. In particular, if the data.frame had a column of strings, the comparison >= some_cutoff would fail.
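
The reason a string column causes trouble is that apply() converts the data frame to a matrix first, and a matrix holds a single type, so one character column turns every value into a string. A small sketch of the effect, assuming a made-up data frame called mixed with a hypothetical id column:

```r
# Hypothetical data: one character column next to the numeric _b columns
mixed <- data.frame(id   = c("x", "y", "z"),
                    m_b1 = c(3.9, 5.8, 4.4),
                    m_b2 = c(4.8, 5.5, 3.4),
                    stringsAsFactors = FALSE)

# The whole data frame becomes a character matrix,
# so >= would compare text rather than numbers
whole_is_character <- is.character(as.matrix(mixed))

# Selecting the numeric columns first keeps the matrix numeric
bCols <- grep("_b", names(mixed), fixed = TRUE)
subset_is_numeric <- is.numeric(as.matrix(mixed[bCols]))

# The row-wise test is now a true numeric comparison
keep <- apply(mixed[bCols], 1, function(x) all(x >= 4))
mixed[keep, ]
```

Only the second row passes here, and because the filter is computed on the numeric subset but applied to mixed, the id column is still present in the result.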
