R: Selecting Rows based on values in multiple columns

Question

I have a data frame with many columns whose names share the common string "_B", e.g. '_B1', '_B2', ..., '_Bn', so I can grab the column names with:

allB <- grep("_B", names(my.df), value = TRUE)
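grep() matches the pattern anywhere in a name, not only at the start. The sketch below shows how the match could be tightened if the names really do start with "_B". The vector nms is made up purely for illustration.

```r
# Hypothetical column names, for illustration only.
nms <- c("a", "_B1", "_B2", "_B10", "X_Bad", "m_B3")

# Plain grep(): "_B" matches anywhere, so "X_Bad" and "m_B3" are picked up too.
loose <- grep("_B", nms, value = TRUE)

# Anchored regex: names that start with "_B" followed only by digits.
strict <- grep("^_B[0-9]+$", nms, value = TRUE)

# Literal prefix test without regular expressions (base R >= 3.3.0).
prefixed <- nms[startsWith(nms, "_B")]
```

Which variant fits depends on how regular the real column names are; the question's own example ("m_b1", "m_b2") relies on the unanchored match.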

I want to select the rows in which every one of these _B* columns satisfies a single condition, such as value >= some_cutoff.

Can someone tell me how to do that? My attempts with all() and any() failed.

set.seed(12345)
my.df <- data.frame(a = round(rnorm(10, 5), 1), m_b1 = round(rnorm(10, 4), 1), m_b2 = round(rnorm(10, 4), 1))
allB <- grep("_b", names(my.df), value = TRUE)
> my.df
     a m_b1 m_b2
1  5.6  3.9  4.8
2  5.7  5.8  5.5
3  4.9  4.4  3.4
4  4.5  4.5  2.4
5  5.6  3.2  2.4
6  3.2  4.8  5.8
7  5.6  3.1  3.5
8  4.7  3.7  4.6
9  4.7  5.1  4.6
10 4.1  4.3  3.8

I want to select the rows in which both m_b1 and m_b2 are >= 4.0.
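The reason all() on its own does not help is that it collapses everything it is given into a single TRUE/FALSE, while row selection needs one logical value per row. A minimal base-R sketch, rebuilding my.df from the values printed above (the name keep is just an illustrative variable):

```r
# Rebuild my.df from the values printed in the question.
my.df <- data.frame(
  a    = c(5.6, 5.7, 4.9, 4.5, 5.6, 3.2, 5.6, 4.7, 4.7, 4.1),
  m_b1 = c(3.9, 5.8, 4.4, 4.5, 3.2, 4.8, 3.1, 3.7, 5.1, 4.3),
  m_b2 = c(4.8, 5.5, 3.4, 2.4, 2.4, 5.8, 3.5, 4.6, 4.6, 3.8)
)
allB <- grep("_b", names(my.df), value = TRUE)

# all() reduces the whole 10 x 2 logical matrix to ONE value,
# so it cannot be used directly as a row index.
overall <- all(my.df[allB] >= 4)

# One logical vector per _b column, combined element-wise with &,
# gives one TRUE/FALSE per row.
keep <- Reduce(`&`, lapply(my.df[allB], function(col) col >= 4))
my.df[keep, ]
```

With this data, keep is TRUE for rows 2, 6 and 9, which matches the output shown in the base-R answer.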

Try: library(dplyr); my.df %>% filter_at(allB, all_vars(. >= cutoff))
Please provide a small reproducible example with the expected output.
– akrun, Commented Mar 15, 2018 at 15:23

akrun · Accepted Answer · 2018-03-15 15:31:25Z

6

We could use filter_at from dplyr and specify all_vars(), which keeps a row only if all of its selected values meet the condition. If it is enough for any one value in the row to meet it, use any_vars() instead.

library(dplyr)
my.df %>%
   filter_at(allB, all_vars(. >= some_cutoff))
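In newer dplyr releases filter_at() is superseded, and the same filter can be written with if_all() (with if_any() as the counterpart of any_vars()). A sketch, assuming a dplyr version that provides if_all(), if_any() and all_of(); the data are the answer's own, rebuilt with data.frame():

```r
library(dplyr)

# Same values as the answer's "data" section; check.names = FALSE keeps
# the non-syntactic names `_B1` and `_B2`.
my.df <- data.frame(`_B1` = c(1, 1, 9, 4, 10),
                    `_B2` = c(2, 3, 12, 6, 12),
                    V3 = c(3, 6, 13, 10, 13),
                    V4 = c(4, 5, 16, 13, 18),
                    check.names = FALSE)
some_cutoff <- 3
allB <- grep("_B", names(my.df), value = TRUE)

# Rows where every _B column meets the cutoff (like all_vars()).
all_rows <- my.df %>% filter(if_all(all_of(allB), ~ .x >= some_cutoff))

# Rows where at least one _B column meets it (like any_vars()).
any_rows <- my.df %>% filter(if_any(all_of(allB), ~ .x >= some_cutoff))
```

all_of() makes the use of an external character vector of column names explicit, which is the tidyselect-recommended way to pass allB.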

data

some_cutoff <- 3
my.df <- structure(list(`_B1` = c(1, 1, 9, 4, 10), `_B2` = c(2, 3, 12, 
 6, 12), V3 = c(3, 6, 13, 10, 13), V4 = c(4, 5, 16, 13, 18)), .Names = c("_B1", 
 "_B2", "V3", "V4"), row.names = c(NA, -5L), class = "data.frame")

allB <- grep( "_B" , names( my.df ),value = TRUE )
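For comparison, the same all_vars()/any_vars() distinction can be expressed in base R by counting, per row, how many _B columns pass. This is a sketch on the data above, not part of the original answer:

```r
# The answer's data, rebuilt with data.frame() instead of structure().
my.df <- data.frame(`_B1` = c(1, 1, 9, 4, 10),
                    `_B2` = c(2, 3, 12, 6, 12),
                    V3 = c(3, 6, 13, 10, 13),
                    V4 = c(4, 5, 16, 13, 18),
                    check.names = FALSE)
some_cutoff <- 3
allB <- grep("_B", names(my.df), value = TRUE)

# Logical matrix: one row per data row, one column per _B column.
hits <- as.matrix(my.df[allB] >= some_cutoff)

# Counterpart of all_vars(): every _B value in the row passes.
all_idx <- rowSums(hits) == length(allB)
my.df[all_idx, ]

# Counterpart of any_vars(): at least one _B value in the row passes.
any_idx <- rowSums(hits) > 0
my.df[any_idx, ]
```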

answered Mar 15, 2018 at 15:31

akrun



1 Comment

The August Over a year ago

I'm getting an error: Error in function_list[[k]](value) : could not find function "filter_at". I'm using dplyr_0.5.0 and R version 3.3.0 on Windows 7 64-bit.
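Since filter_at() is not found in that setup, one dplyr-free workaround is sketched below. It is a different technique from both answers: every value in a row is >= the cutoff exactly when the row's smallest value is, so a row-wise minimum via pmin() is enough. The data are the question's printed values.

```r
# Data from the question (printed values).
my.df <- data.frame(
  a    = c(5.6, 5.7, 4.9, 4.5, 5.6, 3.2, 5.6, 4.7, 4.7, 4.1),
  m_b1 = c(3.9, 5.8, 4.4, 4.5, 3.2, 4.8, 3.1, 3.7, 5.1, 4.3),
  m_b2 = c(4.8, 5.5, 3.4, 2.4, 2.4, 5.8, 3.5, 4.6, 4.6, 3.8)
)
some_cutoff <- 4
allB <- grep("_b", names(my.df), value = TRUE)

# Row-wise minimum across the selected columns (one value per row).
row_min <- do.call(pmin, unname(as.list(my.df[allB])))

# A row passes when its smallest _b value meets the cutoff.
selected <- my.df[row_min >= some_cutoff, ]
```

If the columns can contain NA, pmin() returns NA for that row by default, which would then produce an all-NA row in the subset; wrapping the condition in which() drops such rows.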

nothing · Accepted Answer · 2018-03-15 15:53:46Z

1

In base R:

some_cutoff <- 4
selectedCols <- my.df[grep("_b", names(my.df), fixed = TRUE)]
selectedRows <- selectedCols[apply(selectedCols, 1,
                                   function(x) all(x >= some_cutoff)), ]

selectedRows
#   m_b1 m_b2
# 2  5.8  5.5
# 6  4.8  5.8
# 9  5.1  4.6

grep() is used to get the indices of the columns matching the pattern of interest, which are then used to subset my.df. apply() iterates over rows because its second argument, MARGIN, is 1. The anonymous function returns TRUE if all() of the entries in a row meet the condition, and the resulting logical vector is used to subset the rows of selectedCols.
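The code above returns only the _b columns. If the full rows of my.df are wanted (including column a), the same row-wise test can index my.df itself. A small variation, using the question's printed data:

```r
# Data from the question (printed values).
my.df <- data.frame(
  a    = c(5.6, 5.7, 4.9, 4.5, 5.6, 3.2, 5.6, 4.7, 4.7, 4.1),
  m_b1 = c(3.9, 5.8, 4.4, 4.5, 3.2, 4.8, 3.1, 3.7, 5.1, 4.3),
  m_b2 = c(4.8, 5.5, 3.4, 2.4, 2.4, 5.8, 3.5, 4.6, 4.6, 3.8)
)
some_cutoff <- 4
cols <- grep("_b", names(my.df), fixed = TRUE)

# One TRUE/FALSE per row, computed only on the _b columns ...
row_ok <- apply(my.df[cols], 1, function(x) all(x >= some_cutoff))

# ... but used to subset the whole data frame, so column a is kept.
full_rows <- my.df[row_ok, ]
```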

answered Mar 15, 2018 at 15:53

nothing


2 Comments

The August Over a year ago

Thanks. I tried something similar with apply() but had some issues. Possibly the column indices were what I was missing.

nothing Over a year ago

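@TheAugust that's most likely it. In particular, if the data.frame had a column of strings, apply() would convert every row to a character vector, so the comparison >= some_cutoff would compare strings instead of numbers and give wrong results.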
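A small illustration of that point, on made-up data: with a character column present, apply() works on a character matrix, so >= becomes a string comparison ("10" sorts before "4").

```r
# Hypothetical data: one character column next to a numeric _b column.
df <- data.frame(id = c("x", "y"), m_b1 = c(10, 3),
                 stringsAsFactors = FALSE)

# apply() converts the data frame to a character matrix first ...
row_classes <- apply(df, 1, function(x) class(x))

# ... so "10" >= "4" is a string comparison and comes out FALSE.
wrong <- apply(df, 1, function(x) x[["m_b1"]] >= 4)

# Restricting apply() to the numeric column(s) avoids the coercion.
right <- apply(df["m_b1"], 1, function(x) all(x >= 4))
```

This is why selecting only the _b columns before calling apply(), as the answer does, matters.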
