
I am trying to select columns where at least one row equals 1, only if the same row also has a certain value in a second column. I would prefer to achieve this using dplyr, but any computationally efficient solution is welcome.

Example:

Select the columns among a1, a2, a3 that contain at least one row where the value is 1 AND column b == "B".

Example data:

rand <- function(S) {set.seed(S); sample(x = c(0, 1), size = 3, replace = TRUE)}
df <- data.frame(a1 = rand(1), a2 = rand(2), a3 = rand(3), b = c("A", "B", "A"))

Input data:

  a1 a2 a3 b
1  0  0  0 A
2  0  1  1 B
3  1  1  0 A

Desired output:

  a2 a3
1  0  0
2  1  1
3  1  0

I managed to obtain the correct output with the following code; however, it is very inefficient, and I need to run it on a very large data frame (365,000 rows x 314 columns).

library(dplyr)
df %>% select_if(function(x) any(paste0(x, .$b) == '1B'))
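Part of the cost in the code above comes from `paste0()`, which builds a new character vector for every column and then compares strings. A sketch of the same filter that computes the `b == "B"` mask once and compares numbers directly, assuming dplyr 1.0.0 or later (where `select_if()` is superseded by `select(where())`); `is_B` is a hypothetical helper name, and the data frame is written out explicitly to match the input table above.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides select(where(...))

# The input table from the question, written out explicitly
df <- data.frame(a1 = c(0, 0, 1), a2 = c(0, 1, 1), a3 = c(0, 1, 0),
                 b = c("A", "B", "A"))

# Compute the row mask once instead of pasting strings inside every column
is_B <- df$b == "B"

# Keep a column if it is numeric and holds a 1 in at least one "B" row;
# the is.numeric() check short-circuits for the character column b
result <- df %>%
  select(where(~ is.numeric(.x) && any(.x[is_B] == 1, na.rm = TRUE)))

result
```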
  • You would be better off converting your data to long format. You find this difficult because you're trying to compute it in wide format. Commented Dec 6, 2017 at 8:13
  • @docendodiscimus Thanks for the hint; that does seem easier! Commented Dec 6, 2017 at 9:05

2 Answers


A solution that does not use dplyr:

# Keep only the rows where b == "B", then keep each column that contains a 1 there
df[sapply(df[df$b == "B", ], function(x) 1 %in% x)]
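The same idea can be vectorized over all candidate columns at once, which may help at the 365,000 x 314 scale mentioned in the question. A base R sketch, assuming the candidate columns are exactly those whose names start with `a` (`value_cols` and `hits` are hypothetical names): the rows with `b == "B"` are extracted once, compared to 1 as a block, and `colSums()` counts the matches per column.

```r
# The input table from the question, written out explicitly
df <- data.frame(a1 = c(0, 0, 1), a2 = c(0, 1, 1), a3 = c(0, 1, 0),
                 b = c("A", "B", "A"))

# Assumed: candidate columns are every column whose name starts with "a"
value_cols <- grep("^a", names(df), value = TRUE)

# Subset to the "B" rows once; `==` on a data frame returns a logical matrix,
# so colSums() counts how many "B" rows equal 1 in each column
hits <- colSums(df[df$b == "B", value_cols, drop = FALSE] == 1, na.rm = TRUE) > 0

# drop = FALSE keeps a data frame even if only one column matches
result <- df[, names(hits)[hits], drop = FALSE]
result
```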

Here is my dplyr solution:

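library(dplyr)

ids <- df %>% 
  reshape2::melt(id.vars = "b") %>%   # long format: columns b, variable, value
  filter(value == 1 & b == "B") %>% 
  pull(variable)

# variable is a factor: convert to character so columns are matched by name,
# and drop duplicates that arise when several "B" rows contain a 1
df[, unique(as.character(ids)), drop = FALSE]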

#  a2 a3
#1  0  0
#2  1  1
#3  1  0

As @docendodiscimus suggested, converting to long format makes this easier.
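The long-format route can also be taken without reshape2 or dplyr. A base R sketch using `stack()`, which stacks the selected columns into a `values` column plus a factor `ind` holding each value's original column name; `b` is repeated once per stacked column so every long row keeps its label. `value_cols`, `long`, and `keep` are hypothetical names, and the candidate columns are assumed to be every column except `b`.

```r
# The input table from the question, written out explicitly
df <- data.frame(a1 = c(0, 0, 1), a2 = c(0, 1, 1), a3 = c(0, 1, 0),
                 b = c("A", "B", "A"))

# Assumed: the candidate columns are every column except b
value_cols <- setdiff(names(df), "b")

# Long format: stack() goes column by column, so b is repeated once per
# stacked column to stay aligned with each value
long <- data.frame(stack(df[value_cols]),
                   b = rep(df$b, times = length(value_cols)))

# Names of columns with at least one 1 in a "B" row; as.character() avoids
# indexing by factor level codes, unique() drops repeats from multiple B rows
keep <- unique(as.character(long$ind[long$values == 1 & long$b == "B"]))

result <- df[, keep, drop = FALSE]
result
```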
