I got this working with sqldf, but I would like to achieve the same result in pure R.

Data:

df <- read.table(header = TRUE, text = "year1 year2 year3 year4 signup_date
                 B      U      C         D      4/10/12
                 C      D      B         U      2/12/12
                 U      C      D         U      3/14/05
                 B      NA     NA        NA     3/7/05
                 NA     NA     NA        NA     8/3/08
                 A      NA     NA        NA     4/6/07")

My sqldf query:

library(sqldf)
df <- sqldf("
SELECT *
FROM df
WHERE year1 NOT IN ('B','C','D','U')
AND year2 NOT IN ('B','C','D','U')
AND year3 NOT IN ('B','C','D','U')
AND year4 NOT IN ('B','C','D','U')
ORDER BY signup_date DESC")

Desired result:

    year1 year2 year3 year4 signup_date
                            8/3/08   
    A                       4/6/07 

2 Answers

Another option is to use the dplyr package:

library(dplyr)
filterVars <- c("B","C","D","U")
df %>%
  # keep rows where none of the year columns holds one of the filtered codes
  filter(!year1 %in% filterVars,
         !year2 %in% filterVars,
         !year3 %in% filterVars,
         !year4 %in% filterVars) %>%
  arrange(desc(signup_date))

Yields:

  year1 year2 year3 year4 signup_date
1  <NA>  <NA>  <NA>  <NA>      8/3/08
2     A  <NA>  <NA>  <NA>      4/6/07

Try this in base R:

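fvars <- c('B', 'C', 'D', 'U')
# Build one logical vector per year column, then AND them together
# (df1 is defined in the "data" section at the end of this answer)
df2 <- df1[Reduce(`&`, lapply(df1[paste0('year', 1:4)],
           function(x) !x %in% fvars)), ]
df2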
#   year1 year2 year3 year4 signup_date
#5                              8/3/08
#6     A                        4/6/07
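
The same row filter can also be written without Reduce, by counting matches per row. This is only a sketch of an alternative, not part of the original answer: the names `year_cols`, `hits` and `keep` are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Sample data, rebuilt from the "data" section of this answer
df1 <- data.frame(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)

fvars <- c("B", "C", "D", "U")
year_cols <- paste0("year", 1:4)   # hypothetical helper name

# TRUE wherever a cell holds one of the filtered codes;
# reshaped back to one row per record (column-major order is preserved)
m <- as.matrix(df1[year_cols])
hits <- matrix(m %in% fvars, nrow = nrow(m))

# Keep rows with no match in any year column
keep <- rowSums(hits) == 0
df2 <- df1[keep, ]
df2
```

Because `%in%` returns FALSE for NA, this should behave the same way whether the missing values are stored as "" or as NA.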

Or using data.table:

library(data.table)
nm1 <- grep('year', names(df1))
# setDT() converts df1 to a data.table by reference; filter rows, then sort descending
setDT(df1)[df1[, Reduce(`&`, lapply(.SD, function(x) !x %chin% fvars)),
               .SDcols = nm1]][order(-signup_date)]
#   year1 year2 year3 year4 signup_date
#1:                              8/3/08
#2:     A                        4/6/07

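NOTE: It may be better to order by 'signup_date' after converting it to the 'Date' class, i.e. as.Date(df1$signup_date, '%m/%d/%y'). As character strings the dates sort by their leading characters (the month), not chronologically.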
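
A minimal base R sketch of that conversion, assuming the dates are always month/day/two-digit year as in the sample data. The names `keep` and `res` are illustrative only, and the data frame is rebuilt so the snippet is self-contained.

```r
# Sample data, rebuilt from the "data" section of this answer
df1 <- data.frame(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07"),
  stringsAsFactors = FALSE
)

fvars <- c("B", "C", "D", "U")

# Same row filter as above: no year column may contain a filtered code
keep <- Reduce(`&`, lapply(df1[paste0("year", 1:4)], function(x) !x %in% fvars))
res <- df1[keep, ]

# Parse the strings as dates so that ordering is chronological
res$signup_date <- as.Date(res$signup_date, format = "%m/%d/%y")

# Most recent signup first
res <- res[order(res$signup_date, decreasing = TRUE), ]
res
```

For the two rows kept here the string order and the date order happen to agree, but on the full sample data a descending string sort would put "3/14/05" ahead of "2/12/12", which is why the conversion is worth doing before ordering.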

data

df1 <- structure(list(
  year1 = c("B", "C", "U", "B", "", "A"),
  year2 = c("U", "D", "C", "", "", ""),
  year3 = c("C", "B", "D", "", "", ""),
  year4 = c("D", "U", "U", "", "", ""),
  signup_date = c("4/10/12", "2/12/12", "3/14/05", "3/7/05", "8/3/08", "4/6/07")),
  .Names = c("year1", "year2", "year3", "year4", "signup_date"),
  class = "data.frame", row.names = c(NA, -6L))
