105

I have data similar to this:

dt <- structure(list(fct = structure(c(1L, 2L, 3L, 4L, 3L, 4L, 1L, 2L, 3L, 1L, 2L, 3L, 2L, 3L, 4L), .Label = c("a", "b", "c", "d"), class = "factor"), X = c(2L, 4L, 3L, 2L, 5L, 4L, 7L, 2L, 9L, 1L, 4L, 2L, 5L, 4L, 2L)), .Names = c("fct", "X"), class = "data.frame", row.names = c(NA, -15L))

I want to select rows from this data frame based on the values in the fct variable. For example, if I wish to select rows containing either "a" or "c" I can do this:

dt[dt$fct == 'a' | dt$fct == 'c', ]

which yields

1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

as expected. But my actual data is more complex and I actually want to select rows based on the values in a vector such as

vc <- c('a', 'c')

So I tried

dt[dt$fct == vc, ]

but of course that doesn't work. I know I could code something to loop through the vector and pull out the rows needed and append them to a new dataframe, but I was hoping there was a more elegant way.

So how can I filter/subset my data based on the contents of the vector vc?

1
  • 24
    try: dt[dt$fct %in% vc,] Basically == is for one item and %in% is for a vector comparison. Commented Jul 23, 2012 at 12:13

3 Answers 3

176

Have a look at ?"%in%".

dt[dt$fct %in% vc,]
   fct X
1    a 2
3    c 3
5    c 5
7    a 7
9    c 9
10   a 1
12   c 2
14   c 4

You could also use ?is.element:

dt[is.element(dt$fct, vc),]
Sign up to request clarification or add additional context in comments.

1 Comment

is.element is most intuitively understandable.
39

Similar to above, using filter from dplyr:

filter(df, fct %in% vc)

Comments

17

Another option would be to use a keyed data.table:

library(data.table)
setDT(dt, key = 'fct')[J(vc)]  # or: setDT(dt, key = 'fct')[.(vc)]

which results in:

   fct X
1:   a 2
2:   a 7
3:   a 1
4:   c 3
5:   c 5
6:   c 9
7:   c 2
8:   c 4

What this does:

  • setDT(dt, key = 'fct') transforms the data.frame to a data.table (which is an enhanced form of a data.frame) with the fct column set as key.
  • Next you can just subset with the vc vector with [J(vc)].

NOTE: when the key is a factor/character variable, you can also use setDT(dt, key = 'fct')[vc] but that won't work when vc is a numeric vector. When vc is a numeric vector and is not wrapped in J() or .(), vc will work as a rowindex.

A more detailed explanation of the concept of keys and subsetting can be found in the vignette Keys and fast binary search based subset.

An alternative as suggested by @Frank in the comments:

setDT(dt)[J(vc), on=.(fct)]

When vc contains values that are not present in dt, you'll need to add nomatch = 0:

setDT(dt, key = 'fct')[J(vc), nomatch = 0]

or:

setDT(dt)[J(vc), on=.(fct), nomatch = 0]

4 Comments

I coudn't get it work when the vector and the variable in data.table are numeric. Any ideas?
@GauravSinghal updated the answer, the method in the previous version worked on for character/factor columns; the updated method also works for integer/numeric columns
@David is there a way to modify this to select only where ftc is NOT in vc?
@user2017023, yes, e.g. setDT(dt)[!(vc), on=.(fct)]

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.