I have a data frame (df2) that I want to filter so that it keeps only the rows whose combination of two ID columns also appears in another data frame (df1). Here's data frame 1 (df1):

id_1 id_2
A 1
B 2
C 3

And data frame 2 (df2):

id_1 id_2
A 1
B 2
A 2

If I do something like...

df2_filtered <- df2 %>%
  filter(id_1 %in% df1$id_1 &
         id_2 %in% df1$id_2)

Then I get back the entirety of df2, which is not what I want. The row "A", 2 passes because "A" appears somewhere in id_1 of df1 and 2 appears somewhere in id_2 of df1, even though no single row of df1 contains both. How do I fix it so that I only get back the rows below?

id_1 id_2
A 1
B 2

2 Answers

You can use dplyr::inner_join, which keeps only the rows whose id_1 and id_2 values match in both data frames:

library(dplyr)
inner_join(df1, df2, by = c("id_1", "id_2"))

#   id_1 id_2
# 1    A    1
# 2    B    2
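
If the goal is only to filter df2 rather than to combine columns, dplyr's semi_join may be a closer fit: it keeps the rows of its first argument that have a match in the second, returns only the first argument's columns, and does not duplicate rows. A minimal sketch, assuming the data frames are built exactly as shown in the question (the name df2_filtered is taken from the question, the construction of df1 and df2 is my own):

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(id_1 = c("A", "B", "C"), id_2 = c(1, 2, 3))
df2 <- data.frame(id_1 = c("A", "B", "A"), id_2 = c(1, 2, 2))

# Keep the rows of df2 whose (id_1, id_2) pair also occurs in df1.
# Only df2's columns are returned, so no select() is needed afterwards.
df2_filtered <- semi_join(df2, df1, by = c("id_1", "id_2"))
df2_filtered
```

With this data the result should contain the rows A, 1 and B, 2, while A, 2 is dropped because that pair never occurs together in df1.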

2 Comments

This does the trick. Although I have to put an extra select statement afterwards to trim out the columns I don't want, that's easy enough.
If you are trying to keep the rows that are NOT present in the other data frame, the anti_join function works for that. This post helped me solve my problem, which was the opposite of this one. Just posting this in case someone has the same question I did.
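
For reference, the anti_join case mentioned in the comment above could look roughly like this. It is a sketch only, reusing the question's example data; the name df2_unmatched is hypothetical:

```r
library(dplyr)

# Example data copied from the question
df1 <- data.frame(id_1 = c("A", "B", "C"), id_2 = c(1, 2, 3))
df2 <- data.frame(id_1 = c("A", "B", "A"), id_2 = c(1, 2, 2))

# Keep the rows of df2 whose (id_1, id_2) pair does NOT occur in df1
df2_unmatched <- anti_join(df2, df1, by = c("id_1", "id_2"))
df2_unmatched
```

Here only the row A, 2 should remain, since it is the one pair in df2 that has no counterpart in df1.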

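If you aren't bound to a dplyr solution, then data.table has a nice option, fintersect, which returns the rows common to both tables:

library(data.table)
# Convert both data frames to data.tables
df1 = as.data.table(df1)
df2 = as.data.table(df2)
# Rows that appear in both tables (all columns are compared)
fintersect(df1, df2)
#    id_1 id_2
# 1:    A    1
# 2:    B    2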
