I have two data frames (df1 and df2) and I want to subset df2 based on the first two columns contained in df1. For example,

df1 = data.frame(x=c(1,1,1,1,1),y=c(1,2,3,4,5),value=c(3,4,5,6,7))
df2 = data.frame(x=c(1,1,1,1,1,2), y=c(5,3,4,2,1,6), value=c(8,9,10,11,12,13))

Row 6 of df2, with (x, y) = (2, 6), has no matching row in df1, so I only want to keep rows 1 to 5 of df2.

I also want to rearrange df2 so that its rows follow the order of the (x, y) pairs in df1.

Thanks for any help.

  • One possible solution is df1 %>% select(x,y) %>% inner_join(df2, by=c("x","y")) Commented May 25, 2018 at 23:44
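
For reference, the one-liner in the comment above can be run as the following minimal sketch. It assumes the dplyr package is installed; the name `res` is only illustrative.

```r
library(dplyr)

df1 <- data.frame(x = c(1, 1, 1, 1, 1), y = c(1, 2, 3, 4, 5), value = c(3, 4, 5, 6, 7))
df2 <- data.frame(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Keep only the key columns of df1, then pull the matching rows of df2.
# df1 is the left-hand table, so its rows drive the join.
res <- df1 %>%
  select(x, y) %>%
  inner_join(df2, by = c("x", "y"))

print(res)
```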

2 Answers

When using merge, by default the data frames are joined by the variables they have in common, and the results are sorted. So you can do:

merge(df2, df1[c('x', 'y')])

#   x y value
# 1 1 1    12
# 2 1 2    11
# 3 1 3     9
# 4 1 4    10
# 5 1 5     8
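
To make the default behaviour explicit, here is a small sketch of the same call with the join columns spelled out. It also shows why df1 is cut down to its x and y columns first: if value is left in, it becomes a join column too. The names `keys`, `res_sorted` and `res_all_cols` are only illustrative.

```r
df1 <- data.frame(x = c(1, 1, 1, 1, 1), y = c(1, 2, 3, 4, 5), value = c(3, 4, 5, 6, 7))
df2 <- data.frame(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Keep only the key columns of df1 so that `value` is not used as a join column.
keys <- df1[c("x", "y")]

# Same as merge(df2, keys), but with the join columns named explicitly
# instead of relying on the default by = intersect(names(x), names(y)).
res_sorted <- merge(df2, keys, by = c("x", "y"))
print(res_sorted)

# Without dropping `value`, the default join uses x, y and value together,
# and no row of df2 matches df1 on all three columns.
res_all_cols <- merge(df2, df1)
print(nrow(res_all_cols))
```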

To keep the row order of df1 instead, use @Mankind_008's method:

merge(df1[c('x','y')], df2 , sort = F)

Example:

set.seed(0)
df1 <- df1[sample(seq_len(nrow(df1))),]
df2 <- df2[sample(seq_len(nrow(df2))),]
df1
#   x y value
# 5 1 5     7
# 2 1 2     4
# 4 1 4     6
# 3 1 3     5
# 1 1 1     3    
merge(df1[c('x','y')], df2 , sort = F)
#   x y value
# 1 1 5     8
# 2 1 2    11
# 3 1 4    10
# 4 1 3     9
# 5 1 1    12
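
The documentation for merge describes the row order with sort = FALSE as unspecified, so if the order of df1 has to be guaranteed, one alternative is to index df2 directly with match. This is only a sketch, not part of the answer above: it assumes each (x, y) pair occurs at most once in df2, and the names `key1`, `key2`, `idx` and `res` are made up for illustration.

```r
# df1 in the scrambled order used in the example above
df1 <- data.frame(x = c(1, 1, 1, 1, 1), y = c(5, 2, 4, 3, 1), value = c(7, 4, 6, 5, 3))
df2 <- data.frame(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Build one composite key per row from the two join columns.
key1 <- paste(df1$x, df1$y, sep = "_")
key2 <- paste(df2$x, df2$y, sep = "_")

# For each row of df1, find the position of the first matching row in df2.
# Assumption: (x, y) pairs are unique in df2, so the first match is the only one.
idx <- match(key1, key2)

# Drop df1 rows without a partner in df2, then pick the df2 rows in df1's order.
res <- df2[idx[!is.na(idx)], ]
rownames(res) <- NULL
print(res)
```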

4 Comments

To keep the order of df1, should be: merge(df1[c('x','y')], df2 , sort = False)
Yup, the values will be the same, but the order will not. That is because df1 is already sorted in the given dummy case. In the general case, i.e. when df1 is scrambled to begin with, the result will not retain the order of df1 if sort = True.
Thanks for the help. Just one thing to mention that sort=FALSE should be the right format (not False).
@Yang Yang You are welcome, and you are right. Ryan used the synonym 'F' in the answer, so no worries.

Use data tables:

library(data.table)

Create your data as data.table:

df1 <- data.table( x = c(1,1,1,1,1), y = c(1,2,3,4,5), value = c(3,4,5,6,7) )
df2 <- data.table( x = c(1,1,1,1,1,2), y = c(5,3,4,2,1,6), value = c(8,9,10,11,12,13) )

Or convert your existing data.frames:

df1 <- as.data.table( df1 )
df2 <- as.data.table( df2 )

Then:

df2[ df1, on = .(x,y) ]

Any non-join column in df1 that has the same name as a column in df2 will appear in the result prefixed with i. (here, i.value):

   x y value i.value
1: 1 1    12       3
2: 1 2    11       4
3: 1 3     9       5
4: 1 4    10       6
5: 1 5     8       7

Note that the result follows the row order of df1, which in this example is already sorted by x and y. If you want to order by the column 'value' (or any other):

df2[ df1, on = .(x,y) ][ order(value) ]
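
If the extra i.value column is not wanted, one option is to join on the key columns of df1 only. The sketch below assumes a data.table version that accepts nomatch = NULL (older versions use nomatch = 0 for the same effect); the name `res` is only illustrative.

```r
library(data.table)

df1 <- data.table(x = c(1, 1, 1, 1, 1), y = c(1, 2, 3, 4, 5), value = c(3, 4, 5, 6, 7))
df2 <- data.table(x = c(1, 1, 1, 1, 1, 2), y = c(5, 3, 4, 2, 1, 6), value = c(8, 9, 10, 11, 12, 13))

# Pass only x and y from df1, so its value column is not carried into the result.
# nomatch = NULL drops any df1 row that has no partner in df2,
# instead of returning it with NA in the df2 columns.
res <- df2[df1[, .(x, y)], on = .(x, y), nomatch = NULL]
print(res)
```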

The advantage of using data.table (or dplyr, as the solution proposed by AntoniosK) is that you can keep the two data sets separated.
