0

Suppose I have a major dataset called "main" and three other datasets called "df1", "df2" and "df3".

  • "main" includes: year, colA, colB, colC
  • "df1" includes: year, colD
  • "df2" includes: year, colE
  • "df3" includes: year, colF

I want to merge colD, colE, colF of df1, df2, df3 to "main" using the year column to rename that merged column to "merged_value".

The logic is like this:

  • If colA == "Category1", then merge df1 to "main"
  • If colA == "Category2", then merge df2 to "main"
  • If colA == "Category3", then merge df3 to "main"

The only way I could think of now is to break the main into 3 subsets and do the merge (shown below), but are there any simpler approach to this?

main_sub1 = main[colA=="Category1"][df1, on="year"]
main_sub2 = main[colA=="Category2"][df2, on="year"]
main_sub3 = main[colA=="Category3"][df3, on="year"]
main_comb = rbind(main_sub1, main_sub2, main_sub3) 
4
  • Would you edit your post then add dput(your_data)? Commented Aug 9, 2022 at 18:54
  • Please make a small reproducible example to illustrate the problem. 2 or 3 rows from each of main, df1, and df2 should be plenty (choose rows that illustrate the problem well). Commented Aug 9, 2022 at 19:39
  • I updated my question with my current approach. I think it'll help illustrate the problem better. Thanks Commented Aug 9, 2022 at 21:47
  • You updated your question with the approach but we have no sample data. Refs: stackoverflow.com/q/5963269, minimal reproducible example, and stackoverflow.com/tags/r/info Commented Aug 10, 2022 at 11:46

2 Answers 2

0

Try adding colA and its "Category#" to each of the respective df#s, merging on year and colA, coalescing the merged_value:

main[df1[, colA := "Category1"], merged_value := i.colD, on = .(year, colA)]
main[df2[, colA := "Category2"], merged_value := fcoalesce(i.colE, merged_value), on = .(year, colA)]
main[df3[, colA := "Category3"], merged_value := fcoalesce(i.colF, merged_value), on = .(year, colA)]
main
#     year      colA  colB  colC merged_value
#    <int>    <char> <int> <int>        <int>
# 1:  2001 Category1     1     1           11
# 2:  2002 Category2     2     2           12
# 3:  2003 Category3     3     3           13
# 4:  2004 Category4     4     4           NA

Data

main <- data.table(year=2001:2004, colA=paste0("Category",1:4), colB=1:4, colC=1:4)
df1 <- data.table(year=2001:2004, colD=11:14)
df2 <- data.table(year=2001:2004, colE=11:14)
df3 <- data.table(year=2001:2004, colF=11:14)
Sign up to request clarification or add additional context in comments.

Comments

0

Here is another approach, similar to @r2evans (and using @r2evans data), but utilizing a single merge. Basically rowbind df1, df2, and df3, leveraging the id param

setnames(df1,"colD","val")
rbindlist(setNames(list(df1,df2,df3), paste0("Category",1:3)), id="colA")[main, on=.(colA,year)]

Output:

        colA  year   val  colB  colC
      <char> <int> <int> <int> <int>
1: Category1  2001    11     1     1
2: Category2  2002    12     2     2
3: Category3  2003    13     3     3
4: Category4  2004    NA     4     4

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.