How to combine dataframes with different columns without merge?

Question

Desired result: I would like to combine dataframes such that columns in DF2 that are in DF1 are row-binded to DF1; columns that are in DF1 but NOT in DF2 have nrow(DF2) zeros added to them; and columns that are in DF2 but NOT in DF1 have nrow(DF1) zeros prepended to them and then those columns are cbinded onto the DF.

What I've tried: I've tried using ncol, nrow, and matrix to generate the parts of the final df, but I'm running into issues, such as corner cases where the second DF only has one column and ncol(DF) returns NULL, for ex.

Template for desired result:

For ex, if I have these two DFs:

> DF_dummy_1
  a b c e
1 1 a f u
2 2 b g v
3 3 c h w
4 4 d i x
5 5 e j y

> DF_dummy_2
  a b d
1 1 p k
2 2 q l
3 3 r m
4 4 s n
5 5 t o

I would like to end up with

> DF_final
    a b c e d
1   1 a f u 0
2   2 b g v 0
3   3 c h w 0
4   4 d i x 0
5   5 e j y 0
6   1 p 0 0 k
7   2 q 0 0 l
8   3 r 0 0 m
9   4 s 0 0 n
10  5 t 0 0 o

Code to generate this ex:

DF_dummy_1 <- data.frame("a" = seq(1:5), "b" = letters[1:5], "c" = letters[6:10], "e" = letters[21:25])
DF_dummy_2 <- data.frame("a" = seq(6:10), "b" = letters[16:20], "d" = letters[11:15])
DF_final <- data.frame("a" = seq(1:10), "b" = c(letters[1:5],letters[16:20]),
                       "c" = c(letters[6:10], rep("0",5)), 
                       "e" = c(letters[21:25], rep("0",5)),
                       "d" = c(rep("0",5), letters[11:15]))

akrun · Accepted Answer · 2015-03-15 15:16:46Z

2

You could try rbindlist with fill=TRUE option after placing the datasets in a list. (Just for convenience, named the datasets as 'df1', 'df2'), By default, 'NAs' will occupy the missing rows, which can be later replaced to 0.

library(data.table)#data.table_1.9.5
rbindlist(list(df1, df2), fill=TRUE)[,
         lapply(.SD, function(x) replace(x, is.na(x), 0))]
#    a b c e d
# 1: 1 a f u 0
# 2: 2 b g v 0
# 3: 3 c h w 0
# 4: 4 d i x 0
# 5: 5 e j y 0
# 6: 1 p 0 0 k
# 7: 2 q 0 0 l
# 8: 3 r 0 0 m
# 9: 4 s 0 0 n
#10: 5 t 0 0 o

Based on the 'data' updated in the post, some of the columns are 'factors'. You could use stringsAsFactors=FALSE in the data.frame(..) to avoid the character columns getting converted to factors. Considering these are factor columns

  rbindlist(list(DF_dummy_1, DF_dummy_2), fill=TRUE)[, 
    lapply(.SD, function(x) replace(as.character(x), is.na(x), 0))]
 #   a b c e d
 #1: 1 a f u 0
 #2: 2 b g v 0
 #3: 3 c h w 0
 #4: 4 d i x 0
 #5: 5 e j y 0
 #6: 1 p 0 0 k
 #7: 2 q 0 0 l
 #8: 3 r 0 0 m
 #9: 4 s 0 0 n
#10: 5 t 0 0 o

edited Mar 15, 2015 at 15:16

answered Mar 15, 2015 at 14:42

akrun

891k38 gold badges590 silver badges700 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

goldisfine Over a year ago

I get an error when I try this: item 2 has 3 columns, inconsistent with item 1 which has 4 columns

akrun Over a year ago

@goldisfine I am using data.table_1.9.5 the devel version, which version of data.table do you have?

goldisfine Over a year ago

Ah, got it. Is there an easy way to update to 1.9.5?

akrun Over a year ago

@goldisfine Please check this link github.com/Rdatatable/data.table/wiki/Installation

akrun Over a year ago

@goldisfine Based on my second code, I am not getting any warnings with factor columns. But, the safest would be to create the dataset with stringsAsFactors=FALSE in the data.frame

|

Collectives™ on Stack Overflow

How to combine dataframes with different columns without merge?

1 Answer 1

7 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

7 Comments

Your Answer

Sign up or log in

Post as a guest

Related