2

Desired result: I would like to combine dataframes such that columns in DF2 that are in DF1 are row-binded to DF1; columns that are in DF1 but NOT in DF2 have nrow(DF2) zeros added to them; and columns that are in DF2 but NOT in DF1 have nrow(DF1) zeros prepended to them and then those columns are cbinded onto the DF.

What I've tried: I've tried using ncol, nrow, and matrix to generate the parts of the final df, but I'm running into issues, such as corner cases where the second DF only has one column and ncol(DF) returns NULL, for ex.

Template for desired result:

For ex, if I have these two DFs:

> DF_dummy_1
  a b c e
1 1 a f u
2 2 b g v
3 3 c h w
4 4 d i x
5 5 e j y

> DF_dummy_2
  a b d
1 1 p k
2 2 q l
3 3 r m
4 4 s n
5 5 t o

I would like to end up with

> DF_final
    a b c e d
1   1 a f u 0
2   2 b g v 0
3   3 c h w 0
4   4 d i x 0
5   5 e j y 0
6   1 p 0 0 k
7   2 q 0 0 l
8   3 r 0 0 m
9   4 s 0 0 n
10  5 t 0 0 o

Code to generate this ex:

DF_dummy_1 <- data.frame("a" = seq(1:5), "b" = letters[1:5], "c" = letters[6:10], "e" = letters[21:25])
DF_dummy_2 <- data.frame("a" = seq(6:10), "b" = letters[16:20], "d" = letters[11:15])
DF_final <- data.frame("a" = seq(1:10), "b" = c(letters[1:5],letters[16:20]),
                       "c" = c(letters[6:10], rep("0",5)), 
                       "e" = c(letters[21:25], rep("0",5)),
                       "d" = c(rep("0",5), letters[11:15]))
0

1 Answer 1

2

You could try rbindlist with fill=TRUE option after placing the datasets in a list. (Just for convenience, named the datasets as 'df1', 'df2'), By default, 'NAs' will occupy the missing rows, which can be later replaced to 0.

library(data.table)#data.table_1.9.5
rbindlist(list(df1, df2), fill=TRUE)[,
         lapply(.SD, function(x) replace(x, is.na(x), 0))]
#    a b c e d
# 1: 1 a f u 0
# 2: 2 b g v 0
# 3: 3 c h w 0
# 4: 4 d i x 0
# 5: 5 e j y 0
# 6: 1 p 0 0 k
# 7: 2 q 0 0 l
# 8: 3 r 0 0 m
# 9: 4 s 0 0 n
#10: 5 t 0 0 o

Based on the 'data' updated in the post, some of the columns are 'factors'. You could use stringsAsFactors=FALSE in the data.frame(..) to avoid the character columns getting converted to factors. Considering these are factor columns

  rbindlist(list(DF_dummy_1, DF_dummy_2), fill=TRUE)[, 
    lapply(.SD, function(x) replace(as.character(x), is.na(x), 0))]
 #   a b c e d
 #1: 1 a f u 0
 #2: 2 b g v 0
 #3: 3 c h w 0
 #4: 4 d i x 0
 #5: 5 e j y 0
 #6: 1 p 0 0 k
 #7: 2 q 0 0 l
 #8: 3 r 0 0 m
 #9: 4 s 0 0 n
#10: 5 t 0 0 o
Sign up to request clarification or add additional context in comments.

7 Comments

I get an error when I try this: item 2 has 3 columns, inconsistent with item 1 which has 4 columns
@goldisfine I am using data.table_1.9.5 the devel version, which version of data.table do you have?
Ah, got it. Is there an easy way to update to 1.9.5?
@goldisfine Please check this link github.com/Rdatatable/data.table/wiki/Installation
@goldisfine Based on my second code, I am not getting any warnings with factor columns. But, the safest would be to create the dataset with stringsAsFactors=FALSE in the data.frame
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.