Adding two dataframes in pandas with different columns

Question

I would like to add two dataframes together element-wise, as in matrix addition over (i, j): column 1 is added to column 1, column 2 to column 2, and so on. If a column exists in only one of the dataframes, it should still be included in the result, taken from the dataframe that has it.

The output should be a dataframe with the index ['Sun', 'Wind', 'Water', 'Flow'] and columns 1 through 21 (that is, range(1, 22)).

All values are currently 0, but if column "2", cell 3 in dt1 is 200, then this cell is added to column "2", cell 3 in dt2, which is 10, for a total of 210.

import pandas as pd
cols = range(1, 20)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label (a single row would not match the 4-label index)
rows = [[0] * len(cols) for _ in idx]

dt1 = pd.DataFrame(rows, index=idx, columns=cols)
dt1 = dt1.reset_index()

cols = range(3, 22)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt2 = pd.DataFrame(rows, index=idx, columns=cols)
dt2 = dt2.reset_index()


TRIED:
df = dt1[dt1.columns[1:]].add(dt2[dt2.columns[1:]]).fillna(0)

It may be that matrix addition with two for loops is the way forward; however, I'm not sure how to match up the columns so that the right values end up in the right places.

There's no need to call dt.reset_index(): pandas can add dataframes that have an index, and you then wouldn't need to slice [1:] to skip the index column. So keep the index as-is.
– smci, Commented Sep 20, 2021 at 19:34

SeaBean · Accepted Answer · 2021-09-20 21:09:21Z

You can get the union of the columns with Index.union(), reindex both dataframes to that union with .reindex() using a fill value of 0, then .add() the two dataframes and call .reset_index(), as follows:

dt1a = dt1.set_index('index')
dt2a = dt2.set_index('index')
all_cols = dt1a.columns.union(dt2a.columns)

dt1b = dt1a.reindex(all_cols, axis=1, fill_value=0)
dt2b = dt2a.reindex(all_cols, axis=1, fill_value=0)

df_out = dt1b.add(dt2b).reset_index()

Data Input

dt1.at[2, 3] = 200

print(dt1)

   index  1  2    3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19
0    Sun  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
1   Wind  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
2  Water  0  0  200  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0
3   Flow  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0

dt2.at[2, 3] = 10

print(dt2)

   index   3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
0    Sun   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
1   Wind   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
2  Water  10  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
3   Flow   0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0

Output

print(df_out)


   index  1  2    3  4  5  6  7  8  9  10  11  12  13  14  15  16  17  18  19  20  21
0    Sun  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
1   Wind  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
2  Water  0  0  210  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
3   Flow  0  0    0  0  0  0  0  0  0   0   0   0   0   0   0   0   0   0   0   0   0
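
The union, reindex and add steps above can also be collapsed into a single call, since DataFrame.add accepts a fill_value that stands in for labels missing from one frame. A minimal sketch, assuming the frames are indexed by the row names; the shortened column ranges and the names left, right and total are illustrative only:

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']

# Two small frames with partially overlapping column labels (illustrative sizes).
left = pd.DataFrame(0, index=idx, columns=range(1, 5))   # columns 1..4
right = pd.DataFrame(0, index=idx, columns=range(3, 7))  # columns 3..6
left.at['Water', 3] = 200
right.at['Water', 3] = 10
right.at['Flow', 6] = 7   # sits in a column that only `right` has

# fill_value=0 substitutes 0 where a label is missing from one frame,
# so columns present in only one frame are carried over unchanged.
total = left.add(right, fill_value=0).astype(int)

print(total)
```

The astype(int) is there because the aligned result may come back as floats. Call .reset_index() on the result if the row names are needed as a regular column again.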

MattiH · Answer · 2021-09-20 19:31:15Z

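I think you could reindex the columns of both dataframes to the full range like this:

# Restore the row labels as the index, then reindex the columns (not the rows)
dt1 = dt1.set_index('index').reindex(columns=range(1, 22), fill_value=0)

dt2 = dt2.set_index('index').reindex(columns=range(1, 22), fill_value=0)

dt3 = dt1 + dt2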
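
Note that reindex targets the row index unless an axis is named, so for this question the new labels need to go to the column axis. A small sketch, assuming frames indexed by row name and using shortened column ranges for readability (the names a, b and full are illustrative):

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
a = pd.DataFrame(1, index=idx, columns=range(1, 4))  # columns 1..3
b = pd.DataFrame(2, index=idx, columns=range(3, 6))  # columns 3..5

full = range(1, 6)  # the combined column range

# columns=... reindexes along the column axis; fill_value=0 avoids NaN
# in the columns a frame did not have before.
a_full = a.reindex(columns=full, fill_value=0)
b_full = b.reindex(columns=full, fill_value=0)

summed = a_full + b_full
print(summed)
```

This approach requires knowing the full column range in advance; taking the union of the two column indexes avoids hard-coding it.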


Corralien · Answer · 2021-09-20 19:32:39Z

If your columns and rows are aligned between the two dataframes:

>>> dt1.iloc[:, 1:].add(dt2.iloc[:, 1:].values)

Or don't reset_index:

>>> dt1 + dt2
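
The two options behave differently once the column labels differ. As an illustration (shortened frames, hypothetical names a and b): adding .values pairs cells by position, whereas + pairs them by label and leaves NaN wherever a label exists in only one frame.

```python
import pandas as pd

idx = ['Sun', 'Wind', 'Water', 'Flow']
a = pd.DataFrame(1, index=idx, columns=[1, 2, 3])
b = pd.DataFrame(10, index=idx, columns=[3, 4, 5])

# Positional: b's first column (label 3) is added to a's first column (label 1).
by_position = a.add(b.values)

# Label-based: only column 3 exists in both frames; the rest become NaN.
by_label = a + b

print(by_position)
print(by_label)
```

For the frames in the question, whose column ranges are offset, the positional form would add mismatched columns, and the plain + would need a fill for the columns that are not shared.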


tdelaney · Answer · 2021-09-20 20:25:17Z

Using difference and intersection, you could add the columns that exist only in dt2 to dt1 and then sum the columns the two have in common. The assumption here is that you want row-wise addition (that is, both datasets share the same rows), so reset_index is not needed.

import pandas as pd
cols = range(1, 20)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt1 = pd.DataFrame(rows, index=idx, columns=cols)

cols = range(3, 22)
idx = ['Sun', 'Wind', 'Water', 'Flow']
# One row of zeros per index label
rows = [[0] * len(cols) for _ in idx]

dt2 = pd.DataFrame(rows, index=idx, columns=cols)

# Insert new columns from dt2 into dt1 then add common columns
common_columns = dt1.columns.intersection(dt2.columns)
new_columns = dt2.columns.difference(dt1.columns)
dt1[new_columns] = dt2[new_columns]
dt1[common_columns] += dt2[common_columns]
del dt2
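
The same difference/intersection idea can be wrapped so that neither input is modified. A sketch under the assumption that both frames share the same row index; the helper name add_frames is hypothetical, not a pandas function, and the frames are shortened for readability:

```python
import pandas as pd


def add_frames(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Sum shared columns and carry over columns found only in `right`.

    Hypothetical helper; assumes both frames have identical row labels.
    """
    common = left.columns.intersection(right.columns)
    new = right.columns.difference(left.columns)
    out = left.copy()  # work on a copy so `left` is left untouched
    out[common] = out[common] + right[common]
    # Append the right-only columns in a single step.
    return pd.concat([out, right[new]], axis=1)


idx = ['Sun', 'Wind', 'Water', 'Flow']
a = pd.DataFrame(0, index=idx, columns=range(1, 4))  # columns 1..3
b = pd.DataFrame(0, index=idx, columns=range(3, 6))  # columns 3..5
a.at['Water', 3] = 200
b.at['Water', 3] = 10

result = add_frames(a, b)
print(result)
```

Returning a new frame keeps dt1 and dt2 available for later use, at the cost of one extra copy.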
