
Given three dataframes:

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})

how can one combine them into a single dataframe by adding the values of the numeric columns, so that the result becomes:

pd.DataFrame({'A': [8, 2], 'B': [10, 8], 'C': 'dog'})

for this example? My problem is that I also have columns that are identical but cannot be summed (like 'C' here).

  • Are there only 3 dfs, or more? Commented Jul 17, 2019 at 12:36
  • In my case, only 3. Commented Jul 17, 2019 at 12:36
  • Also, what happens if column C has different data in the 3 dfs? Commented Jul 17, 2019 at 12:37
  • If the values are the same, it's not problematic. But what is the decision if one is "dog" and one is "cat"? Commented Jul 17, 2019 at 12:38
  • Then we put ['dog', 'cat'] in a list on that row. Commented Jul 17, 2019 at 12:44
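
To see why the string column is the sticking point, note that a naive element-wise sum also "adds" the strings, concatenating them instead of keeping one value. A minimal sketch using the frames from the question:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})

# The numeric columns sum correctly, but '+' on object (string)
# columns concatenates: 'dog' + 'dog' + 'dog' -> 'dogdogdog'
res = df1 + df2 + df3
print(res)
#    A   B          C
# 0  8  10  dogdogdog
# 1  2   8  dogdogdog
```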

2 Answers


One possible solution: sum the numeric columns, and for string columns join the unique values per group, using GroupBy.agg after concatenating the list of DataFrames:

import numpy as np
import pandas as pd

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print(df)
   A   B    C
0  8  10  dog
1  2   8  dog

If different values like cat and dog are possible:

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': ['cat','dog']})


f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print(df)
   A   B        C
0  8  10  dog,cat
1  2   8      dog

If you need lists:

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else x.unique().tolist()
df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print(df)
   A   B           C
0  8  10  [dog, cat]
1  2   8       [dog]

And to get lists only where the non-numeric values differ, and scalars where they agree, use a custom function:

def f(x):
    if np.issubdtype(x.dtype, np.number):
        return x.sum()
    else:
        u = x.unique().tolist()
        if len(u) == 1:
            return u[0]
        else:
            return u

df = pd.concat([df1, df2, df3], keys=range(3)).groupby(level=1).agg(f)
print(df)
   A   B           C
0  8  10  [dog, cat]
1  2   8         dog

4 Comments

I applied it to my real dataframes, and from 3 dataframes of 5 rows × 67 columns I get one with 5 rows × 700 columns. Why does this happen?
@Qubix - Are the column names different in each DataFrame?
You are right, they are. So can I assume that the last block of code above sums the columns that are common and leaves the ones that are not common with the value they had in their original dataframe?
@Qubix - Hmmm, concat joins them together, but if the columns differ there is no alignment of data (each of df1, df2, df3 needs the same column name for concat to produce one combined column filled from all of them). So the best approach here is to normalize and unify the column names so they match across df1, df2, df3.
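
If unifying the names is not feasible, one sketch (with hypothetical, partially overlapping frames) is to restrict each DataFrame to the shared columns before concatenating, so concat aligns one column per name instead of producing the union of all columns:

```python
import numpy as np
import pandas as pd

# Hypothetical frames whose columns only partially overlap
df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'D': [9, 9], 'C': 'dog'})

# Keep only the columns present in every frame
common = df1.columns.intersection(df2.columns)  # Index(['A', 'C'])

f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ','.join(x.unique())
df = pd.concat([df1[common], df2[common]], keys=range(2)).groupby(level=1).agg(f)
print(df)
#    A    C
# 0  6  dog
# 1  1  dog
```

Columns that exist in only one frame are dropped here; whether that is acceptable depends on the data.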

You can do it as follows:

df = df3.copy()
df[['A','B']] = df1[['A','B']]+df2[['A','B']]+df3[['A','B']]

which gives the following output:

    A   B   C
0   8   10  dog
1   2   8   dog
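
If you would rather not hard-code `['A', 'B']`, a small variation (assuming the numeric columns are the same in every frame) is to pick them with `select_dtypes`:

```python
import pandas as pd

df1 = pd.DataFrame({'A': [5, 0], 'B': [2, 4], 'C': 'dog'})
df2 = pd.DataFrame({'A': [1, 1], 'B': [3, 3], 'C': 'dog'})
df3 = pd.DataFrame({'A': [2, 1], 'B': [5, 1], 'C': 'dog'})

# Detect the numeric columns instead of hard-coding ['A', 'B']
num = df1.select_dtypes('number').columns

df = df3.copy()  # keeps the non-numeric 'C' column as-is
df[num] = df1[num] + df2[num] + df3[num]
print(df)
#    A   B    C
# 0  8  10  dog
# 1  2   8  dog
```

Note this keeps 'C' from df3 unchanged, so it relies on the non-numeric columns being identical across the frames.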

