Merge multiple dataframes with different columns and rows in pandas

Question

Assume I have 3 dataframes as follows:

2011:

Bridge_No	Location	Area	2011
1	NY	10	3
2	FL	20	4
3	NJ	15	6

2012:

Bridge_No	Location	Area	2012
2	FL	20	5
3	NJ	15	3
4	CN	45	9

2013:

Bridge_No	Location	Area	2013
2	FL	20	8
6	MI	30	8
4	CN	45	9

I need a final merged dataset as follows:

Bridge_No	Location	Area	2011	2012	2013
1	NY	10	3	NaN	NaN
2	FL	20	4	5	8
3	NJ	15	6	3	NaN
4	CN	45	NaN	9	9
6	MI	30	NaN	NaN	8

You can use pandas.pydata.org/docs/reference/api/… passing in how='outer' and on=['Bridge_No', 'Location', 'Area'].
– Vivek, Commented Nov 28, 2021 at 2:08
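To see why how='outer' matters, here is a minimal sketch that rebuilds the 2011 and 2012 frames from the question (the `keys` name is just a local helper, not part of the question) and compares the default inner merge with an outer merge:

```python
import pandas as pd

keys = ['Bridge_No', 'Location', 'Area']  # hypothetical helper: the shared join columns
df2011 = pd.DataFrame({'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
                       'Area': [10, 20, 15], '2011': [3, 4, 6]})
df2012 = pd.DataFrame({'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
                       'Area': [20, 15, 45], '2012': [5, 3, 9]})

# Default how='inner': keep only key combinations present in BOTH frames
inner = pd.merge(df2011, df2012, on=keys)
# how='outer': keep key combinations from EITHER frame; unmatched cells become NaN
outer = pd.merge(df2011, df2012, on=keys, how='outer')

print(inner['Bridge_No'].tolist())  # [2, 3]
print(outer['Bridge_No'].tolist())  # [1, 2, 3, 4]
```

With the inner join, bridges 1 and 4 would be silently dropped, which is why the outer join is needed to build the final table.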

Henry Ecker · Accepted Answer · 2021-11-28 02:23:24Z

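We can iterate over each DataFrame and set_index to the shared columns (the columns on which to join), then concat on axis=1. concat aligns the rows on that shared index and, with its default join='outer', fills the cells of bridges missing from a given year with NaN. reset_index is then used to restore the RangeIndex and turn the key columns back into regular columns: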

new_df = pd.concat((
    df_.set_index(['Bridge_No', 'Location', 'Area'])
    for df_ in [df2011, df2012, df2013]
), axis=1).reset_index()

new_df:

   Bridge_No Location  Area  2011  2012  2013
0          1       NY    10   3.0   NaN   NaN
1          2       FL    20   4.0   5.0   8.0
2          3       NJ    15   6.0   3.0   NaN
3          4       CN    45   NaN   9.0   9.0
4          6       MI    30   NaN   NaN   8.0

Setup Used:

import pandas as pd

df2011 = pd.DataFrame({
    'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
    'Area': [10, 20, 15], '2011': [3, 4, 6]
})

df2012 = pd.DataFrame({
    'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
    'Area': [20, 15, 45], '2012': [5, 3, 9]
})

df2013 = pd.DataFrame({
    'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
    'Area': [20, 30, 45], '2013': [8, 8, 9]
})
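The year columns in new_df print as 3.0, 5.0 and so on because NaN cannot be stored in an int64 column, so pandas upcasts them to float64. If whole numbers are preferred, one option (a sketch, assuming a pandas version with the nullable `Int64` dtype, i.e. 1.0 or later) is to convert those columns after the concat:

```python
import pandas as pd

keys = ['Bridge_No', 'Location', 'Area']
frames = [
    pd.DataFrame({'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
                  'Area': [10, 20, 15], '2011': [3, 4, 6]}),
    pd.DataFrame({'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
                  'Area': [20, 15, 45], '2012': [5, 3, 9]}),
    pd.DataFrame({'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
                  'Area': [20, 30, 45], '2013': [8, 8, 9]}),
]

# Same approach as above: index on the keys, align side by side, restore columns
new_df = pd.concat((df_.set_index(keys) for df_ in frames), axis=1).reset_index()

# Year columns are float64 here because of the NaN gaps;
# the nullable 'Int64' dtype stores whole numbers and shows gaps as <NA>
year_cols = ['2011', '2012', '2013']
new_df = new_df.astype({col: 'Int64' for col in year_cols})

print(new_df)
print(new_df.dtypes)
```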

BENY · Accepted Answer · 2021-11-28 02:42:34Z

Let us do a reduce merge:

import pandas as pd
from functools import reduce
# df1, df2, df3 are the 2011, 2012 and 2013 DataFrames from the question
df = reduce(lambda left, right: pd.merge(left, right, on=['Bridge_No', 'Location', 'Area'], how='outer'), [df1, df2, df3])
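functools.reduce folds the list from left to right, so reduce(f, [df1, df2, df3]) evaluates f(f(df1, df2), df3): exactly two chained outer merges, and it extends to any number of yearly frames without extra code. A minimal self-contained check, rebuilding the question's data (the `outer_merge` name is a hypothetical helper standing in for the lambda):

```python
from functools import reduce

import pandas as pd

keys = ['Bridge_No', 'Location', 'Area']
df1 = pd.DataFrame({'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
                    'Area': [10, 20, 15], '2011': [3, 4, 6]})
df2 = pd.DataFrame({'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
                    'Area': [20, 15, 45], '2012': [5, 3, 9]})
df3 = pd.DataFrame({'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
                    'Area': [20, 30, 45], '2013': [8, 8, 9]})


def outer_merge(left, right):
    """Full outer join of two frames on the shared key columns."""
    return pd.merge(left, right, on=keys, how='outer')


# reduce applies outer_merge pairwise: outer_merge(outer_merge(df1, df2), df3)
via_reduce = reduce(outer_merge, [df1, df2, df3])
explicit = outer_merge(outer_merge(df1, df2), df3)

print(via_reduce.equals(explicit))              # True
print(sorted(via_reduce['Bridge_No'].tolist()))  # [1, 2, 3, 4, 6]
```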


Abdul · Accepted Answer · 2021-11-28 02:41:34Z

You can use something like this:

import pandas as pd

# Each year's table is read from its own Excel file
df_2011 = pd.read_excel('2011.xlsx')
df_2012 = pd.read_excel('2012.xlsx')
df_2013 = pd.read_excel('2013.xlsx')

# Outer-merge on the shared key columns so bridges missing from a year are kept (as NaN)
df_11_12 = df_2011.merge(df_2012, how='outer', on=['Bridge_No','Location','Area'])

df = df_11_12.merge(df_2013, how='outer', on=['Bridge_No','Location','Area'])

The output table looks like this: [output table image not preserved]

