Assume I have 3 dataframes as follows:

2011:

Bridge_No  Location  Area  2011
1          NY        10    3
2          FL        20    4
3          NJ        15    6

2012:

Bridge_No  Location  Area  2012
2          FL        20    5
3          NJ        15    3
4          CN        45    9

2013:

Bridge_No  Location  Area  2013
2          FL        20    8
6          MI        30    8
4          CN        45    9

I need a final merged dataset as follows:

Bridge_No  Location  Area  2011  2012  2013
1          NY        10    3     NaN   NaN
2          FL        20    4     5     8
3          NJ        15    6     3     NaN
4          CN        45    NaN   9     9
6          MI        30    NaN   NaN   8

3 Answers

1

We can call set_index on each DataFrame with the shared columns (the columns to join on), then concat along axis=1 to build the complete DataFrame. reset_index then restores the RangeIndex and turns the key columns back into regular columns:

new_df = pd.concat((
    df_.set_index(['Bridge_No', 'Location', 'Area'])
    for df_ in [df2011, df2012, df2013]
), axis=1).reset_index()

new_df:

   Bridge_No Location  Area  2011  2012  2013
0          1       NY    10   3.0   NaN   NaN
1          2       FL    20   4.0   5.0   8.0
2          3       NJ    15   6.0   3.0   NaN
3          4       CN    45   NaN   9.0   9.0
4          6       MI    30   NaN   NaN   8.0

Setup Used:

import pandas as pd

df2011 = pd.DataFrame({
    'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
    'Area': [10, 20, 15], '2011': [3, 4, 6]
})

df2012 = pd.DataFrame({
    'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
    'Area': [20, 15, 45], '2012': [5, 3, 9]
})

df2013 = pd.DataFrame({
    'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
    'Area': [20, 30, 45], '2013': [8, 8, 9]
})
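
The year columns in new_df come out as floats (3.0, 5.0) because NaN cannot be stored in a plain integer column. If whole numbers are preferred, one option is pandas' nullable Int64 dtype. This is a minimal sketch, assuming the year values are always whole numbers; the names keys and year_cols are introduced here for illustration only:

```python
import pandas as pd

df2011 = pd.DataFrame({
    'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
    'Area': [10, 20, 15], '2011': [3, 4, 6]
})
df2012 = pd.DataFrame({
    'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
    'Area': [20, 15, 45], '2012': [5, 3, 9]
})
df2013 = pd.DataFrame({
    'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
    'Area': [20, 30, 45], '2013': [8, 8, 9]
})

# Hypothetical helper names, not part of the original answer
keys = ['Bridge_No', 'Location', 'Area']
year_cols = ['2011', '2012', '2013']

# Same concat-on-shared-index approach as above
new_df = pd.concat(
    (df_.set_index(keys) for df_ in [df2011, df2012, df2013]),
    axis=1,
).reset_index()

# Convert the float year columns to nullable integers; NaN becomes <NA>
new_df[year_cols] = new_df[year_cols].astype('Int64')
print(new_df)
```

Missing entries are then shown as <NA> instead of NaN, and the remaining values print without the trailing .0.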

1

Let us use reduce with merge:

from functools import reduce
import pandas as pd
# df1, df2, df3 are the 2011, 2012 and 2013 DataFrames
df = reduce(lambda left, right: pd.merge(left, right, on=['Bridge_No', 'Location', 'Area'], how='outer'), [df1, df2, df3])
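
For reference, here is a self-contained sketch of the same reduce/merge idea using the data from the question. It assumes df1, df2 and df3 correspond to the 2011, 2012 and 2013 tables; the keys variable and the explicit sort on Bridge_No are additions for illustration, so the row order does not depend on how merge happens to order the result:

```python
from functools import reduce

import pandas as pd

# Shared columns to join on (name introduced for this sketch)
keys = ['Bridge_No', 'Location', 'Area']

df1 = pd.DataFrame({
    'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
    'Area': [10, 20, 15], '2011': [3, 4, 6]
})
df2 = pd.DataFrame({
    'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
    'Area': [20, 15, 45], '2012': [5, 3, 9]
})
df3 = pd.DataFrame({
    'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
    'Area': [20, 30, 45], '2013': [8, 8, 9]
})

# Fold the list pairwise: ((df1 merge df2) merge df3), keeping all keys
merged = reduce(
    lambda left, right: pd.merge(left, right, on=keys, how='outer'),
    [df1, df2, df3],
)

# Make the row order explicit and rebuild a clean RangeIndex
merged = merged.sort_values('Bridge_No').reset_index(drop=True)
print(merged)
```

Because reduce works on a list, the same call should extend to more years by appending further DataFrames to that list.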

0

You can use something like this:

import pandas as pd

df_2011 = pd.read_excel('2011.xlsx')
df_2012 = pd.read_excel('2012.xlsx')
df_2013 = pd.read_excel('2013.xlsx')

df_11_12 = df_2011.merge(df_2012, how='outer', on=['Bridge_No','Location','Area'])

df = df_11_12.merge(df_2013, how='outer', on=['Bridge_No','Location','Area'])
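
Chained outer merges like this silently multiply rows if a key combination appears more than once in one of the inputs. As a hedged sketch, merge's validate parameter can guard against that. The in-memory DataFrames below stand in for the Excel files (an assumption, so the example runs without them), and keys, dup and duplicate_detected are names introduced only for this illustration:

```python
import pandas as pd
from pandas.errors import MergeError

keys = ['Bridge_No', 'Location', 'Area']

# Stand-ins for pd.read_excel('2011.xlsx') etc., using the question's data
df_2011 = pd.DataFrame({
    'Bridge_No': [1, 2, 3], 'Location': ['NY', 'FL', 'NJ'],
    'Area': [10, 20, 15], '2011': [3, 4, 6]
})
df_2012 = pd.DataFrame({
    'Bridge_No': [2, 3, 4], 'Location': ['FL', 'NJ', 'CN'],
    'Area': [20, 15, 45], '2012': [5, 3, 9]
})
df_2013 = pd.DataFrame({
    'Bridge_No': [2, 6, 4], 'Location': ['FL', 'MI', 'CN'],
    'Area': [20, 30, 45], '2013': [8, 8, 9]
})

# validate='one_to_one' raises MergeError if the keys are not unique
# on both sides of a merge
df = (
    df_2011
    .merge(df_2012, how='outer', on=keys, validate='one_to_one')
    .merge(df_2013, how='outer', on=keys, validate='one_to_one')
)
print(df)

# Demonstration: a duplicated bridge row in one input is rejected
dup = pd.concat([df_2012, df_2012.iloc[[0]]], ignore_index=True)
try:
    df_2011.merge(dup, how='outer', on=keys, validate='one_to_one')
    duplicate_detected = False
except MergeError:
    duplicate_detected = True
print(duplicate_detected)
```

Whether a duplicate should raise an error or be cleaned up first (for example with drop_duplicates) depends on the data; the check only makes the problem visible.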

