0

I want to merge three data frames in Python, the code I have now provide me with some wrong outputs.

This is the first data frame

 df_1
   Year    Month        X_1    Y_1
 0  2021    January     $90     $100
 1  2021    February    NaN     $120
 2  2021    March       $100    $130
 3  2021    April       $110    $140
 4  2021    May         Nan     $150
 5  2019    June        $120    $160
   

This is the second data frame

  df_2
    Year    Month       X_2     Y_2
  0 2021    January     Nan     $120
  1 2021    February    NaN     $130
  2 2021    March       $80     $140
  3 2021    April       $90     $150
  4 2021    May         Nan     $150
  5 2021    June        $120    $170

This is the third data frame

   df_3
     Year    Month      X_3     Y_3
  0 2021    January     $110    $150
  1 2021    February    $140    $160
  2 2021    March       $97     $170
  3 2021    April       $90     $180
  4 2021    May         Nan     $190
  5 2021    June        $120    $200

The idea is to combine them into one data frame like this:

 df_combined
    Year    Month       X_1    Y_1    X_2    Y_2    X_3    Y_3
0   2019    January     $90     $100  NaN    $120   $110    $150 
1   2019    February    NaN     $120  NaN    $130   $140    $160
2   2019    March       $100    $130  $80    $140   $97     $170
3   2019    April       $110    $140  $90    $150   $90     $180
4   2019    May         Nan     $150  Nan    $150   Nan     $190
5   2019    June        $120    $160  $120   $170   $120    $200

The code I have for now does not give me the correct outcome, only df_3 has to the correct numbers.

 # compile the list of data frames you want to merge
 import functools as ft
 from functools import reduce
    
 data_frames = [df_1, df_2, df_3]
 df_merged = reduce(lambda  cross, right: pd.merge(cross,right,on=['Year'],
                                                           how='outer'),data_frames)
  #remove superfluous columns
  df_merged.drop(['Month_x', 'Month_y'], axis=1, inplace=True)

2 Answers 2

2

You can try with

df_1.merge(df_2, how='left', on=['Year', 'Month']).merge(df_3, how='left', on=['Year', 'Month'])
Sign up to request clarification or add additional context in comments.

Comments

1

One option of probably many is to do

from functools import reduce
import pandas as pd

idx = ["Year", "Month"]
new_df = reduce(pd.DataFrame.join, (i.set_index(idx) for i in dataframes)).reset_index()

or

reduce(lambda x, y: pd.merge(x, y, how="outer", on=["Year", "Month"]), dataframes)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.