Join two MultiIndex DataFrames with different columns using pandas

Question

I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]})
df1.set_index(['Country', 'Location'], inplace=True)

df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]})
df2.set_index(['Country'], inplace=True)

I would like to compute the ratio df1 / df2 for the years (columns) the two frames have in common, matching rows on Country. The frames look like this:

                  2000  2002
Country Location            
US      Hawai       20   100
IT      Torino      40   200
FR      Paris       60   300

         2002  2018
Country            
US          2     6
IT          4    88
FR          3     7
GB          6    90

The ratio should produce:

                  2002
Country Location      
US      Hawai       50
IT      Torino      50
FR      Paris      100

I tried join in several ways but couldn't get this result. Any ideas?

jezrael · Accepted Answer · 2019-02-23 15:35:09Z


Use DataFrame.div with level=0, so the Country index of df2 is matched against the first level of the MultiIndex of df1:

df = df1.div(df2, level=0)
print (df)
                  2000   2002  2018
Country Location                   
US      Hawai      NaN   50.0   NaN
IT      Torino     NaN   50.0   NaN
FR      Paris      NaN  100.0   NaN

And if you need to remove the all-NaN columns (the columns that are not in both DataFrames):

df = df1.div(df2, level=0).dropna(axis=1, how='all')
print (df)
                   2002
Country Location       
US      Hawai      50.0
IT      Torino     50.0
FR      Paris     100.0

Another solution is to first get the columns that are in both DataFrames with Index.intersection, and filter by them before dividing:

c = df1.columns.intersection(df2.columns)
print (c)
Index(['2002'], dtype='object')

df = df1[c].div(df2[c], level=0)
print (df)
                   2002
Country Location       
US      Hawai      50.0
IT      Torino     50.0
FR      Paris     100.0
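To make the level=0 alignment in the answers above more visible, here is a minimal sketch that does the same thing in two explicit steps. It is an illustrative equivalent, not part of the original answer: df2.reindex(df1.index, level=0) repeats each df2 row for every df1 row with the same Country, and plain division then only has to align the columns. The names aligned and ratio are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

# Broadcast df2 onto df1's MultiIndex by matching the Country level (level 0).
# GB has no row in df1, so it does not appear in the result.
aligned = df2.reindex(df1.index, level=0)
print(aligned)

# Row labels are now identical, so division only aligns the columns:
# years present in just one frame ('2000', '2018') come out as NaN.
ratio = df1 / aligned
print(ratio)
```

Doing the broadcast explicitly can be handy when you want to inspect which df2 row each df1 row was matched to before dividing.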

edited Feb 23, 2019 at 15:35

answered Feb 23, 2019 at 15:28


6 Comments

Crovish Over a year ago

I like the intersection approach. Can we do the same intersection on the Country index, in case df1 has extra countries that don't exist in df2? In that case the result will contain NaN and will have to be cleaned up.

jezrael Over a year ago

@Crovish - Do you mean something like this?

df1 = pd.DataFrame({'Country': ['US', 'IT', 'SK'], 'Location': ['Hawai', 'Torino', 'Paris'], '2000': [20, 40, 60], '2002': [100, 200, 300]})

If there is no match, this solution returns NaN, so no change is necessary.
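For the SK case discussed in this thread, the unmatched rows can either be cleaned after the division or filtered out beforehand. Below is a minimal sketch of both options; the row filter with Index.get_level_values(...).isin(...) is an assumption of this sketch rather than something posted in the thread, and ratio, ratio2, cols and rows are hypothetical names.

```python
import pandas as pd

# df1 from the comment above: SK has no row in df2.
df1 = pd.DataFrame({'Country': ['US', 'IT', 'SK'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

# Option 1: divide first, then drop all-NaN columns and all-NaN rows (SK).
ratio = (df1.div(df2, level=0)
            .dropna(axis=1, how='all')
            .dropna(axis=0, how='all'))

# Option 2: intersect up front, on the columns and on the Country level.
cols = df1.columns.intersection(df2.columns)
rows = df1.index.get_level_values('Country').isin(df2.index)
ratio2 = df1.loc[rows, cols].div(df2[cols], level=0)

print(ratio)
print(ratio2)
```

Option 1 also drops a row whose values happen to be NaN for other reasons, while option 2 removes only countries missing from df2.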

Crovish Over a year ago

Agreed. I want to add the max value of each column at the end of the frame, as an extra row. maxValues = df[:].max() gives the correct values, but I can't concatenate this result to the original DataFrame df. Any ideas?

jezrael Over a year ago

@Crovish - It is better to use df['max_val'] = df.max(axis=1)

Crovish Over a year ago

The challenge here is to add an extra row at the end of the frame holding the max of each column. Adding an extra column is indeed more straightforward, but that is not what is required.
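The thread stops before showing how to append such a row. One possible sketch, assuming the ratio frame from the accepted answer: because df has a two-level row index, the column maxima are wrapped in a one-row DataFrame with a matching two-level label and then concatenated. The label ('max', '') and the names col_max, max_row and df_with_max are hypothetical choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

df = df1.div(df2, level=0).dropna(axis=1, how='all')

# Column-wise maxima: a Series indexed by column name (axis=0 is the default).
col_max = df.max()

# One-row frame whose index has the same two levels as df;
# ('max', '') is an arbitrary label chosen for this sketch.
max_row = pd.DataFrame(
    [col_max],
    index=pd.MultiIndex.from_tuples([('max', '')], names=df.index.names),
)

# Append as the last row; columns are aligned by name.
df_with_max = pd.concat([df, max_row])
print(df_with_max)
```

Note that df.max() (per column) is what this thread asks for, whereas df.max(axis=1) computes a per-row maximum.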
