
I have two DataFrames:

import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]})
df1.set_index(['Country', 'Location'], inplace=True)

df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]})
df2.set_index(['Country'], inplace=True)

I would like to compute the ratio between the two for the years (columns) they have in common. The frames look like this:

                  2000  2002
Country Location            
US      Hawai       20   100
IT      Torino      40   200
FR      Paris       60   300

         2002  2018
Country            
US          2     6
IT          4    88
FR          3     7
GB          6    90

The ratio should produce:

                   2002
Country Location       
US      Hawai        50
IT      Torino       50
FR      Paris       100

I tried join in several ways but could not achieve this. Any ideas?

1 Answer

Use DataFrame.div and align on the first index level (level=0):

df = df1.div(df2, level=0)
print (df)
                  2000   2002  2018
Country Location                   
US      Hawai      NaN   50.0   NaN
IT      Torino     NaN   50.0   NaN
FR      Paris      NaN  100.0   NaN

If you need to remove the all-NaN columns (the columns that are not in both DataFrames):

df = df1.div(df2, level=0).dropna(axis=1, how='all')
print (df)
                   2002
Country Location       
US      Hawai      50.0
IT      Torino     50.0
FR      Paris     100.0

Another solution is to first get the columns that are in both DataFrames with Index.intersection, and filter on them before dividing:

c = df1.columns.intersection(df2.columns)
print (c)
Index(['2002'], dtype='object')

df = df1[c].div(df2[c], level=0)
print (df)
                   2002
Country Location       
US      Hawai      50.0
IT      Torino     50.0
FR      Paris     100.0
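
To make it clearer what level=0 does here, the alignment can be spelled out by hand. This is only an illustrative sketch, not a description of how pandas implements it internally: the names common, aligned, manual and via_level are made up for the example, and the frames are rebuilt so that the block runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

# Columns present in both frames
common = df1.columns.intersection(df2.columns)

# Manual alignment: pick the df2 row for each Country in df1's index,
# then give those rows df1's (Country, Location) MultiIndex
aligned = df2[common].reindex(df1.index.get_level_values('Country'))
aligned.index = df1.index
manual = df1[common] / aligned

# The same division, letting div match on the first index level
via_level = df1[common].div(df2[common], level=0)

print(manual)
print(via_level)
```

Both frames should hold the same ratios; div with level=0 simply saves the manual reindexing step.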

6 Comments

I like the intersection approach. Can we do the same intersection on the Country index, in case df1 has extra countries that do not exist in df2? Would that case produce NaN values that then have to be cleaned up?
@Crovish - Do you mean something like df1 = pd.DataFrame({'Country': ['US', 'IT', 'SK'], 'Location': ['Hawai', 'Torino', 'Paris'], '2000': [20, 40, 60], '2002': [100, 200, 300]})? If there is no match, this solution gives NaNs, so no change is necessary.
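
A minimal sketch of that case, using the df1 from the comment above, where SK has no row in df2. The names ratio, cleaned, mask and filtered are illustrative only. Two options are shown under that assumption: dropping the all-NaN rows after dividing, or filtering df1 to the shared countries before dividing.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'SK'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

c = df1.columns.intersection(df2.columns)

# SK is missing from df2, so its row comes out as NaN
ratio = df1[c].div(df2[c], level=0)

# Option 1: drop rows that are NaN in every column after dividing
cleaned = ratio.dropna(how='all')

# Option 2: keep only df1 rows whose Country exists in df2, then divide
mask = df1.index.get_level_values('Country').isin(df2.index)
filtered = df1.loc[mask, c].div(df2[c], level=0)

print(cleaned)
print(filtered)
```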
Agreed. I want to add the max value of each column at the end of the frame (as an extra row). maxValues = df[:].max() gives the correct answer, but I can't concatenate this result to the original DataFrame df. Any idea?
@Crovish - A better option is df['max_val'] = df.max(axis=1)
The challenge here is to add an extra row at the end representing the max of each column of the frame. Adding an extra column is indeed more straightforward, but it is not what is required.
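
One possible way to append such a row is sketched below. It was not proposed in the thread, and the row label ('max', '') is a hypothetical choice made only so that the new row fits the two-level (Country, Location) index. The column maxima are turned into a one-row frame and concatenated with pd.concat.

```python
import pandas as pd

df1 = pd.DataFrame({'Country': ['US', 'IT', 'FR'],
                    'Location': ['Hawai', 'Torino', 'Paris'],
                    '2000': [20, 40, 60],
                    '2002': [100, 200, 300]}).set_index(['Country', 'Location'])
df2 = pd.DataFrame({'Country': ['US', 'IT', 'FR', 'GB'],
                    '2002': [2, 4, 3, 6],
                    '2018': [6, 88, 7, 90]}).set_index('Country')

c = df1.columns.intersection(df2.columns)
df = df1[c].div(df2[c], level=0)

# Column-wise maxima as a one-row frame (columns match df's columns)
max_row = df.max().to_frame().T

# Hypothetical label for the extra row; it needs one entry per index level
max_row.index = pd.MultiIndex.from_tuples([('max', '')], names=df.index.names)

# Append the row at the end of the frame
result = pd.concat([df, max_row])
print(result)
```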