1

I have two pandas dataframes:

df1 = pd.DataFrame({
 'a': [1, 2, 3],
 'b': [2, 4, 2],
 'c': [3, 1, 5],
 'd': [-1, 0, 1]
}, index=('A', 'B', 'C'))
df2 = pd.DataFrame({
 'a': [0, 5, 10],
 'b': [9, 5, 1],
 'c': [3, 4, 2],
 'd': [12, 3, 0]
}, index=('A', 'B', 'C'))

>>> print(df1)
   a  b  c  d
A  1  2  3 -1
B  2  4  1  0
C  3  2  5  1
>>> print(df2)
    a  b  c   d
A   0  9  3  12
B   5  5  4   3
C  10  1  2   0

I would like to get the value from each column of df1 that corresponds to (i.e. is at the same coordinates as) the maximal value of that same column in df2. So in the above example it should return the values [3, 2, 1, -1]. I was able to get the correct indices using idxmax:

>>> print(df2.idxmax())
a    C
b    A
c    B
d    A
dtype: object

As you can see, these are indeed the indices corresponding to the column-wise maximums in df2. However trying to index into df1 using these indices does not return the desired result:

>>> print(df1.loc[df2.idxmax()])
   a  b  c  d
C  3  2  5  1
A  1  2  3 -1
B  2  4  1  0
A  1  2  3 -1

This indexing seems to only use the values of the Series returned by idxmax. How do I correctly index df1 using both parts (labels and values) of the returned Series? Or is there maybe a simpler solution to achieve what I want?

2 Answers 2

2

Try this:

s = df2.idxmax()
[df1.loc[row, col] for col, row in s.items()]
Sign up to request clarification or add additional context in comments.

2 Comments

That will do it. I solved the issue using np.diag˙, since I realized the desired results made up the returned DataFrame's diagonal, but this seems much cleaner and more intuitive.
I used df1.at here instead of df1.loc, not sure if there would be much of a difference.
1

If you want a vectorial solution (not a loop), you can use :

import numpy as np

a = df1.to_numpy()[np.argmax(df2.to_numpy(), axis=0),
                   np.arange(df1.shape[1])]
# array([ 3,  2,  1, -1])

# as Series
out = pd.Series(a, index=df1.columns)

output:

a    3
b    2
c    1
d   -1
dtype: int64

If you have only 1 max value per column, you can use this pandas alternative:

out = df1.where(df2.eq(df2.max())).T.stack()

output:

a  C    3.0
b  A    2.0
c  B    1.0
d  A   -1.0
dtype: float64

NB. if you want you can droplevel(-1) for the same output as the first approach.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.