Extracting values of a pandas DataFrame using maximal indexes from another DataFrame

Question

I have two pandas dataframes:

df1 = pd.DataFrame({
 'a': [1, 2, 3],
 'b': [2, 4, 2],
 'c': [3, 1, 5],
 'd': [-1, 0, 1]
}, index=('A', 'B', 'C'))
df2 = pd.DataFrame({
 'a': [0, 5, 10],
 'b': [9, 5, 1],
 'c': [3, 4, 2],
 'd': [12, 3, 0]
}, index=('A', 'B', 'C'))

>>> print(df1)
   a  b  c  d
A  1  2  3 -1
B  2  4  1  0
C  3  2  5  1
>>> print(df2)
    a  b  c   d
A   0  9  3  12
B   5  5  4   3
C  10  1  2   0

I would like to get the value from each column of df1 that corresponds to (i.e. is at the same coordinates as) the maximal value of that same column in df2. So in the above example it should return the values [3, 2, 1, -1]. I was able to get the correct indices using idxmax:

>>> print(df2.idxmax())
a    C
b    A
c    B
d    A
dtype: object

As you can see, these are indeed the indices corresponding to the column-wise maximums in df2. However trying to index into df1 using these indices does not return the desired result:

>>> print(df1.loc[df2.idxmax()])
   a  b  c  d
C  3  2  5  1
A  1  2  3 -1
B  2  4  1  0
A  1  2  3 -1

This indexing seems to only use the values of the Series returned by idxmax. How do I correctly index df1 using both parts (labels and values) of the returned Series? Or is there maybe a simpler solution to achieve what I want?

Code Different · Accepted Answer · 2022-10-28 16:54:08Z

2

Try this:

s = df2.idxmax()
[df1.loc[row, col] for col, row in s.items()]

answered Oct 28, 2022 at 16:54

Code Different

93.4k16 gold badges154 silver badges175 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Mate de Vita Over a year ago

That will do it. I solved the issue using np.diag˙, since I realized the desired results made up the returned DataFrame's diagonal, but this seems much cleaner and more intuitive.

BeRT2me Over a year ago

I used df1.at here instead of df1.loc, not sure if there would be much of a difference.

mozway · Accepted Answer · 2022-10-28 17:22:55Z

1

If you want a vectorial solution (not a loop), you can use numpy:

import numpy as np

a = df1.to_numpy()[np.argmax(df2.to_numpy(), axis=0),
                   np.arange(df1.shape[1])]
# array([ 3,  2,  1, -1])

# as Series
out = pd.Series(a, index=df1.columns)

output:

a    3
b    2
c    1
d   -1
dtype: int64

If you have only 1 max value per column, you can use this pandas alternative:

out = df1.where(df2.eq(df2.max())).T.stack()

output:

a  C    3.0
b  A    2.0
c  B    1.0
d  A   -1.0
dtype: float64

NB. if you want you can droplevel(-1) for the same output as the first approach.

answered Oct 28, 2022 at 17:22

mozway

267k13 gold badges56 silver badges106 bronze badges

Collectives™ on Stack Overflow

Extracting values of a pandas DataFrame using maximal indexes from another DataFrame

2 Answers 2

2 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

2 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related