I'm new to Python and using pandas dataframes to store and work with a large dataset.

I'd like to know whether it's possible to compare values in identically named columns across two dataframes. For example, the functionality I'm after would compare the column 'A' in this dataframe:

   A
0  9
1  9
2  5
3  8
4  7
5  9
6  2
7  2
8  5
9  7

to the column 'A' in this one:

   A
0  6
1  3
2  7
3  8
4  2
5  5
6  1
7  8
8  4
9  9

Then, for each row I would determine which of the two 'A' values is smaller and add it to, say, a new column in the first dataframe called 'B':

   A  B
0  9  6
1  9  3
2  5  5
3  8  8
4  7  2
5  9  5
6  2  1
7  2  2
8  5  4
9  7  7

I'm aware of the pandas.DataFrame.min method, but as I understand it, this only locates the smallest value within a single dataframe and can't be used to compare columns of different dataframes. I'm not sure of any other way to achieve this.

Any suggestions for solving this (probably) very simple question would be much appreciated! Thank you.

1 Answer

You can use numpy.minimum():

import numpy as np
df1['B'] = np.minimum(df1.A, df2.A)

Or use Series.where() to replace values:

df1['B'] = df1['A'].where(df1.A < df2.A, df2.A)
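For reference, here is a minimal, self-contained sketch that rebuilds the two example frames from the question and applies both approaches. The names df1 and df2 follow the answer above; the extra column name 'B_where' is only an illustrative assumption so the two results can be compared side by side.

```python
import numpy as np
import pandas as pd

# The two example dataframes from the question
df1 = pd.DataFrame({'A': [9, 9, 5, 8, 7, 9, 2, 2, 5, 7]})
df2 = pd.DataFrame({'A': [6, 3, 7, 8, 2, 5, 1, 8, 4, 9]})

# Approach 1: element-wise minimum of the two columns
df1['B'] = np.minimum(df1['A'], df2['A'])

# Approach 2: keep df1's value where it is smaller, otherwise take df2's
# ('B_where' is a hypothetical column name used only for comparison)
df1['B_where'] = df1['A'].where(df1['A'] < df2['A'], df2['A'])

print(df1)
```

Both columns should match the 'B' column shown in the question. Note that both approaches assume the two dataframes share the same index labels.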

3 Comments

Great, thank you for that. I used the Series.where() method. It works for most of my dataset, but in some cases I have dataframes of unequal length. Do you think that is likely to be the cause of this error message: "ValueError: Can only compare identically-labeled Series objects"? Is there a workaround for dealing with dataframes of unequal lengths that you know of?
How would you like the result to be if the data frames are of unequal lengths?
Good point. Having thought about it: I'm looking to still select the smallest value even if it is located in a row that does not occur in the other DataFrame. Perhaps now though, it is best to write the smallest value to a new Series or DataFrame since it could be that the DataFrame we were previously adding to is the shorter one. I hope this is all clearly described, please let me know if I can clarify anything.
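One possible way to get the behaviour described in the last comment, offered as a sketch rather than a confirmed solution from this thread: align the two columns side by side with pd.concat and take the row-wise minimum into a new Series. The shortened example data and the names combined and smallest are assumptions made for illustration. Rows that exist in only one dataframe keep their single value, because min() skips NaN by default.

```python
import pandas as pd

# Hypothetical example: df2 is shorter than df1
df1 = pd.DataFrame({'A': [9, 9, 5, 8, 7]})
df2 = pd.DataFrame({'A': [6, 3, 7]})

# Align both 'A' columns on the union of their index labels;
# rows missing from one dataframe are filled with NaN
combined = pd.concat([df1['A'], df2['A']], axis=1, keys=['df1', 'df2'])

# Row-wise minimum; NaN is skipped, so unmatched rows keep their only value
smallest = combined.min(axis=1)

print(smallest)
```

Because missing rows are represented as NaN, the result comes back as floats; it could be cast back with astype(int) if every row ends up with a value.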
