Determine if Values are within range based on pandas DataFrame column

Question

I am trying to determine whether or a given value in a row of a DataFrame is within two other columns from a separate DataFrame, or if that estimate is zero.

import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])

   lo1  up1  lo2  up2
0   -1    2    1    3
1    4    6    7    8
2   -2   10   11   13
3    5    6    8    9

df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

   pe1  pe2
0    1    3
1    4    6
2    5    8
3   10    2

To be more clear, is it possible to develop a for-loop or use a function that can look at pe1 and its corresponding values and determine if they are within lo1 and up1, if lo1 and up1 cross zero, and if pe1=0? I am having a hard time coding this in Python.

EDIT: I'd like the output to be something like:

Since the only pe that falls within its corresponding lo and up column are in the first row, second column, and second row, first column.

Will you please add a sample of your expected output? Thank you :) — user17242583
– user17242583, Commented Jan 3, 2022 at 17:35
That's better, thank you. Will you please tell me how the first item of m1 becomes 0? — user17242583
– user17242583, Commented Jan 3, 2022 at 17:48
The first item becomes zero because the distance between lo1 and up1 contains zero. — AW27
– AW27, Commented Jan 3, 2022 at 17:50
yes, they'll always be the same length and set up in this way. — AW27
– AW27, Commented Jan 3, 2022 at 18:09

rpanai · Accepted Answer · 2022-01-03 21:38:19Z

You can eventually concatenate the two dataframes along the horizontal axis and then use np.where. This has a similar behaviour as where used by RJ Adriaansen.

import pandas as pd
import numpy as np

# Data
df1 = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7,8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])


df2 = pd.DataFrame([[1, 3], [4, 6] , [5, 8], [10, 2,]],
                   columns=['pe1', 'pe2'])

# concatenate dfs
df = pd.concat([df1, df2], axis=1)

where now df looks like

   lo1  up1  lo2  up2  pe1  pe2
0   -1    2    1    3    1    3
1    4    6    7    8    4    6
2   -2   10   11   13    5    8
3    5    6    8    9   10    2

Finally we use np.where and between

for k in [1, 2]:
    df[f"m{k}"] = np.where(
        (df[f"pe{k}"].between(df[f"lo{k}"], df[f"up{k}"]) & 
         df[f"lo{k}"].gt(0)),
        df[f"pe{k}"],
        0)

and the result is

   lo1  up1  lo2  up2  pe1  pe2  m1  m2
0   -1    2    1    3    1    3   0   3
1    4    6    7    8    4    6   4   0
2   -2   10   11   13    5    8   0   0
3    5    6    8    9   10    2   0   0

RJ Adriaansen · Accepted Answer · 2022-01-03 21:12:51Z

1

You can create a boolean mask for the required condition. For pe1 that would be:

value in lo1 is smaller or equal to pe1
value in up1 is larger or equal to pe1
value in lo1 is larger than 0

This would make this mask:

(df['lo1'] <= df2['pe1'])  &  (df['up1'] >= df2['pe1']) & (df['lo1'] > 0)

which returns:

0    False
1     True
2    False
3    False
dtype: bool

Now you can use where to keep the values that match True and replace those who don't with 0:

df2['pe1'] = df2['pe1'].where((df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0), other=0)
df2['pe2'] = df2['pe2'].where((df['lo2'] <= df2['pe2']) & (df['up2'] >= df2['pe2']) & (df['lo2'] > 0), other=0)

Result:

	pe1	pe2
0	0	3
1	4	0
2	0	0
3	0	0

To loop all columns:

for i in df2.columns:
    nr = i[2:] #remove the first two characters to get the number, then use that number to match the columns in the other df
    df2[i] = df2[i].where((df[f'lo{nr}'] <= df2[i]) & (df[f'up{nr}'] >= df2[i]) & (df[f'lo{nr}'] > 0), other=0)

edited Jan 3, 2022 at 21:12

answered Jan 3, 2022 at 21:01

RJ Adriaansen

9,7092 gold badges16 silver badges29 bronze badges

4 Comments

AW27 Over a year ago

Thanks, is it possible to create a loop to do this if I have more data than just these columns?

RJ Adriaansen Over a year ago

@AW27 Sure, see updated answer

AW27 Over a year ago

Also, I see your bullet points, I just want to make sure that 0 is not between lo1 and up1

RJ Adriaansen Over a year ago

Then replace (df[f'lo{nr}'] > 0) with ((df[f'lo{nr}'] > 0) | (df[f'up{nr}'] < 0))

Collectives™ on Stack Overflow

Determine if Values are within range based on pandas DataFrame column

2 Answers 2

Comments

4 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

4 Comments

Your Answer

Sign up or log in

Post as a guest

Related