
I am trying to determine whether a given value in a row of a DataFrame lies between two columns of a separate DataFrame, or whether that estimate is zero.

import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7, 8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])

   lo1  up1  lo2  up2
0   -1    2    1    3
1    4    6    7    8
2   -2   10   11   13
3    5    6    8    9

df2 = pd.DataFrame([[1, 3], [4, 6], [5, 8], [10, 2]],
                   columns=['pe1', 'pe2'])

   pe1  pe2
0    1    3
1    4    6
2    5    8
3   10    2

To be clearer: is it possible to write a for-loop or a function that looks at pe1 and its corresponding values and determines whether they are within lo1 and up1, whether the range from lo1 to up1 crosses zero, and whether pe1 = 0? I am having a hard time coding this in Python.

EDIT: I'd like the output to be something like:

   m1  m2
0   0   3
1   4   0
2   0   0
3   0   0

This is because the only pe values that fall within their corresponding lo and up columns, in a range that does not cross zero, are in the first row, second column, and in the second row, first column.
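Since the question asks for a for-loop or a function, here is a plain row-by-row sketch of the rule described above. It is not from the original post, and it assumes that the k-th pe column always pairs with the lo/up columns carrying the same suffix. The helper `keep_if_in_range` is a hypothetical name used for illustration.

```python
import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7, 8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1', 'lo2', 'up2'])
df2 = pd.DataFrame([[1, 3], [4, 6], [5, 8], [10, 2]],
                   columns=['pe1', 'pe2'])

def keep_if_in_range(pe, lo, up):
    """Hypothetical helper: return pe if lo <= pe <= up and [lo, up] does not contain 0, else 0."""
    range_contains_zero = lo <= 0 <= up
    if lo <= pe <= up and not range_contains_zero:
        return pe
    return 0

result = pd.DataFrame(index=df2.index)
for k in (1, 2):
    # Walk the three matching columns in parallel, one row at a time.
    result[f"m{k}"] = [
        keep_if_in_range(pe, lo, up)
        for pe, lo, up in zip(df2[f"pe{k}"], df[f"lo{k}"], df[f"up{k}"])
    ]

print(result)
```

This reproduces the expected m1/m2 output. Because a range containing zero is rejected outright, the "pe1 = 0" case is covered too: a pe of 0 can only lie inside a range that contains zero. A loop like this is easy to read, but on large frames the vectorized answers below will be faster.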

  • Will you please add a sample of your expected output? Thank you :) Commented Jan 3, 2022 at 17:35
  • @richardec please let me know if that makes sense. Commented Jan 3, 2022 at 17:41
  • That's better, thank you. Will you please tell me how the first item of m1 becomes 0? Commented Jan 3, 2022 at 17:48
  • The first item becomes zero because the range between lo1 and up1 contains zero. Commented Jan 3, 2022 at 17:50
  • Yes, they'll always be the same length and set up in this way. Commented Jan 3, 2022 at 18:09

2 Answers


You could concatenate the two dataframes along the horizontal axis and then use np.where. This behaves similarly to the where approach used by RJ Adriaansen.

import pandas as pd
import numpy as np

# Data
df1 = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7, 8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1','lo2', 'up2'])


df2 = pd.DataFrame([[1, 3], [4, 6], [5, 8], [10, 2]],
                   columns=['pe1', 'pe2'])

# concatenate dfs
df = pd.concat([df1, df2], axis=1)

Now df looks like:

   lo1  up1  lo2  up2  pe1  pe2
0   -1    2    1    3    1    3
1    4    6    7    8    4    6
2   -2   10   11   13    5    8
3    5    6    8    9   10    2

Finally, we use np.where together with Series.between:

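for k in [1, 2]:
    # Keep pe{k} only if lo{k} <= pe{k} <= up{k} (between is inclusive by default)
    # and lo{k} is strictly positive; otherwise write 0.
    df[f"m{k}"] = np.where(
        df[f"pe{k}"].between(df[f"lo{k}"], df[f"up{k}"]) & df[f"lo{k}"].gt(0),
        df[f"pe{k}"],
        0)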

and the result is

   lo1  up1  lo2  up2  pe1  pe2  m1  m2
0   -1    2    1    3    1    3   0   3
1    4    6    7    8    4    6   4   0
2   -2   10   11   13    5    8   0   0
3    5    6    8    9   10    2   0   0
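One detail worth knowing: `Series.between` includes both endpoints by default, which is why row 1 (pe1 = 4, lo1 = 4) is kept. If strict bounds were wanted instead, the `inclusive` parameter controls this. The small series below are made-up data for illustration; the string values for `inclusive` assume pandas 1.3 or later.

```python
import pandas as pd

# Made-up values: 4 and 6 sit exactly on the bounds, 5 is strictly inside.
pe = pd.Series([4, 5, 6])
lo = pd.Series([4, 4, 4])
up = pd.Series([6, 6, 6])

# Default inclusive="both": values equal to lo or up count as "within".
both = pe.between(lo, up)
# inclusive="neither": the endpoints themselves are excluded.
neither = pe.between(lo, up, inclusive="neither")

print(both.tolist())     # [True, True, True]
print(neither.tolist())  # [False, True, False]
```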

You can create a boolean mask for the required condition. For pe1 that would be:

  • value in lo1 is less than or equal to pe1
  • value in up1 is greater than or equal to pe1
  • value in lo1 is greater than 0

This gives the following mask:

(df['lo1'] <= df2['pe1'])  &  (df['up1'] >= df2['pe1']) & (df['lo1'] > 0)

which returns:

0    False
1     True
2    False
3    False
dtype: bool

Now you can use where to keep the values that match True and replace those that don't with 0:

df2['pe1'] = df2['pe1'].where((df['lo1'] <= df2['pe1']) & (df['up1'] >= df2['pe1']) & (df['lo1'] > 0), other=0)
df2['pe2'] = df2['pe2'].where((df['lo2'] <= df2['pe2']) & (df['up2'] >= df2['pe2']) & (df['lo2'] > 0), other=0)

Result:

   pe1  pe2
0    0    3
1    4    0
2    0    0
3    0    0

To loop over all columns:

for i in df2.columns:
    nr = i[2:] #remove the first two characters to get the number, then use that number to match the columns in the other df
    df2[i] = df2[i].where((df[f'lo{nr}'] <= df2[i]) & (df[f'up{nr}'] >= df2[i]) & (df[f'lo{nr}'] > 0), other=0)
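The loop can also be avoided by comparing whole arrays at once. This sketch is not from the answer; it assumes the lo, up and pe columns appear in the same order in their frames (lo1, lo2, ... lining up positionally with pe1, pe2, ...), and it uses the stricter zero test `(lo > 0) | (up < 0)` suggested in the comments below.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[-1, 2, 1, 3], [4, 6, 7, 8], [-2, 10, 11, 13], [5, 6, 8, 9]],
                  columns=['lo1', 'up1', 'lo2', 'up2'])
df2 = pd.DataFrame([[1, 3], [4, 6], [5, 8], [10, 2]],
                   columns=['pe1', 'pe2'])

# Assumption: column order is consistent, so column j of lo/up matches column j of pe.
lo = df.filter(regex=r"^lo\d+$").to_numpy()
up = df.filter(regex=r"^up\d+$").to_numpy()
pe = df2.to_numpy()

in_range = (lo <= pe) & (pe <= up)       # element-wise, endpoints included
excludes_zero = (lo > 0) | (up < 0)      # the range [lo, up] does not contain 0

result = pd.DataFrame(np.where(in_range & excludes_zero, pe, 0),
                      index=df2.index, columns=df2.columns)
print(result)
```

Unlike the loop, this leaves df2 unchanged and returns a new frame. For this data it gives the same result as above, since no range lies entirely below zero.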

Comments

Thanks. Is it possible to create a loop to do this if I have more data than just these columns?
@AW27 Sure, see updated answer
Also, I see your bullet points; I just want to make sure that 0 is not between lo1 and up1.
Then replace (df[f'lo{nr}'] > 0) with ((df[f'lo{nr}'] > 0) | (df[f'up{nr}'] < 0))
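The difference between the two conditions only shows up when a range lies entirely below zero. The two rows below are made-up data for illustration: row 0 has the range [-5, -1], and row 1 straddles zero.

```python
import pandas as pd

# Made-up data: row 0 is an all-negative range, row 1 crosses zero.
bounds = pd.DataFrame({"lo1": [-5, -2], "up1": [-1, 3]})
values = pd.DataFrame({"pe1": [-3, 1]})

within = (bounds["lo1"] <= values["pe1"]) & (bounds["up1"] >= values["pe1"])

# Original condition: only strictly positive ranges are allowed.
only_positive = values["pe1"].where(within & (bounds["lo1"] > 0), other=0)
# Suggested condition: any range that does not contain 0 is allowed.
excludes_zero = values["pe1"].where(
    within & ((bounds["lo1"] > 0) | (bounds["up1"] < 0)), other=0)

print(only_positive.tolist())  # [0, 0]
print(excludes_zero.tolist())  # [-3, 0]
```

Both versions still zero out row 1, whose range contains zero. Only the suggested condition keeps -3 from the all-negative range.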
