Apply row wise conditional function on dataframe python

Question

I have a dataframe on which I want to run a function that checks whether the current value is a relative maximum and whether the previous 'n' values are all lower than the current value.

Having a dataframe 'df_data':

import pandas as pd

temp_list = [128.71, 130.2242, 131.0, 131.45, 129.69, 130.17, 132.63, 131.63, 131.0499, 131.74, 133.6116, 134.74, 135.99, 138.789, 137.34, 133.46, 132.43, 134.405, 128.31, 129.1]
df_data = pd.DataFrame(temp_list, columns=['high'])

First I create a function that will check the previous conditions:

def get_max(high, rolling_max, prev, post):
    if ((high > prev) & (high > post) & (high > rolling_max)):
        return 1
    else:
        return 0

df_data['rolling_max'] = df_data.high.rolling(n).max().shift()

Then I apply previous condition row wise:

df_data['ismax'] = df_data.apply(lambda x: get_max(df_data['high'], df_data['rolling_max'],df_data['high'].shift(1),df_data['high'].shift(-1)),axis = 1)

The problem is that I always get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

This comes from applying the boolean condition in the 'get_max' function to a Series.

I would love to have a vectorized solution that does not use loops.

Could you please fix your code and make sure it is reproducible? Also, please provide the expected output.
– Pierre D, Commented Oct 15, 2021 at 16:25

Muhammad Hassan · Accepted Answer · 2021-10-15 16:02:08Z

1

Try:

df_data['ismax'] = ((df_data['high'].gt(df_data.high.rolling(n).max().shift())) & (df_data['high'].gt(df_data['high'].shift(1))) & (df_data['high'].gt(df_data['high'].shift(-1)))).astype(int)
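For a self-contained check, the one-liner above can be wrapped in a small helper and run on the sample data from the question. This is only a sketch: the helper name `flag_relative_max` and the window `n = 3` are assumptions made for illustration, since the question never defines `n`.

```python
import pandas as pd

def flag_relative_max(df, n):
    """Return 1 where 'high' beats both neighbours and the max of the previous n rows."""
    high = df['high']
    prev_n_max = high.rolling(n).max().shift()  # max of the n rows before the current one
    mask = high.gt(prev_n_max) & high.gt(high.shift(1)) & high.gt(high.shift(-1))
    return mask.astype(int)

temp_list = [128.71, 130.2242, 131.0, 131.45, 129.69, 130.17, 132.63,
             131.63, 131.0499, 131.74, 133.6116, 134.74, 135.99,
             138.789, 137.34, 133.46, 132.43, 134.405, 128.31, 129.1]
df_data = pd.DataFrame({'high': temp_list})

n = 3  # assumed window size
df_data['ismax'] = flag_relative_max(df_data, n)

flagged = df_data.index[df_data['ismax'] == 1].tolist()
print(flagged)  # [3, 6, 13]
```

Comparisons against NaN evaluate to False, so the first and last rows (which have no previous or next neighbour) are never flagged.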


1 Comment

arodrisa Over a year ago

Thanks, that's it. Do you know why gt works and & does not? The documentation says '&' is element-wise and 'and' is not.
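As a hedged aside on this comment: `Series.gt(other)` and `series > other` give the same element-wise result, and `&` combines either form element-wise. What `&` does need is parentheses around each comparison, because it binds more tightly than `>`. The error in the question comes from handing a whole Series to `if`, not from `&` itself. A small sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1.0, 3.0, 2.0, 5.0, 4.0])  # illustrative values, not from the question
prev, post = s.shift(1), s.shift(-1)

with_methods = s.gt(prev) & s.gt(post)
with_operators = (s > prev) & (s > post)  # parentheses are required around each comparison

same = with_methods.equals(with_operators)
print(same)                   # True
print(with_methods.tolist())  # [False, True, False, True, False]
```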

Python on Toast · Accepted Answer · 2021-10-15 16:20:58Z

The error occurs because you are passing the entire series (the whole column) to your get_max function rather than one row's values at a time. Creating new columns for the shifted "prev" and "post" values and then using df.apply(func, axis = 1) in the usual way will work fine here.
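To make the cause concrete, here is a minimal sketch that reproduces the error and then calls the same function with per-row scalars. The five sample values and the window of 2 are assumptions chosen for brevity.

```python
import pandas as pd

def get_max(high, rolling_max, prev, post):
    # The `if` needs a single True/False, so every argument must be a scalar
    if (high > prev) & (high > post) & (high > rolling_max):
        return 1
    return 0

df = pd.DataFrame({'high': [1.0, 3.0, 2.0, 5.0, 4.0]})  # illustrative values
df['rolling_max'] = df['high'].rolling(2).max().shift()
df['high_prev'] = df['high'].shift(1)
df['high_post'] = df['high'].shift(-1)

# Passing whole columns: `if` receives a boolean Series and raises ValueError
try:
    get_max(df['high'], df['rolling_max'], df['high_prev'], df['high_post'])
    raised = False
except ValueError:
    raised = True

# Passing the fields of each row: every argument is a scalar, so `if` works
ismax = df.apply(
    lambda row: get_max(row['high'], row['rolling_max'], row['high_prev'], row['high_post']),
    axis=1,
).tolist()

print(raised)  # True
print(ismax)   # [0, 0, 0, 1, 0]
```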

As you have hinted, this solution is quite inefficient: looping through every row becomes much slower as your dataframe grows.

On my computer, the code below reports:

LIST_MULTIPLIER = 1, Vectorised code: 0.29s, Row-wise code: 0.38s
LIST_MULTIPLIER = 100, Vectorised code: 0.31s, Row-wise code: 13.27s

In general, therefore, it is best to avoid df.apply(..., axis = 1), as you can almost always get a faster solution using vectorised logical operators.

import pandas as pd
from datetime import datetime

LIST_MULTIPLIER = 100
ITERATIONS = 100

def get_dataframe():
    temp_list = [128.71, 130.2242, 131.0, 131.45, 129.69, 130.17, 132.63, 
                 131.63, 131.0499, 131.74, 133.6116, 134.74, 135.99, 
                 138.789, 137.34, 133.46, 132.43, 134.405, 128.31, 129.1] * LIST_MULTIPLIER
    df = pd.DataFrame(temp_list)
    df.columns = ['high']
    return df

df_original = get_dataframe()

t1 = datetime.now()

for i in range(ITERATIONS):
    df = df_original.copy()
    df['rolling_max'] = df.high.rolling(2).max().shift()
    df['high_prev'] = df['high'].shift(1)
    df['high_post'] = df['high'].shift(-1)
    
    mask_prev = df['high'] > df['high_prev']
    mask_post = df['high'] > df['high_post']
    mask_rolling = df['high'] > df['rolling_max']
    
    mask_max = mask_prev & mask_post & mask_rolling
    
    df['ismax'] = 0
    df.loc[mask_max, 'ismax'] = 1
    
    
t2 = datetime.now()
print(f"Vectorised code: {t2 - t1}")
df_first_method = df.copy()


t3 = datetime.now()

def get_max_rowwise(row):
    if ((row.high > row.high_prev) & 
        (row.high > row.high_post) & 
        (row.high > row.rolling_max)):
        return 1
    else: 
        return 0
    
for i in range(ITERATIONS):
    df = df_original.copy()
    df['rolling_max'] = df.high.rolling(2).max().shift()
    df['high_prev'] = df['high'].shift(1)
    df['high_post'] = df['high'].shift(-1)
    df['ismax'] = df.apply(get_max_rowwise, axis = 1)

t4 = datetime.now()
print(f"Row-wise code: {t4 - t3}")
df_second_method = df.copy()
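The script stores both results but never compares them. A compact sketch of that check, run once on the unmultiplied sample data with the same window of 2, might look like this (the names `vectorised`, `rowwise` and `methods_agree` are illustrative):

```python
import pandas as pd

temp_list = [128.71, 130.2242, 131.0, 131.45, 129.69, 130.17, 132.63,
             131.63, 131.0499, 131.74, 133.6116, 134.74, 135.99,
             138.789, 137.34, 133.46, 132.43, 134.405, 128.31, 129.1]
df = pd.DataFrame({'high': temp_list})
df['rolling_max'] = df['high'].rolling(2).max().shift()
df['high_prev'] = df['high'].shift(1)
df['high_post'] = df['high'].shift(-1)

# Vectorised: combine the boolean masks, then cast True/False to 1/0
vectorised = (
    (df['high'] > df['high_prev'])
    & (df['high'] > df['high_post'])
    & (df['high'] > df['rolling_max'])
).astype(int)

# Row-wise: the same condition evaluated once per row; on scalars plain `and` is fine
def get_max_rowwise(row):
    if (row.high > row.high_prev) and (row.high > row.high_post) and (row.high > row.rolling_max):
        return 1
    return 0

rowwise = df.apply(get_max_rowwise, axis=1)

methods_agree = bool((vectorised == rowwise).all())
flagged = vectorised.index[vectorised == 1].tolist()
print(methods_agree)  # True
print(flagged)        # [3, 6, 13, 17]
```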
