My function call looks something like this:

loss = log_loss(y_true=validate_df['y'], y_pred=validate_probs, sample_weight=validate_df['weight'], normalize=True)

Is there any way to combine this with pandas rolling() functionality, so that I can calculate it over a trailing window of, say, 10k rows?

1 Answer

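I couldn't find a clean way to make rolling() work across several columns of a dataframe, because rolling().apply() hands the function one column at a time. The best I could do is a custom window function that uses the window's index to look up the other columns and then calls log_loss: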


import pandas as pd
import numpy as np
from sklearn.metrics import log_loss

# Everything in one dataframe, but you can have your pred in a separate one
# if you want
df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2]
})

def weighted_log_loss(window):
    # window is a series whose contents we're not interested in, we just want
    # the range to `loc` from other data frames
    y = df.loc[window.index, 'y']
    y_pred = df.loc[window.index, 'y_pred']
    weight = df.loc[window.index, 'weight']
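    return log_loss(
        y_true=y,
        y_pred=y_pred,
        sample_weight=weight,
        normalize=True,
        # Name both classes explicitly; otherwise log_loss raises a
        # ValueError for any window that contains only one class
        labels=[0, 1]
    )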

window_size = 3
print(df['y'].rolling(window=window_size).apply(weighted_log_loss))


It turns out there is also a rolling_apply function (source) that works directly with multi-column dataframes, which might suit you better.
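
For a 10k-row window, calling log_loss once per window may get slow. Since the weighted, normalized binary log loss is just a weighted average of per-row losses, one possible alternative is to compute the per-row loss once and take rolling sums. This is only a sketch under some assumptions: labels are 0/1, y_pred is the probability of class 1, and the helper name rolling_weighted_log_loss and the eps clipping value are my own choices, not something from sklearn or pandas.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'y': [1, 0, 1, 1, 0, 1, 0, 1],
    'y_pred': [0.7, 0.3, 0.8, 0.9, 0.4, 0.6, 0.2, 0.8],
    'weight': [1.0, 1.5, 0.5, 1.0, 2.0, 1.0, 0.8, 1.2],
})

def rolling_weighted_log_loss(y, p, w, window, eps=1e-15):
    # Hypothetical helper: assumes binary 0/1 labels and p = P(y == 1).
    # Clip probabilities so log() stays finite (eps is an assumed value).
    p = p.clip(eps, 1 - eps)
    # Per-row binary cross-entropy
    row_loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # Weighted mean over each trailing window: sum(w * loss) / sum(w)
    weighted_sum = (w * row_loss).rolling(window).sum()
    weight_sum = w.rolling(window).sum()
    return weighted_sum / weight_sum

rolling_loss = rolling_weighted_log_loss(
    df['y'], df['y_pred'], df['weight'], window=3
)
print(rolling_loss)
```

The first window - 1 entries are NaN, same as with rolling().apply(). Rolling sums accumulate small floating-point differences, so I would compare with np.isclose rather than expect bit-identical values to the log_loss version.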
