The following data frame is used as input:
from io import StringIO

import numpy as np
import pandas as pd

json_string = '{"datetime":{"0":1528955662000,"1":1528959255000,"2":1528965487000,"3":1528966204000,"4":1528966289000,"5":1528971637000,"6":1528974438000,"7":1528975251000,"8":1528982200000,"9":1528992569000,"10":1528994282000},"hit":{"0":1,"1":0,"2":0,"3":0,"4":0,"5":1,"6":1,"7":0,"8":1,"9":0,"10":1}}'
# Recent pandas versions expect a file-like object rather than a literal JSON string.
df = pd.read_json(StringIO(json_string))
The exercise asks for the mean of the hit column at each moment in time (datetime), computed over the earlier observations only, so the current observation is excluded from its own mean. For instance, the first observation (index=0) gets np.nan, since there are no earlier observations. The second observation (index=1) gets 1, since 1/1 = 1 (the 0 from the second observation itself is not included). The third observation (index=2) gets 0.5, since (1+0)/2 = 0.5.
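To make the expected numbers concrete, here is a minimal sketch of the definition itself, applied to the hit values only and assuming they are already in ascending datetime order (as they are in the sample data). The names `prior_sum`, `prior_count` and `prior_mean` are just illustrative.

```python
import numpy as np

# hit values from the sample data, already in ascending datetime order
hits = np.array([1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1], dtype=float)

# Sum of all strictly earlier hits: running total minus the current value.
prior_sum = np.cumsum(hits) - hits
# Number of strictly earlier observations: 0 for the first row, 1 for the second, ...
prior_count = np.arange(len(hits))

# Divide only where at least one earlier observation exists; leave NaN elsewhere.
prior_mean = np.divide(
    prior_sum,
    prior_count,
    out=np.full(len(hits), np.nan),
    where=prior_count > 0,
)

# First three values: nan, 1.0, 0.5
print(prior_mean[:3])
```

This is only a check of the arithmetic, not a proposed solution; it ignores the datetime column entirely.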
My code gives the correct numbers but is not elegant. Is there a cleaner way to complete the exercise? For example, is it possible to use pandas.api.indexers.VariableOffsetWindowIndexer, or to subclass pandas.api.indexers.BaseIndexer and override its get_window_bounds() method?
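For reference, this is roughly what a BaseIndexer subclass for this task might look like. It is a sketch under several assumptions: pandas 1.5 or later (where get_window_bounds() takes a `step` argument), rows already sorted by datetime, and no special handling of duplicate timestamps. The class name `PriorRowsIndexer` is made up for illustration.

```python
import numpy as np
import pandas as pd
from pandas.api.indexers import BaseIndexer


class PriorRowsIndexer(BaseIndexer):
    """Hypothetical indexer: the window for row i covers rows 0 .. i-1."""

    def get_window_bounds(self, num_values=0, min_periods=None,
                          center=None, closed=None, step=None):
        # Every window starts at the first row ...
        start = np.zeros(num_values, dtype=np.int64)
        # ... and ends just before the current row (the end bound is exclusive).
        end = np.arange(num_values, dtype=np.int64)
        return start, end


# hit values from the sample data, assumed sorted by datetime
hits = pd.Series([1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1], name="hit")

# min_periods=1 makes the empty window of the first row evaluate to NaN.
mean_hr = hits.rolling(PriorRowsIndexer(), min_periods=1).mean()
print(mean_hr)
```

Whether this counts as more elegant than shifting an expanding mean by one row is debatable; the indexer mainly makes the "exclude the current row" rule explicit in the window bounds.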
My solution:
def add_hr(df):
    """
    Generate a feature `mean_hr` which represents the average hit rate
    at the moment of making the offer (`datetime`).

    Parameters
    ----------
    df : pandas.DataFrame
        The `hit` column must be present. Ascending/descending order in the `datetime`
        column is not assumed.
            hit : int
            datetime : string (format='%Y-%m-%d %H:%M:%S')

    Returns
    -------
    df_expanded : pandas.DataFrame
        A (deep) copy of the input pandas.DataFrame.
    """
    df_expanded = df.copy(deep=True)
    df_expanded.sort_values(by=['datetime'], ascending=True, inplace=True)
    # Expanding mean that still includes the current row.
    df_expanded['mean_hr'] = df_expanded['hit'].expanding().mean()
    srs = df_expanded['mean_hr']
    # Drop the last value and prepend NaN, i.e. shift everything down by one row.
    srs = srs[:len(srs)-1]
    srs = pd.concat([pd.Series([np.nan]), srs])
    # Assign by position, since the concatenated index no longer matches.
    df_expanded['mean_hr'] = srs.tolist()
    return df_expanded
Full disclosure: the exercise was part of a recruitment process a month ago. The recruitment is now closed and I can no longer submit code.