I have a dataset on which I run a groupby with a comparison based on two columns, and the result comes back as numpy arrays. I want to put those arrays back into the dataframe as new columns.
Logic:
I have this dataframe df with the following columns: individual, cluster, a, b. Pasting it here for reproduction purposes:
individual  cluster       a    b
   9710556        0  180.82  140
   9710556        0  180.82  140
   9710556        0  202.32  145
   9710556        1  218.32  145
   9710556        1  250.82  140
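For convenience, the same data can be built directly in code. This is just a sketch that retypes the values from the table above; it assumes the default integer index and lets pandas infer the dtypes.

```python
import pandas as pd

# Rebuild the sample data; column names follow the pasted header.
df = pd.DataFrame({
    "individual": [9710556] * 5,
    "cluster": [0, 0, 0, 1, 1],
    "a": [180.82, 180.82, 202.32, 218.32, 250.82],
    "b": [140, 140, 145, 145, 140],
})
print(df)
```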
For every row, I want to count the other rows whose a and b values are both strictly greater than that row's a and b. I need this count within every individual (onIndiv column below) and also within every individual and cluster (onIndivCluster column below). This is the desired output:
individual  cluster       a    b  onIndiv  onIndivCluster
   9710556        0  180.82  140        2               1
   9710556        0  180.82  140        2               1
   9710556        0  202.32  145        0               0
   9710556        1  218.32  145        0               0
   9710556        1  250.82  140        0               0
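To make the counting rule explicit, here is a slow plain-Python sketch of what I mean. It is only an illustration of the rule, not the code I actually run; the helper name count_strictly_greater and the hard-coded lists are made up for this example.

```python
# Hypothetical helper, for illustration only: for each row, count the rows
# in the same group (same key) whose a and b are both strictly greater.
def count_strictly_greater(rows, keys):
    counts = []
    for (a, b), key in zip(rows, keys):
        n = sum(
            1
            for (a2, b2), key2 in zip(rows, keys)
            if key2 == key and a < a2 and b < b2
        )
        counts.append(n)
    return counts

rows = [(180.82, 140), (180.82, 140), (202.32, 145), (218.32, 145), (250.82, 140)]
individuals = [9710556] * 5
clusters = [0, 0, 0, 1, 1]

# Group by individual only, then by (individual, cluster).
on_indiv = count_strictly_greater(rows, individuals)
on_indiv_cluster = count_strictly_greater(rows, list(zip(individuals, clusters)))

print(on_indiv)          # [2, 2, 0, 0, 0]
print(on_indiv_cluster)  # [1, 1, 0, 0, 0]
```

The results stay in the original row order, which is the shape I want the new columns to have.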
This is a function I came up with which does this:
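import numpy as np

def ranker(df):
    values = df[["a", "b"]].values
    # result[i, j, k] is True when row i is strictly less than row j in column k
    result = values[:, None] < values
    # Count, per row, the rows that are strictly greater in both columns
    return np.logical_and.reduce(result, axis=2).sum(axis=1)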
df.groupby("individual").apply(ranker)
Out[192]:
individual
9710556    [2, 2, 0, 0, 0]
dtype: object
df.groupby(["individual", "cluster"]).apply(ranker)
Out[169]:
individual  cluster
9710556     0          [1, 1, 0]
            1             [0, 0]
dtype: object
How can I assign these results to the original dataframe to get my desired output?