I have a dataframe like this:

ID, DateIndex, Qty
1, 1, 10
2, 1, 15
3, 1, 20
4, 1, 30
1, 2, 14
2, 2, 13
3, 2, 14
4, 2, 12
1, 3, 1
2, 3, 60
3, 3, 19
4, 3, 12
....

I want to output a table like this:

ID, DateIndex, Qty, n-1, n-2, n-3, n-4....
1, 3, 1, -1, -1, 0, 0....
2, 3, 60, 1, 1, 0, 0....
3, 3, 19, 1, -1, 0, 0....
4, 3, 12, 0, -1, 0, 0....

The rule is: for each ID, compare the Qty at a given DateIndex with that ID's Qty at DateIndex-1 (column n-1), DateIndex-2 (column n-2), and so on. The column is -1 if the current Qty is less than the earlier one, 1 if it is greater, and 0 if the two are the same or the earlier value is not found.
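
To make the rule concrete, here is a minimal sketch for a single lag (the n-1 column only), using the sample rows above. It assumes the rows are already sorted by DateIndex within each ID; using np.sign for the comparison is one possible choice, not the only one.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the table above.
df = pd.DataFrame({
    "ID":        [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "DateIndex": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Qty":       [10, 15, 20, 30, 14, 13, 14, 12, 1, 60, 19, 12],
})

# Difference from the same ID's previous row; NaN where no previous row exists.
delta = df.groupby("ID")["Qty"].diff(1)

# fillna(0) maps "not found" to 0, then np.sign yields -1, 0 or 1.
df["n-1"] = np.sign(delta.fillna(0)).astype(int)
print(df)
```

For ID 1 at DateIndex 3, for example, Qty drops from 14 to 1, so n-1 is -1.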

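Here is what I have so far:

import pandas as pd
import numpy as np

# Read the data and sort it chronologically, then by ID
df = pd.read_csv('test.csv', parse_dates=['Date']).sort_values(['Date', 'ID'])

# Number the distinct dates 1, 2, 3, ... (dense rank, so ties share an index)
df['DateIndex'] = df['Date'].rank(method='dense')

I think I need to define a function and use apply, but I am not sure how to do it.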

1 Answer

This is just a groupby on ID followed by a diff within each group. However, your output indicates that you want to compare row n against rows n-1, n-2, n-3, ... and assign each comparison to a separate column. Therefore, you need a function that calls diff multiple times with different periods:

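def shift_count(x, i):
    # Qty minus the Qty i rows earlier within the same ID (NaN if there is none)
    m = x.groupby('ID').Qty.diff(i)
    # 1 if greater, -1 if less, 0 if equal or NaN
    return (m.gt(0).astype(int) - m.lt(0).astype(int)).rename('n-'+str(i))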

This function returns a Series. Call it n times within a list comprehension to create a list of Series. Finally, pd.concat that list with the original df:

n = 4
list_series_diff = [shift_count(df, i) for i in range(1, n+1)]
pd.concat([df] + list_series_diff, axis=1)

Out[162]:
    ID  DateIndex  Qty  n-1  n-2  n-3  n-4
0    1          1   10    0    0    0    0
1    2          1   15    0    0    0    0
2    3          1   20    0    0    0    0
3    4          1   30    0    0    0    0
4    1          2   14    1    0    0    0
5    2          2   13   -1    0    0    0
6    3          2   14   -1    0    0    0
7    4          2   12   -1    0    0    0
8    1          3    1   -1   -1    0    0
9    2          3   60    1    1    0    0
10   3          3   19    1   -1    0    0
11   4          3   12    0   -1    0    0
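
The table requested in the question shows only the rows for the latest DateIndex. A possible follow-up step, sketched here under the assumption that "latest" means the maximum DateIndex in the frame, is to filter the concatenated result. The names `out` and `latest` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "ID":        [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "DateIndex": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Qty":       [10, 15, 20, 30, 14, 13, 14, 12, 1, 60, 19, 12],
})

def shift_count(x, i):
    # Same logic as above: sign of the difference to the row i steps earlier per ID.
    m = x.groupby("ID")["Qty"].diff(i)
    return (m.gt(0).astype(int) - m.lt(0).astype(int)).rename(f"n-{i}")

n = 4
out = pd.concat([df] + [shift_count(df, i) for i in range(1, n + 1)], axis=1)

# Keep only the most recent DateIndex (assumption: that is what the question wants).
latest = out[out["DateIndex"] == out["DateIndex"].max()].reset_index(drop=True)
print(latest)
```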

2 Comments

This works, but on large data sets (about 1300*3000 rows with n of about 1300) it freezes my PC, or it fails at around i = 650 and restarts the shell. Any advice?
It happens during the concat step.
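
A likely cause is memory: about 3.9 million rows times 1300 integer columns comes to roughly 40 GB if each value is stored as a 64-bit integer. If only the rows for the latest DateIndex are needed, as in the question's desired table, one possible workaround is to pivot to a DateIndex-by-ID matrix and compare only the last row against earlier ones, storing the signs as int8. The sketch below is an untested-at-scale illustration, not the answerer's method. It assumes each (DateIndex, ID) pair occurs at most once, and it lags by DateIndex rather than by row position, which gives the same result only when every ID has a row for every DateIndex.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "ID":        [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4],
    "DateIndex": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    "Qty":       [10, 15, 20, 30, 14, 13, 14, 12, 1, 60, 19, 12],
})

# One row per DateIndex, one column per ID; missing pairs become NaN.
wide = df.pivot(index="DateIndex", columns="ID", values="Qty").sort_index()
values = wide.to_numpy(dtype=float)

n = 4
latest = values[-1]  # Qty of every ID at the most recent DateIndex
signs = np.zeros((values.shape[1], n), dtype=np.int8)  # 1 byte per cell

# Only lags that actually exist in the data are filled; the rest stay 0.
for i in range(1, min(n, values.shape[0] - 1) + 1):
    diff = latest - values[-1 - i]
    # NaN (value not found) becomes 0 before taking the sign.
    signs[:, i - 1] = np.sign(np.nan_to_num(diff)).astype(np.int8)

result = pd.DataFrame(signs, index=wide.columns,
                      columns=[f"n-{i}" for i in range(1, n + 1)])
print(result)
```

With 3000 IDs and n = 1300 this holds about 3.9 million one-byte cells, a few megabytes, instead of one column per lag for every row.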