Conditional column with loop

Question

I have a dataframe like this

ID, DateIndex, Qty
1, 1, 10
2, 1, 15
3, 1, 20
4, 1, 30
1, 2, 14
2, 2, 13
3, 2, 14
4, 2, 12
1, 3, 1
2, 3, 60
3, 3, 19
4, 3, 12
....

I want to output a table like this

ID, DateIndex, Qty, n-1, n-2, n-3, n-4....
1, 3, 1, -1, -1, 0, 0....
2, 3, 60, 1, 1, 0, 0....
3, 3, 19, 1, -1, 0, 0....
4, 3, 12, 0, -1, 0, 0....

The rule is: for each ID, compare the Qty at the current DateIndex with that ID's Qty at DateIndex-1 (column n-1), DateIndex-2 (column n-2), and so on. Return -1 if the current Qty is lower, 1 if it is higher, and 0 if it is the same or the earlier value is not found.

Here is what I have so far

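import pandas as pd
import numpy as np

# Load the data and order it by date, then by ID
df = pd.read_csv('test.csv', parse_dates=['Date']).sort_values(['Date', 'ID'])

# Dense rank maps each distinct date to a consecutive index starting at 1
df['DateIndex'] = df['Date'].rank(method='dense')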

I think I will need to define a function and use apply, but I am not sure how to do it.

Andy L. · Accepted Answer · 2019-08-10 00:47:01Z

This is just a groupby followed by a diff within each group. However, your output indicates that you want to compare n against the n-1, n-2, n-3, ... groups and assign each result to a separate column. Therefore, you need a function that calls diff multiple times with different values:

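def shift_count(x, i):
    # Difference between each Qty and the same ID's Qty i rows earlier (NaN if there is none)
    m = x.groupby('ID').Qty.diff(i)
    # 1 if the difference is positive, -1 if negative, 0 otherwise (including NaN)
    return (m.gt(0).astype(int) - m.lt(0).astype(int)).rename('n-'+str(i))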

This function returns a Series. Call it n times within a list comprehension to create a list of Series. Finally, pd.concat it with the original df:

import pandas as pd

n = 4  # number of lags to compare against
list_series_diff = [shift_count(df, i) for i in range(1, n+1)]
pd.concat([df] + list_series_diff, axis=1)

Out[162]:
    ID  DateIndex  Qty  n-1  n-2  n-3  n-4
0    1          1   10    0    0    0    0
1    2          1   15    0    0    0    0
2    3          1   20    0    0    0    0
3    4          1   30    0    0    0    0
4    1          2   14    1    0    0    0
5    2          2   13   -1    0    0    0
6    3          2   14   -1    0    0    0
7    4          2   12   -1    0    0    0
8    1          3    1   -1   -1    0    0
9    2          3   60    1    1    0    0
10   3          3   19    1   -1    0    0
11   4          3   12    0   -1    0    0
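
For anyone who wants to check this without test.csv, here is a minimal self-contained sketch of the same idea on the sample rows from the question. It is not the code above verbatim: the frame is built inline, the helper name sign_of_change is made up for illustration, and np.sign is swapped in for the gt/lt subtraction (it should give the same -1/0/1 values once NaN is filled with 0). Like the original, it assumes the rows are already sorted by DateIndex within each ID and that no ID skips a date, because diff(i) counts rows rather than dates.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, built inline (no test.csv needed).
df = pd.DataFrame({
    "ID":        [1, 2, 3, 4] * 3,
    "DateIndex": [1] * 4 + [2] * 4 + [3] * 4,
    "Qty":       [10, 15, 20, 30, 14, 13, 14, 12, 1, 60, 19, 12],
})

def sign_of_change(frame, lag):
    """Return -1/0/1 for Qty versus the same ID's Qty `lag` rows earlier."""
    delta = frame.groupby("ID")["Qty"].diff(lag)
    # np.sign maps negative/zero/positive to -1/0/1; NaN (no earlier row) is filled with 0.
    return np.sign(delta).fillna(0).astype(int).rename(f"n-{lag}")

n = 4
result = pd.concat([df] + [sign_of_change(df, i) for i in range(1, n + 1)], axis=1)

# Only the latest date, as in the question's desired output.
print(result[result["DateIndex"] == 3])
```

Filtering on the last DateIndex at the end gives the four rows the question asked for.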

2 Comments

user10302153 Over a year ago

This works but freezes up my PC on large data sets with rows ~1300*3000 and n~1300, or it bugs out at i~650 and restarts shell. Any advice?

user10302153 Over a year ago

It happens when it tries to concat
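
One possible way around the memory problem described in these comments, assuming only the rows for the latest DateIndex are needed (as in the question's desired output): with roughly 1300*3000 rows and n near 1300, the concatenated frame would hold about 5 billion cells, which is unlikely to fit in memory. The sketch below is not from the answer. It pivots to one row per ID and compares the latest date against the earlier ones in a single NumPy operation, storing the result as int8. The sample data and the names wide, latest, earlier and compact are illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows from the question; the real data would come from the CSV.
df = pd.DataFrame({
    "ID":        [1, 2, 3, 4] * 3,
    "DateIndex": [1] * 4 + [2] * 4 + [3] * 4,
    "Qty":       [10, 15, 20, 30, 14, 13, 14, 12, 1, 60, 19, 12],
})

# One row per ID, one column per DateIndex; a missing (ID, DateIndex) pair becomes NaN.
wide = df.pivot(index="ID", columns="DateIndex", values="Qty").sort_index(axis=1)

values = wide.to_numpy(dtype="float64")
latest = values[:, -1:]        # Qty at the latest DateIndex, shape (n_ids, 1)
earlier = values[:, -2::-1]    # earlier dates, most recent first

# Sign of (latest - earlier): -1, 0 or 1; NaN (value not found) becomes 0.
signs = np.nan_to_num(np.sign(latest - earlier)).astype(np.int8)

n = 4
k = min(n, signs.shape[1])     # cannot look back further than the data goes
compact = pd.DataFrame(
    signs[:, :k],
    index=wide.index,
    columns=[f"n-{i}" for i in range(1, k + 1)],
)
print(compact)
```

Two differences from the diff-based version are worth noting. Lags here are measured in DateIndex columns, so an ID that skips a date gets 0 for that lag instead of being compared with an older row. Columns beyond the available history (n-3 and n-4 in this sample) are omitted rather than filled with zeros.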
