3

I have created the following pandas dataframe:

import pandas as pd
import numpy as np

ds = {'col1' : [234,321,284,286,287,300,301,303,305,299,288,300,299,287,286,280,279,270,269,301]}

df = pd.DataFrame(data=ds)

The dataframe looks like this:

display(df)

    col1
0   234
1   321
2   284
3   286
4   287
5   300
6   301
7   303
8   305
9   299
10  288
11  300
12  299
13  287
14  286
15  280
16  279
17  270
18  269
19  301

I have then created two columns (cnsIncr , cnsDecr), which calculate the consecutive increase and decrease in col1 respectively.

cnsIncr = []
cnsDecr = []

col1 = np.array(df['col1'])

for i in range(len(df)):
    cnsIncr.append(0)
    cnsDecr.append(0)
    if(col1[i] > col1[i-1]):
        cnsIncr[i] = cnsIncr[i-1]+1
    else:
        cnsIncr[i] = 0

    if(col1[i] < col1[i-1]):
        cnsDecr[i] = cnsDecr[i-1]+1
    else:
        cnsDecr[i] = 0

df['cnsIncr'] = cnsIncr
df['cnsDecr'] = cnsDecr

The resulting dataframe looks as follows:

enter image description here

I need to create a columns called Trend which is populated as follows:

  • if the column cnsIncr contains a value greater or equal to 5, then Trend = UpTrend starting from the record in which consIncr = 0 and ending at the record for which cnsIncr >= 5;
  • if the column cnsDecr contains a value greater or equal to 5, then Trend = DownTrend starting from the record in which cnsDecr = 0 and ending at the record for which cnsDecr >= 5;
  • the neither the column cnsIncr nor the column cnsDecr contain a value of 5, then Trend = NoTrend

So, from the example above, the resulting dataframe would look like this:

enter image description here

How can I do this?

1 Answer 1

2

I think a simple approach, assuming groups always start with a 0 and are increasing until a reset, would be to groupby.transform for stretches starting with 0 and assigning the value based on the last member of the group:

N = 5

# form groups starting with 0 for each column
g1 = df['cnsIncr'].eq(0).cumsum()
g2 = df['cnsDecr'].eq(0).cumsum()

# identify if the last value is >= 5
m1 = df['cnsIncr'].groupby(g1).transform('last').ge(N)
m2 = df['cnsDecr'].groupby(g2).transform('last').ge(N)

# assign the names based on the masks
df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')

Output:

    col1  cnsIncr  cnsDecr      Trend
0    234        0        1    NoTrend
1    321        1        0    NoTrend
2    284        0        1    TrendUp
3    286        1        0    TrendUp
4    287        2        0    TrendUp
5    300        3        0    TrendUp
6    301        4        0    TrendUp
7    303        5        0    TrendUp
8    305        6        0    TrendUp
9    299        0        1    NoTrend
10   288        0        2    NoTrend
11   300        1        0  TrendDown
12   299        0        1  TrendDown
13   287        0        2  TrendDown
14   286        0        3  TrendDown
15   280        0        4  TrendDown
16   279        0        5  TrendDown
17   270        0        6  TrendDown
18   269        0        7  TrendDown
19   301        1        0    NoTrend

Intermediates:

    col1  cnsIncr  cnsDecr      Trend  g1  g2     m1     m2
0    234        0        1    NoTrend   1   0  False  False
1    321        1        0    NoTrend   1   1  False  False
2    284        0        1    TrendUp   2   1   True  False
3    286        1        0    TrendUp   2   2   True  False
4    287        2        0    TrendUp   2   3   True  False
5    300        3        0    TrendUp   2   4   True  False
6    301        4        0    TrendUp   2   5   True  False
7    303        5        0    TrendUp   2   6   True  False
8    305        6        0    TrendUp   2   7   True  False
9    299        0        1    NoTrend   3   7  False  False
10   288        0        2    NoTrend   4   7  False  False
11   300        1        0  TrendDown   4   8  False   True
12   299        0        1  TrendDown   5   8  False   True
13   287        0        2  TrendDown   6   8  False   True
14   286        0        3  TrendDown   7   8  False   True
15   280        0        4  TrendDown   8   8  False   True
16   279        0        5  TrendDown   9   8  False   True
17   270        0        6  TrendDown  10   8  False   True
18   269        0        7  TrendDown  11   8  False   True
19   301        1        0    NoTrend  11   9  False  False

Note that you don't need to compute the cnsIncr/cnsDecr columns with a loop:

N = 5

g1 = df['cnsIncr'].eq(0).cumsum()
g2 = df['cnsDecr'].eq(0).cumsum()

m1 = df['cnsIncr'].groupby(g1).transform('last').gt(N)
m2 = df['cnsDecr'].groupby(g2).transform('last').gt(N)

df['Trend'] = np.select([m1, m2], ['TrendUp', 'TrendDown'], 'NoTrend')

Output with intermediates:

    col1      Trend  g1  g2     m1     m2
0    234    NoTrend   1   0  False  False
1    321    NoTrend   1   1  False  False
2    284    TrendUp   2   1   True  False
3    286    TrendUp   2   2   True  False
4    287    TrendUp   2   3   True  False
5    300    TrendUp   2   4   True  False
6    301    TrendUp   2   5   True  False
7    303    TrendUp   2   6   True  False
8    305    TrendUp   2   7   True  False
9    299    NoTrend   3   7  False  False
10   288    NoTrend   4   7  False  False
11   300  TrendDown   4   8  False   True
12   299  TrendDown   5   8  False   True
13   287  TrendDown   6   8  False   True
14   286  TrendDown   7   8  False   True
15   280  TrendDown   8   8  False   True
16   279  TrendDown   9   8  False   True
17   270  TrendDown  10   8  False   True
18   269  TrendDown  11   8  False   True
19   301    NoTrend  11   9  False  False
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.