Here's a simplified version of my dataframe:
import pandas as pd

d = {'col1': ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3', 'd1', 'd2', 'd3'],
     'col2': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1],
     'col3': [-1, -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1]}
df = pd.DataFrame(d)
df
   col1  col2  col3
0    a1     1    -1
1    a2     1    -1
2    a3     1     1
3    b1    -1    -1
4    b2    -1    -1
5    b3    -1     1
6    c1    -1     1
7    c2     1     1
8    c3     1     1
9    d1    -1    -1
10   d2     1    -1
11   d3     1     1
I would like to pull out only those rows where col3 == 1 for the first time, n rows after col2 == 1 for the first time, within each letter group.
For example, if we're looking for where col3 became 1 one row after col2 became 1 (for each letter group), we get
  col1  col2  col3
0   d3     1     1
because for group d, col2 turned from -1 to 1 at d2 and col3 turned from -1 to 1 at d3. That hasn't happened in any other group.
If we want rows where col3 became 1 two rows after col2 became 1 (for each letter group), we get
  col1  col2  col3
0   a3     1     1
because for group a, col2 started at 1 at a1 and col3 turned from -1 to 1 at a3.
Edit:
Here's my awkward way of doing it ... anyone got more elegant solutions?
n = 1  # example value: number of rows between col2 turning 1 and col3 turning 1
group = df['col1'].str[0]  # letter group = first character of col1
before = df['col2'].shift(n + 1)
# col2 was not yet 1 just before the window, or the window starts at a group boundary
col2_just_turned = before.isnull() | (before == -1) | (df['col1'].shift(n + 1).str[0] != group)
if n > 0:
    df['newCol'] = (
        col2_just_turned &
        (df['col2'].shift(n) == 1) & (df['col3'].shift(n) == -1) &  # n rows back: col2 is 1, col3 still -1
        (df['col2'].shift(1) == 1) & (df['col3'].shift(1) == -1) &  # previous row: col2 is 1, col3 still -1
        (df['col2'] == 1) & (df['col3'] == 1) &                     # current row: both are 1
        (df['col1'].shift(n).str[0] == group)                       # window stays inside one group
    )
else:
    df['newCol'] = col2_just_turned & (df['col2'] == 1) & (df['col3'] == 1)
shift(). You can then filter with standard pandas boolean indexing on the resulting arrays, which will be faster if your dataframe is very large.
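One way to do this without chaining many shift() calls is to compare, per group, the position of the first 1 in col2 with the position of the first 1 in col3. This is only a minimal sketch under two assumptions: the letter group is the first character of col1, and "for the first time" means the first occurrence of 1 within that group. It does not reproduce the shift-based mask from the question exactly, since that mask also requires col2 to still be 1 on the row just before the match. The function name `rows_where_col3_follows_col2` is made up for illustration, and the data is copied from the printed table in the question.

```python
import pandas as pd

# Data as shown in the printed table of the question.
df = pd.DataFrame({
    'col1': ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3', 'd1', 'd2', 'd3'],
    'col2': [1, 1, 1, -1, -1, -1, -1, 1, 1, -1, 1, 1],
    'col3': [-1, -1, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1],
})


def rows_where_col3_follows_col2(df, n):
    """Rows where col3 is first 1 exactly n rows after col2 is first 1, per group."""
    group = df['col1'].str[0]            # assumed group key: first letter of col1
    pos = df.groupby(group).cumcount()   # 0-based row position inside each group

    # Position of the first 1 per group, broadcast back to every row.
    # Groups that never reach 1 get NaN, so they can never match.
    first_col2 = pos.where(df['col2'] == 1).groupby(group).transform('min')
    first_col3 = pos.where(df['col3'] == 1).groupby(group).transform('min')

    # Keep the row where col3 first becomes 1, if it sits n rows after col2's first 1.
    mask = (pos == first_col3) & (first_col3 - first_col2 == n)
    return df[mask]


print(rows_where_col3_follows_col2(df, 1))  # the d3 row
print(rows_where_col3_follows_col2(df, 2))  # the a3 row
```

The returned rows keep their original index labels (11 for d3, 2 for a3); call reset_index(drop=True) if you want them renumbered from 0 as in the expected output above.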