1

I would like to loop through df below to find the duration of the activity where:

  1. For the same X, there are two Y values.
  2. If a turns1 for either unique values of Y(eg., for X ==18, a becomes 1 for Y==13 or a becomes 1 for Y==14), it marks the start of the activity; if a turns 0 for both unique values of Y, it marks the end of the activity(eg., for X ==18, a becomes 0 for Y==13 and a becomes 0 for Y==14).
    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low     # Start 
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low     # End 
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 19  16  0.0 62.0    medium
15  2000-10-26 17:29:40.770 19  16  1.0 61.0    medium  # Start 
16  2000-10-26 17:29:46.770 19  16  1.0 60.0    medium
17  2000-10-26 17:32:01.180 19  17  1.0 25.0    low
18  2000-10-26 17:32:01.180 19  16  0.0 51.5    medium
19  2000-10-26 17:32:35.180 19  17  0.0 50.0    medium  # End 

reproducible example:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

Expected output:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  19

How can I do this in a for-loop so that I could further manipulate the data-frame such calculating the mean for each unique value for Y in an activity?


Edit:

Apologies for any confusion. The activity should be determined by changing in a values and should by continuous, ie., after one activity ends we reset and count the next. So for example the following data-frame should generate the same output:


    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 18  14  0.0 62.0    medium
15  2000-10-26 17:29:40.770 18  14  1.0 61.0    medium
16  2000-10-26 17:29:46.770 18  14  1.0 60.0    medium
17  2000-10-26 17:32:01.180 18  13  1.0 25.0    low
18  2000-10-26 17:32:01.180 18  14  0.0 51.5    medium
19  2000-10-26 17:32:35.180 18  13  0.0 50.0    medium

Expected output:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  18
2
  • shouldn't the expected output Start be ` 2000-10-26 10:39:24.310` like you commented? Commented Nov 16, 2020 at 4:07
  • Hi @Kenan yes thanks for pointing out the typo. Commented Nov 16, 2020 at 4:10

2 Answers 2

1

Here is a one possible solution, the issue is the end can be two values based on your constraints

# Create dict for the solution
expected_dict = {'start': [], 'end': [], 'X': [], 'b_means': []}

# Loop through each group
for name, gp in df.groupby('X'):
  gp['y_shift'] = gp['Y'].shift(-1).ffill().astype(int)
  astart = gp[gp['a'].eq(1.0)].iloc[0]['pd.Timestamp']
  # End logic that can be updated
  aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp']
  b_means = gp['b'].mean()

  # Collect all values
  expected_dict ['start'].append(astart)
  expected_dict ['end'].append(aend)
  expected_dict ['X'].append(name)
  expected_dict ['b_means'].append(name)

edf = pd.DataFrame(expected_dict)

                    start                     end   X    b_means
0 2000-10-26 10:39:24.310 2000-10-26 17:29:18.770  18  37.714286
1 2000-10-26 17:29:40.770 2000-10-26 17:32:01.180  19  51.583333
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks for the solution! Do you know of an efficient way where I could collect all b values for each activity, work out the mean and append to the output table?
May I ask what the logic behind aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp'] is ?
Added b_means, y_shift is the next value, so I could check your logic ` if a turns 0 for both unique values of Y, which is ne` not equal
Yes, by a turns 0 for both unique values of Y I would like to have both 13 and 14 with a==0. Correct me if I'm wrong, my understanding is that aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp'] implies that current a ==0 and current Y does not equal to next Y but the next a could still be 1 rather than 0 right?
correct and yes the next a could be 1 but not 0, since .iloc[-1] will select the last zero with that constraint, so that next 1 would have to be the last value in that group
|
1

Code:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

print(df, '\n')

Y_ended = dict()
look_for_start = True
prv_X = 0
skip_rows_till_same_X = False

for ind,row in df.iterrows():
    if skip_rows_till_same_X and prv_X == row['X']:
        continue
    else:
        skip_rows_till_same_X = False

    Y_ended[row['Y']] = False if row['a'] else True        
    
    if look_for_start:
        if row['a']: # Start
            start_timestamp = row['Timestamp']
            look_for_start = False
            Y_ended = Y_ended.fromkeys(Y_ended.keys(), False) 
    else: 
        if all(Y_ended.values()):
            end_timestamp = row['Timestamp']
            look_for_start = True
            skip_rows_till_same_X = True
            print(start_timestamp, end_timestamp, row['X'])
            Y_ended = dict()
    prv_X = row['X']  

Output:

                 Timestamp   X   Y    a     b    Type
0  2000-10-26 10:08:27.060  18  14  0.0  24.5  medium
1  2000-10-26 10:39:24.310  18  13  1.0  24.0     low
2  2000-10-26 11:50:48.190  18  14  1.0  23.5  medium
3  2000-10-26 17:18:07.610  18  14  1.0  23.5  medium
4  2000-10-26 17:18:09.610  18  14  0.0  23.5  medium
5  2000-10-26 17:29:10.610  18  14  0.0  26.5  medium
6  2000-10-26 17:29:10.770  18  14  1.0  26.5  medium
7  2000-10-26 17:29:12.610  18  14  1.0  53.5  medium
8  2000-10-26 17:29:14.610  18  14  1.0  62.0  medium
9  2000-10-26 17:29:14.770  18  13  1.0  24.0     low
10 2000-10-26 17:29:16.610  18  14  1.0  64.5  medium
11 2000-10-26 17:29:18.770  18  14  0.0  64.5  medium
12 2000-10-26 17:29:18.770  18  13  0.0  24.0     low
13 2000-10-26 17:29:28.770  18  14  0.0  63.5  medium
14 2000-10-26 17:29:34.770  19  14  0.0  62.0  medium
15 2000-10-26 17:29:40.770  19  14  1.0  61.0  medium
16 2000-10-26 17:29:46.770  19  14  1.0  60.0  medium
17 2000-10-26 17:32:01.180  19  13  1.0  25.0     low
18 2000-10-26 17:32:01.180  19  14  0.0  51.5  medium
19 2000-10-26 17:32:35.180  19  13  0.0  50.0  medium

2000-10-26 10:39:24.310000 2000-10-26 17:29:18.770000 18
2000-10-26 17:29:40.770000 2000-10-26 17:32:35.180000 19
    

1 Comment

Thanks for your solution. Could you please suggest a way to calculate the mean for each unique value for Y in an activity?

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.