Loop through different values in multiple columns in Python

Question

I would like to loop through df below to find the duration of the activity where:

For the same X, there are two Y values.
If a turns1 for either unique values of Y(eg., for X ==18, a becomes 1 for Y==13 or a becomes 1 for Y==14), it marks the start of the activity; if a turns 0 for both unique values of Y, it marks the end of the activity(eg., for X ==18, a becomes 0 for Y==13 and a becomes 0 for Y==14).

    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low     # Start 
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low     # End 
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 19  16  0.0 62.0    medium
15  2000-10-26 17:29:40.770 19  16  1.0 61.0    medium  # Start 
16  2000-10-26 17:29:46.770 19  16  1.0 60.0    medium
17  2000-10-26 17:32:01.180 19  17  1.0 25.0    low
18  2000-10-26 17:32:01.180 19  16  0.0 51.5    medium
19  2000-10-26 17:32:35.180 19  17  0.0 50.0    medium  # End

reproducible example:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

Expected output:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  19

How can I do this in a for-loop so that I could further manipulate the data-frame such calculating the mean for each unique value for Y in an activity?

Edit:

Apologies for any confusion. The activity should be determined by changing in a values and should by continuous, ie., after one activity ends we reset and count the next. So for example the following data-frame should generate the same output:


    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 18  14  0.0 62.0    medium
15  2000-10-26 17:29:40.770 18  14  1.0 61.0    medium
16  2000-10-26 17:29:46.770 18  14  1.0 60.0    medium
17  2000-10-26 17:32:01.180 18  13  1.0 25.0    low
18  2000-10-26 17:32:01.180 18  14  0.0 51.5    medium
19  2000-10-26 17:32:35.180 18  13  0.0 50.0    medium

Expected output:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  18

shouldn't the expected output Start be ` 2000-10-26 10:39:24.310` like you commented? — Kenan
– Kenan, Commented Nov 16, 2020 at 4:07

Kenan · Accepted Answer · 2020-11-16 05:17:03Z

1

Here is a one possible solution, the issue is the end can be two values based on your constraints

# Create dict for the solution
expected_dict = {'start': [], 'end': [], 'X': [], 'b_means': []}

# Loop through each group
for name, gp in df.groupby('X'):
  gp['y_shift'] = gp['Y'].shift(-1).ffill().astype(int)
  astart = gp[gp['a'].eq(1.0)].iloc[0]['pd.Timestamp']
  # End logic that can be updated
  aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp']
  b_means = gp['b'].mean()

  # Collect all values
  expected_dict ['start'].append(astart)
  expected_dict ['end'].append(aend)
  expected_dict ['X'].append(name)
  expected_dict ['b_means'].append(name)

edf = pd.DataFrame(expected_dict)

                    start                     end   X    b_means
0 2000-10-26 10:39:24.310 2000-10-26 17:29:18.770  18  37.714286
1 2000-10-26 17:29:40.770 2000-10-26 17:32:01.180  19  51.583333

edited Nov 16, 2020 at 5:17

answered Nov 16, 2020 at 4:24

Kenan

14.2k9 gold badges47 silver badges56 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

nilsinelabore Over a year ago

Thanks for the solution! Do you know of an efficient way where I could collect all b values for each activity, work out the mean and append to the output table?

nilsinelabore Over a year ago

May I ask what the logic behind aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp'] is ?

Kenan Over a year ago

Added b_means, y_shift is the next value, so I could check your logic ` if a turns 0 for both unique values of Y, which is ne` not equal

nilsinelabore Over a year ago

Yes, by a turns 0 for both unique values of Y I would like to have both 13 and 14 with a==0. Correct me if I'm wrong, my understanding is that aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp'] implies that current a ==0 and current Y does not equal to next Y but the next a could still be 1 rather than 0 right?

Kenan Over a year ago

correct and yes the next a could be 1 but not 0, since .iloc[-1] will select the last zero with that constraint, so that next 1 would have to be the last value in that group

|

Aaj Kaal · Accepted Answer · 2020-11-16 05:57:53Z

Code:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

print(df, '\n')

Y_ended = dict()
look_for_start = True
prv_X = 0
skip_rows_till_same_X = False

for ind,row in df.iterrows():
    if skip_rows_till_same_X and prv_X == row['X']:
        continue
    else:
        skip_rows_till_same_X = False

    Y_ended[row['Y']] = False if row['a'] else True        
    
    if look_for_start:
        if row['a']: # Start
            start_timestamp = row['Timestamp']
            look_for_start = False
            Y_ended = Y_ended.fromkeys(Y_ended.keys(), False) 
    else: 
        if all(Y_ended.values()):
            end_timestamp = row['Timestamp']
            look_for_start = True
            skip_rows_till_same_X = True
            print(start_timestamp, end_timestamp, row['X'])
            Y_ended = dict()
    prv_X = row['X']

Output:

                 Timestamp   X   Y    a     b    Type
0  2000-10-26 10:08:27.060  18  14  0.0  24.5  medium
1  2000-10-26 10:39:24.310  18  13  1.0  24.0     low
2  2000-10-26 11:50:48.190  18  14  1.0  23.5  medium
3  2000-10-26 17:18:07.610  18  14  1.0  23.5  medium
4  2000-10-26 17:18:09.610  18  14  0.0  23.5  medium
5  2000-10-26 17:29:10.610  18  14  0.0  26.5  medium
6  2000-10-26 17:29:10.770  18  14  1.0  26.5  medium
7  2000-10-26 17:29:12.610  18  14  1.0  53.5  medium
8  2000-10-26 17:29:14.610  18  14  1.0  62.0  medium
9  2000-10-26 17:29:14.770  18  13  1.0  24.0     low
10 2000-10-26 17:29:16.610  18  14  1.0  64.5  medium
11 2000-10-26 17:29:18.770  18  14  0.0  64.5  medium
12 2000-10-26 17:29:18.770  18  13  0.0  24.0     low
13 2000-10-26 17:29:28.770  18  14  0.0  63.5  medium
14 2000-10-26 17:29:34.770  19  14  0.0  62.0  medium
15 2000-10-26 17:29:40.770  19  14  1.0  61.0  medium
16 2000-10-26 17:29:46.770  19  14  1.0  60.0  medium
17 2000-10-26 17:32:01.180  19  13  1.0  25.0     low
18 2000-10-26 17:32:01.180  19  14  0.0  51.5  medium
19 2000-10-26 17:32:35.180  19  13  0.0  50.0  medium

2000-10-26 10:39:24.310000 2000-10-26 17:29:18.770000 18
2000-10-26 17:29:40.770000 2000-10-26 17:32:35.180000 19

Thanks for your solution. Could you please suggest a way to calculate the mean for each unique value for Y in an activity?

Collectives™ on Stack Overflow

Loop through different values in multiple columns in Python

2 Answers 2

7 Comments

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

7 Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related