Consider this dataframe:
id  name   date_time            strings
1   'AAA'  2018-08-03 18:00:00  1125,128,1517,656,657
1   'AAA'  2018-08-03 18:45:00  128,131,646,535,157,159
1   'AAA'  2018-08-03 18:49:00  131
1   'BBB'  2018-08-03 19:41:00  0
1   'BBB'  2018-08-05 19:30:00  0
1   'AAA'  2018-08-04 11:00:00  131
1   'AAA'  2018-08-04 11:30:00  1000
1   'AAA'  2018-08-04 11:33:00  1000,5555
First, within each group of rows that share the same id and name, I want to check whether each row has at least one string in common with the previous row; if so, match is True. (Some rows in the strings column had no value, so they have been filled with 0.) The desired output:
id  name   date_time            strings                  match
1   'AAA'  2018-08-03 18:00:00  1125,128,1517,656,657    False
1   'AAA'  2018-08-03 18:45:00  128,131,646,535,157,159  True
1   'AAA'  2018-08-03 18:49:00  131                      True
1   'BBB'  2018-08-03 19:41:00  0                        False
1   'BBB'  2018-08-05 19:30:00  0                        False
1   'AAA'  2018-08-04 11:00:00  131                      True
1   'AAA'  2018-08-04 11:30:00  1000                     False
1   'AAA'  2018-08-04 11:33:00  1000,5555                True
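One way the first step could be sketched is shown below. It assumes pandas, that the rows are already in chronological order within each group, and that 0 is only ever a filler for "no value" (so it is dropped before comparing). The helper name to_set is made up for this illustration, and the sample data is copied from the tables above, with 128 in the first row as in the desired output.

```python
import pandas as pd

# Sample data copied from the tables above (first row includes 128, as in the desired output).
df = pd.DataFrame({
    "id": [1] * 8,
    "name": ["AAA", "AAA", "AAA", "BBB", "BBB", "AAA", "AAA", "AAA"],
    "date_time": pd.to_datetime([
        "2018-08-03 18:00:00", "2018-08-03 18:45:00", "2018-08-03 18:49:00",
        "2018-08-03 19:41:00", "2018-08-05 19:30:00", "2018-08-04 11:00:00",
        "2018-08-04 11:30:00", "2018-08-04 11:33:00",
    ]),
    "strings": [
        "1125,128,1517,656,657", "128,131,646,535,157,159", "131",
        "0", "0", "131", "1000", "1000,5555",
    ],
})


def to_set(value):
    """Hypothetical helper: split a comma-separated cell into a set.

    Missing values and the '0' filler both become the empty set (assumption).
    """
    if pd.isna(value):
        return set()
    return set(str(value).split(",")) - {"0"}


# Previous row's strings within each (id, name) group; NaN for the first row of a group.
prev = df.groupby(["id", "name"])["strings"].shift()

# prev keeps df's index, so the result lines up with the original row order.
df["match"] = [bool(to_set(cur) & to_set(old)) for cur, old in zip(df["strings"], prev)]
```

Because the grouped shift returns a Series aligned to the original index, no loop over the groups is needed and the values cannot end up assigned to the wrong rows.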
Then, again grouping rows by id and name, I want to find the time difference between consecutive rows whose match value is True; if the time difference is less than 00:05:00, the flag is 1. The final output:
id  name   date_time            strings                  diff      flag
1   'AAA'  2018-08-03 18:00:00  1125,128,1517,656,657    00:00:00  0
1   'AAA'  2018-08-03 18:45:00  128,131,646,535,157,159  00:00:00  0
1   'AAA'  2018-08-03 18:49:00  131                      00:04:00  1
1   'BBB'  2018-08-03 19:41:00  0                        00:00:00  0
1   'BBB'  2018-08-05 19:30:00  0                        00:00:00  0
1   'AAA'  2018-08-04 11:00:00  131                      16:15:00  0
1   'AAA'  2018-08-04 11:30:00  1000                     00:00:00  0
1   'AAA'  2018-08-04 11:33:00  1000,5555                00:33:00  0
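A possible sketch of the second step follows, under one reading of the description: only rows with match True are kept, the difference is taken between consecutive such rows within each (id, name) group, and every other row gets a difference of zero. It assumes pandas and rows already sorted by date_time within each group; the match values are typed in from the desired output above rather than computed.

```python
import pandas as pd

# Sample data from the tables above; "match" is copied from the desired output.
df = pd.DataFrame({
    "id": [1] * 8,
    "name": ["AAA", "AAA", "AAA", "BBB", "BBB", "AAA", "AAA", "AAA"],
    "date_time": pd.to_datetime([
        "2018-08-03 18:00:00", "2018-08-03 18:45:00", "2018-08-03 18:49:00",
        "2018-08-03 19:41:00", "2018-08-05 19:30:00", "2018-08-04 11:00:00",
        "2018-08-04 11:30:00", "2018-08-04 11:33:00",
    ]),
    "match": [False, True, True, False, False, True, False, True],
})

# Keep only matched rows, then diff consecutive timestamps per (id, name) group.
matched = df[df["match"]]
diff = matched.groupby(["id", "name"])["date_time"].diff()

# Assignment aligns on the index: unmatched rows (and the first matched row
# of each group) are left as NaT.
df["diff"] = diff

# NaT compares as False, so only real gaps shorter than 5 minutes are flagged.
df["flag"] = (df["diff"] < pd.Timedelta(minutes=5)).astype(int)

# Show the missing differences as zero, as in the desired output.
df["diff"] = df["diff"].fillna(pd.Timedelta(0))
```

The flag is computed before the NaT values are replaced by zero; otherwise every zero difference would also count as "less than 00:05:00". Note that under this reading the 2018-08-04 11:00:00 row comes out as 16:11:00 (measured from the previous matched row at 18:49:00), not the 16:15:00 shown in the table.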
For the first part I've tried this code, but it doesn't work correctly:
grouped = df.groupby(['id', 'name'])
z = []
for index, row in grouped:
    z.append(list(zip(row['strings'], row['strings'].shift())))
df['match'] = [bool(set(str(s1).split(',')) & set(str(s2).split(','))) for i in range(len(z)) for s1, s2 in z[i]]
For the second part I've tried several different solutions, but none of them works.
Any hints are appreciated.