
I need to add a new row when two consecutive cells in the door column are the same and the difference between the two corresponding cells in the Time column is more than 5 minutes. My DataFrame df:

Time        door             name
09:10:00    RDC_OUT-1        alex
09:10:00    RDC_OUT-1        alex
11:23:00    RDC_IN-1         alex
12:13:00    RDC_IN-1         alex
12:39:00    RDC_OUT-1        alex
15:23:00    RDC_IN-1         alex

Code:

import pandas as pd
import numpy as np
file_name='test.xlsx'
from datetime import timedelta
import datetime

df = pd.read_excel(file_name, header=0, index= False)
df= df.sort_values(by='Time')
df.reset_index(inplace = True) 

print(df)

idx=[]
for i in range (0,len(df)):
    if i == 0:
        print ('Door Name '+str(i)+' ok')
    elif df['Door Name'][i] != df['Door Name'][i-1]:
        print('index '+str(i)+' ok')

    elif ((df['Door Name'][i] == df['Door Name'][i-1]) & ((df['Time'].iloc[i] - df['Time'].iloc[i-1]) > datetime.timedelta(minutes=5))):
        print('index '+str(i)+' ok')
        df.iloc[i] = [i,'RDC_OUT-1', str('12:00:00'), 'ARYA'] 



    elif ((df['Door Name'][i] == df['Door Name'][i-1]) & ((df['Time'].iloc[i] - df['Time'].iloc[i-1]) < datetime.timedelta(minutes=5))):
        print('index '+str(i)+' nok')
        idx.append(i)
        print('idx\n',idx)

df.drop(df.index[[idx]],inplace=True)
print('\n',df) 

Desired output:

Time        door             name
09:10:00    RDC_OUT-1        alex
11:23:00    RDC_IN-1         alex
12:00:00    RDC_OUT-1        ARYA
12:13:00    RDC_IN-1         alex
12:39:00    RDC_OUT-1        alex
15:23:00    RDC_IN-1         alex

Actual output:

0      4  09:10:00    RDC_OUT-1   alex
2      3  11:23:00    RDC_IN-1    alex
3      2  12:13:00    RDC_IN-1    alex
4      3  12:00:00    RDC_OUT-1   ARYA
5      0  15:23:00    RDC_IN-1    alex
  • Do you only want to add an "out" after a double "in", or do you also want to add an "in" after a double "out"? Treating both cases the same usually does not make much sense in practice ;) Commented Nov 20, 2019 at 10:30
  • I need to handle both cases, but the new row has to be inserted between the two 'in' (or the two 'out') rows. Commented Nov 20, 2019 at 10:31
  • Does every inserted row always have Time 12:00:00 and name ARYA? Commented Nov 20, 2019 at 11:22
  • The time can be 12:00:00 or 14:00:00 (in case I should insert out), and the name is a variable. Commented Nov 20, 2019 at 11:27

1 Answer

First, I highly recommend always providing a working example that can be copied and pasted:

import pandas as pd
import numpy as np
import datetime as dt

df= pd.DataFrame({'Time':['17:01:10', '13:23:00', '11:23:00', '10:01:10','09:01:10','09:01:10'],
 'door':['RDC_OUT-1', 'RDC_IN-1','RDC_IN-1','RDC_OUT-1','RDC_IN-1','RDC_IN-1'],
 'name':['alex','alex','alex','alex','alex','alex']})

Then convert the time stamps and the door labels so you can do math on them:

# replace door labels with binary values
df['door']= df['door'].map({'RDC_IN-1': 0, 'RDC_OUT-1': 1})
# convert time stamp
df['Time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
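
As a small illustration of why the conversion matters, the sketch below uses a few made-up time strings (they are not taken from the data above) to show that pandas fills in a default date of 1900-01-01 when only a time is parsed, and that differences between consecutive values can then be compared with a `pd.Timedelta`:

```python
import pandas as pd

# Hypothetical sample values, chosen only to illustrate the arithmetic.
times = pd.to_datetime(
    pd.Series(['09:10:00', '09:12:00', '11:23:00']), format="%H:%M:%S"
)

# No date was given, so pandas uses 1900-01-01 as the date part.
first = times.iloc[0]
print(first)

# Differences between consecutive rows are Timedelta values (NaT for the first row).
gaps = times.diff()

# Comparing with a Timedelta gives a boolean mask; the NaT row compares as False.
long_gap = gaps > pd.Timedelta(minutes=5)
print(long_gap.tolist())
```

Only the last value is more than 5 minutes after its predecessor, so only the last entry of the mask is True.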

Now you can unleash the power of pandas DataFrames ;)

# sort by time stamp
df= df.sort_values(by='Time')

# calculate difference to next row per column
df_diff = df[['Time', 'door']].diff(periods=-1)

# select and copy relevant rows 
df_add = df[(df_diff.Time < dt.timedelta(minutes=-5))& (df_diff.door ==0)].copy()

# change the time stamp of copied rows
df_add.loc[df_add.door == 0, 'Time'] =  pd.to_datetime('12:00:00', format="%H:%M:%S")
df_add.loc[df_add.door == 1, 'Time'] =  pd.to_datetime('14:00:00', format="%H:%M:%S")


# switch the label of copied rows
df_add['door'] = 1 - df_add['door']

# change name to mark the new
df_add['name']= 'new_alex'

# append existing data frame with new rows and sort by time stamp
df = pd.concat([df, df_add]).sort_values(by='Time')

# remap the door feature
df['door']= df['door'].map({0:'RDC_IN-1', 1:'RDC_OUT-1'})

This should give you the output:

                 Time       door      name
4 1900-01-01 09:01:10   RDC_IN-1      alex
5 1900-01-01 09:01:10   RDC_IN-1      alex
3 1900-01-01 10:01:10  RDC_OUT-1      alex
2 1900-01-01 11:23:00   RDC_IN-1      alex
2 1900-01-01 12:00:00  RDC_OUT-1  new_alex
1 1900-01-01 13:23:00   RDC_IN-1      alex
0 1900-01-01 17:01:10  RDC_OUT-1      alex
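
The steps above keep both 09:01:10 rows and leave the 1900-01-01 date in the Time column, whereas the question's desired output drops the repeated row and shows plain times. Here is a minimal sketch of one way to combine everything, assuming the column names Time, door and name from the question. The helper name `fix_door_log`, the `FILL_TIMES` mapping (12:00:00 for an inserted "out", 14:00:00 for an inserted "in", as in the code above) and the default `new_name` are hypothetical choices, not part of the original answer:

```python
import pandas as pd

# Assumed labels and fill times; adjust them to your own data.
OPPOSITE = {'RDC_IN-1': 'RDC_OUT-1', 'RDC_OUT-1': 'RDC_IN-1'}
FILL_TIMES = {'RDC_OUT-1': '12:00:00', 'RDC_IN-1': '14:00:00'}  # keyed by inserted door


def fix_door_log(df, gap=pd.Timedelta(minutes=5), new_name='new_alex'):
    """Drop repeated reads within `gap`, then insert the missing opposite event."""
    out = df.copy()
    out['Time'] = pd.to_datetime(out['Time'], format="%H:%M:%S")
    # A stable sort keeps rows with equal times in their original order.
    out = out.sort_values('Time', kind='stable').reset_index(drop=True)

    # Same door as the previous row and at most `gap` later: drop the repeat.
    same_as_prev = out['door'].eq(out['door'].shift())
    out = out[~(same_as_prev & (out['Time'].diff() <= gap))].reset_index(drop=True)

    # Same door as the next row (now always more than `gap` apart):
    # copy the row and turn it into the missing opposite event.
    same_as_next = out['door'].eq(out['door'].shift(-1))
    missing = out[same_as_next].copy()
    missing['door'] = missing['door'].map(OPPOSITE)
    missing['Time'] = pd.to_datetime(missing['door'].map(FILL_TIMES), format="%H:%M:%S")
    missing['name'] = new_name

    out = pd.concat([out, missing]).sort_values('Time', kind='stable')
    out = out.reset_index(drop=True)
    # Convert back to plain HH:MM:SS strings for display.
    out['Time'] = out['Time'].dt.strftime('%H:%M:%S')
    return out


# The sample table from the question.
sample = pd.DataFrame({
    'Time': ['09:10:00', '09:10:00', '11:23:00', '12:13:00', '12:39:00', '15:23:00'],
    'door': ['RDC_OUT-1', 'RDC_OUT-1', 'RDC_IN-1', 'RDC_IN-1', 'RDC_OUT-1', 'RDC_IN-1'],
    'name': ['alex'] * 6,
})
result = fix_door_log(sample, new_name='ARYA')
print(result)
```

With the question's sample table, this should drop the second 09:10:00 row and insert a 12:00:00 RDC_OUT-1 row between the two consecutive RDC_IN-1 rows. Note that the fixed fill times only land between the two rows for data like this sample; for other data you may prefer a time derived from the neighbouring rows.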

