0

I have a dataframe with random repeating sequences. I need to set a flag of sorts so that every time I come across the term maintenance, I store the rows between each instance. " store everything between the two maintenance instances." The input is:

NAME     PROCESS     STATUS
name 0   process 0  
name 1   maintenance start
name 2   process 2
name 3   process 3
name 4   process 4
name 5   process 5
name 6   maintenance stop
name 7   process 7
name 8   process 8
name 9   process 9 
name 10  maintenance start
name 11  process 11
name 12  process 12
name 13  process 13
name 14  process 14
name 15  maintenance stop

I was thinking about doing a set of logical conditions:

for i in np.arange(len(df)-1):
  if df['process'][i] = 'maintenance':
     df['new'] = df[i]

but, I am hoping there is a way that pandas can handle this (that I cant find) as I cant seem to figure out the stopping condition.

desired output:

name 2   process 2
name 3   process 3
name 4   process 4
name 5   process 5
name 11  process 11
name 12  process 12
name 13  process 13
name 14  process 14

2 Answers 2

1

You can iterate over the rows in data frame and start appending rows to new frame when process="maintenance" and status="start" and stop adding them when status="stop".

import pandas as pd
from io import StringIO

data = """name,process,status
name 0,process 0,
name 1,maintenance,start
name 2,process 2,
name 3,process 3,
name 4,process 4
name 5,process 5
name 6,maintenance,stop
name 7,process 7
name 8,process 8
name 9,process 9 
name 10,maintenance,start
name 11,process 11
name 12,process 12
name 13,process 13
name 14,process 14
name 15,maintenance,stop
"""

df = pd.read_csv(StringIO(data)).fillna('')

row_list = []
collect = False
for row in df.itertuples(index=False):
    if row.process == "maintenance":
        collect = row.status == "start"
    elif collect:
        row_list.append(row)

df2 = pd.DataFrame.from_records(row_list, columns=df.columns)
print(df2)

Output:

      name     process status
0   name 2   process 2
1   name 3   process 3
2   name 4   process 4
3   name 5   process 5
4  name 11  process 11
5  name 12  process 12
6  name 13  process 13
7  name 14  process 14

If want to drop the "status" column then call this:

df2 = df2.drop('status', axis=1)
Sign up to request clarification or add additional context in comments.

Comments

0
import pandas as pd
df=pd.DataFrame([
        ['name 0','process 0',],
        ['name 1','maintenance','start'],
        ['name 2','process 2',],
        ['name 3','process 3',],
        ['name 4','process 4',],
        ['name 5','process 5',],
        ['name 6','maintenance','stop'],
        ['name 7','process 7',],
        ['name 8','process 8',],
        ['name 9','process 9',],
        ['name 10','maintenance','start'],
        ['name 11','process 11',],
        ['name 12','process 12',],
        ['name 13','process 13',],
        ['name 14','process 14',],
        ['name 15','maintenance','stop']
                   ],columns=['name','process','status'])

l_data=[] #list to store the select rows 

updating a DataFrame row by row is resource intensive. Good to store the select rows in a list and then create a DataFrame with the list

start_flag=False
for _,i in df.iterrows():  #iterrows returns (index,Series) pairs for each row.
    if i['process']=='maintenance':   #i is series with columns names as index   
        start_flag= i['status']=='start'
    elif start_flag:
        l_data.append(i.values)  #storing select rows into a list. it's a list of Series 
df=pd.DataFrame(l_data,columns=['name','process','status']) 
df.drop('status',axis=1,inplace=True)
print(df)

Output

      name     process
0   name 2   process 2
1   name 3   process 3
2   name 4   process 4
3   name 5   process 5
4  name 11  process 11
5  name 12  process 12
6  name 13  process 13
7  name 14  process 14

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.