Parsing Data From Pandas Dataframe

Question

I have a dataframe with random repeating sequences. I need to set a flag of sorts so that every time I come across the term maintenance, I store the rows between each instance. " store everything between the two maintenance instances." The input is:

NAME     PROCESS     STATUS
name 0   process 0  
name 1   maintenance start
name 2   process 2
name 3   process 3
name 4   process 4
name 5   process 5
name 6   maintenance stop
name 7   process 7
name 8   process 8
name 9   process 9 
name 10  maintenance start
name 11  process 11
name 12  process 12
name 13  process 13
name 14  process 14
name 15  maintenance stop

I was thinking about doing a set of logical conditions:

for i in np.arange(len(df)-1):
  if df['process'][i] = 'maintenance':
     df['new'] = df[i]

but, I am hoping there is a way that pandas can handle this (that I cant find) as I cant seem to figure out the stopping condition.

desired output:

name 2   process 2
name 3   process 3
name 4   process 4
name 5   process 5
name 11  process 11
name 12  process 12
name 13  process 13
name 14  process 14

CodeMonkey · Accepted Answer · 2022-08-10 14:16:51Z

You can iterate over the rows in data frame and start appending rows to new frame when process="maintenance" and status="start" and stop adding them when status="stop".

import pandas as pd
from io import StringIO

data = """name,process,status
name 0,process 0,
name 1,maintenance,start
name 2,process 2,
name 3,process 3,
name 4,process 4
name 5,process 5
name 6,maintenance,stop
name 7,process 7
name 8,process 8
name 9,process 9 
name 10,maintenance,start
name 11,process 11
name 12,process 12
name 13,process 13
name 14,process 14
name 15,maintenance,stop
"""

df = pd.read_csv(StringIO(data)).fillna('')

row_list = []
collect = False
for row in df.itertuples(index=False):
    if row.process == "maintenance":
        collect = row.status == "start"
    elif collect:
        row_list.append(row)

df2 = pd.DataFrame.from_records(row_list, columns=df.columns)
print(df2)

Output:

      name     process status
0   name 2   process 2
1   name 3   process 3
2   name 4   process 4
3   name 5   process 5
4  name 11  process 11
5  name 12  process 12
6  name 13  process 13
7  name 14  process 14

If want to drop the "status" column then call this:

df2 = df2.drop('status', axis=1)

Karthik K · Accepted Answer · 2022-08-10 18:40:55Z

import pandas as pd
df=pd.DataFrame([
        ['name 0','process 0',],
        ['name 1','maintenance','start'],
        ['name 2','process 2',],
        ['name 3','process 3',],
        ['name 4','process 4',],
        ['name 5','process 5',],
        ['name 6','maintenance','stop'],
        ['name 7','process 7',],
        ['name 8','process 8',],
        ['name 9','process 9',],
        ['name 10','maintenance','start'],
        ['name 11','process 11',],
        ['name 12','process 12',],
        ['name 13','process 13',],
        ['name 14','process 14',],
        ['name 15','maintenance','stop']
                   ],columns=['name','process','status'])

l_data=[] #list to store the select rows

updating a DataFrame row by row is resource intensive. Good to store the select rows in a list and then create a DataFrame with the list

start_flag=False
for _,i in df.iterrows():  #iterrows returns (index,Series) pairs for each row.
    if i['process']=='maintenance':   #i is series with columns names as index   
        start_flag= i['status']=='start'
    elif start_flag:
        l_data.append(i.values)  #storing select rows into a list. it's a list of Series 
df=pd.DataFrame(l_data,columns=['name','process','status']) 
df.drop('status',axis=1,inplace=True)
print(df)

Output

      name     process
0   name 2   process 2
1   name 3   process 3
2   name 4   process 4
3   name 5   process 5
4  name 11  process 11
5  name 12  process 12
6  name 13  process 13
7  name 14  process 14

Collectives™ on Stack Overflow

Parsing Data From Pandas Dataframe

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related