How to insert rows in dataframe based on specific condition?

Question

I have the following dataframe:

Index	Time	User	Description
1	27.10.2021 15:58:00	[email protected]	Tab Alpha of type PARTSTUDIO opened by User A
2	27.10.2021 15:59:00	[email protected]	Start edit of part studio feature
3	27.10.2021 15:59:00	[email protected]	Cancel Operation
4	27.10.2021 15:59:00	[email protected]	Tab Alpha of type PARTSTUDIO opened by User B
5	27.10.2021 15:59:00	[email protected]	Start edit of part studio feature
6	27.10.2021 16:03:00	[email protected]	Cancel Operation
7	27.10.2021 16:03:00	[email protected]	Add assembly feature
9	27.10.2021 16:03:00	[email protected]	Tab Beta of type PARTSTUDIO opened by User A
10	27.10.2021 16:15:00	[email protected]	Start edit of part studio feature
11	27.10.2021 16:15:00	[email protected]	Start edit of part studio feature
12	27.10.2021 16:15:00	[email protected]	Tab Alpha of type PARTSTUDIO closed by User B
14	27.10.2021 16:54:00	[email protected]	Add assembly feature
15	27.10.2021 16:55:00	[email protected]	Tab Beta of type PARTSTUDIO closed by User A
16	27.10.2021 16:55:00	[email protected]	Start edit of part studio feature
17	27.10.2021 16:55:00	[email protected]	Tab Delta of type PARTSTUDIO closed by User B

Expected output:

Index	Time	User	Description
1	27.10.2021 15:58:00	[email protected]	Tab Alpha of type PARTSTUDIO opened by User A
2	27.10.2021 15:59:00	[email protected]	Start edit of part studio feature
3	27.10.2021 15:59:00	[email protected]	Cancel Operation
4	27.10.2021 15:59:00	[email protected]	Tab Alpha of type PARTSTUDIO opened by User B
5	27.10.2021 15:59:00	[email protected]	Start edit of part studio feature
6	27.10.2021 16:03:00	[email protected]	Cancel Operation
7	27.10.2021 16:03:00	[email protected]	Add assembly feature
8	27.10.2021 16:03:00	[email protected]	Tab Alpha of type PARTSTUDIO closed by User A
9	27.10.2021 16:03:00	[email protected]	Tab Beta of type PARTSTUDIO opened by User A
10	27.10.2021 16:15:00	[email protected]	Start edit of part studio feature
11	27.10.2021 16:15:00	[email protected]	Start edit of part studio feature
12	27.10.2021 16:15:00	[email protected]	Tab Alpha of type PARTSTUDIO closed by User B
13	27.10.2021 16:15:00	[email protected]	Tab Delta of type PARTSTUDIO opened by User B
14	27.10.2021 16:54:00	[email protected]	Add assembly feature
15	27.10.2021 16:55:00	[email protected]	Tab Beta of type PARTSTUDIO closed by User A
16	27.10.2021 16:55:00	[email protected]	Start edit of part studio feature
17	27.10.2021 16:55:00	[email protected]	Tab Delta of type PARTSTUDIO closed by User B

How can I check, for each "Tab x opened by User y" value in the Description column, whether a matching "Tab x closed by User y" follows somewhere later in the dataframe? If it does, nothing needs to change. If instead another "Tab zz opened by User y" by the same user comes first, then "Tab x closed by User y" is missing and should be inserted as a row just before that "Tab zz opened by User y" value (for example, index 8). The same applies the other way round for a missing "opened" row (index 13). Is there a way to do this without df.iterrows? Thanks in advance.

Does the description always follow this pattern precisely: Tab [tab_name] of type [type] opened/closed by [user_name]?
– user2246849, Commented May 9, 2022 at 9:42
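
For reference, the pattern asked about in this comment can be written as a regular expression with named groups, which is also how the accepted answer parses the Description column. The sketch below uses Python's standard `re` module rather than pandas, purely to show what each group captures; `parse_description` is a hypothetical helper name and not part of the original post.

```python
import re

# Same regular expression as in the accepted answer, split over two lines.
pattern = re.compile(
    r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) '
    r'(?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'
)


def parse_description(description):
    """Return the named groups as a dict, or None for rows that are not tab events."""
    match = pattern.match(description)
    return match.groupdict() if match else None


tab_row = parse_description('Tab Alpha of type PARTSTUDIO opened by User A')
other_row = parse_description('Cancel Operation')
print(tab_row)
print(other_row)
```

With pandas, passing the same pattern to `Series.str.extract` gives one column per named group and NaN in every column for rows that do not match.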

user2246849 · Accepted Answer · 2022-05-20 13:42:38Z

Sorry, I forgot to answer this.

Here is one solution. It is neither concise nor particularly elegant, but it should be faster than using iterrows both to check later rows and to insert the missing ones.

Data:

                   Time             User                                    Description
0   27.10.2021 15:58:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
2   27.10.2021 15:59:00  [email protected]                               Cancel Operation
3   27.10.2021 15:59:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
5   27.10.2021 16:03:00  [email protected]                               Cancel Operation
6   27.10.2021 16:03:00  [email protected]                           Add assembly feature
7   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
8   27.10.2021 16:03:00  [email protected]  Tab Gamma of type PARTSTUDIO opened by User A
9   27.10.2021 16:14:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
10  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
11  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
12  27.10.2021 16:15:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
13  27.10.2021 16:54:00  [email protected]                           Add assembly feature
14  27.10.2021 16:55:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
15  27.10.2021 16:55:00  [email protected]              Start edit of part studio feature
16  27.10.2021 16:55:00  [email protected]  Tab Delta of type PARTSTUDIO closed by User B
17  27.10.2021 16:56:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
18  27.10.2021 16:57:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User B

I added a few more consecutive open/close events to the data for additional testing.

Code:

import pandas as pd

# Pattern to extract action info.
pattern = r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) (?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'

# Add utility columns.
df = pd.concat([df, df['Description'].str.extract(pattern)], axis=1)

# Build the missing rows, giving them fractional index values so they sort into place.
def get_new_rows(df):
    all_values = []
    for action in ['opened', 'closed']:
        action_mask = df['tab_action'].eq(action)
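        # first_tabs: rows followed by the same action; second_tabs: rows preceded by it.
        # Copy the slices so the assignments below do not touch df.
        first_tabs = df[df['tab_action'].eq(df['tab_action'].shift(-1)) & action_mask].copy()
        second_tabs = df[df['tab_action'].eq(df['tab_action'].shift(1)) & action_mask].copy()
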
        if len(first_tabs) == 0:
            continue

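        if action == 'opened':
            # Two opens in a row: close the first tab just before the second open.
            values_tab, index_tab, offset, new_action = first_tabs, second_tabs, -0.5, 'closed'
        elif action == 'closed':
            # Two closes in a row: open the second tab just after the first close.
            values_tab, index_tab, offset, new_action = second_tabs, first_tabs, 0.5, 'opened'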

        values_tab.index = index_tab.index + offset
        values_tab['Time'] = index_tab['Time'].to_numpy()
        values_tab['tab_action'] = new_action
        all_values.append(values_tab)
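
    # If the user's last tab event is an open, append the matching close.
    last_action = df.tail(1).copy()
    if last_action['tab_action'].iat[0] == 'opened':
        last_action.index += 0.5
        last_action['tab_action'] = 'closed'
        all_values.append(last_action)

    # pd.concat raises on an empty list, so return an empty frame when nothing is missing.
    return pd.concat(all_values) if all_values else df.iloc[0:0]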


# Add new rows at the correct positions.
new_rows = (df.dropna(subset=['tab_action'])
              .groupby(['user_id'], as_index=False)
              .apply(get_new_rows)
              .droplevel(0))
complete_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

# Fix the description
fix_m = complete_df['tab_name'].notna()
complete_df.loc[fix_m, 'Description'] = ('Tab ' + complete_df.loc[fix_m, 'tab_name'] +
                                         ' of type ' + complete_df.loc[fix_m, 'tab_type'] +
                                         ' ' + complete_df.loc[fix_m, 'tab_action'] + ' by ' +
                                         complete_df.loc[fix_m, 'user_id'])

# Drop utility columns.
complete_df = complete_df.drop(columns=['tab_name', 'tab_type', 'tab_action', 'user_id'])

Result:

                   Time             User                                    Description
0   27.10.2021 15:58:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
2   27.10.2021 15:59:00  [email protected]                               Cancel Operation
3   27.10.2021 15:59:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
5   27.10.2021 16:03:00  [email protected]                               Cancel Operation
6   27.10.2021 16:03:00  [email protected]                           Add assembly feature
7   27.10.2021 16:03:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User A
8   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
9   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
10  27.10.2021 16:03:00  [email protected]  Tab Gamma of type PARTSTUDIO opened by User A
11  27.10.2021 16:14:00  [email protected]  Tab Gamma of type PARTSTUDIO closed by User A
12  27.10.2021 16:14:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
13  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
14  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
15  27.10.2021 16:15:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
16  27.10.2021 16:15:00  [email protected]  Tab Delta of type PARTSTUDIO opened by User B
17  27.10.2021 16:54:00  [email protected]                           Add assembly feature
18  27.10.2021 16:55:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
19  27.10.2021 16:55:00  [email protected]              Start edit of part studio feature
20  27.10.2021 16:55:00  [email protected]  Tab Delta of type PARTSTUDIO closed by User B
21  27.10.2021 16:55:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
22  27.10.2021 16:56:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
23  27.10.2021 16:56:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User B
24  27.10.2021 16:57:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User B
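
As a cross-check for the vectorised code, here is a minimal plain-Python sketch of the rule the result above appears to follow. Per user, two opens in a row get a close for the earlier tab inserted just before the second open, two closes in a row get an open for the later tab inserted just after the first close, and a trailing open gets a close appended. The function name `fill_missing_tab_events` and the dict-based event format are assumptions made for illustration. It is deliberately loop-based, so it is meant for verifying small samples rather than replacing the pandas solution.

```python
def fill_missing_tab_events(events):
    """Insert missing open/close events.

    events: list of dicts with keys 'time', 'user', 'tab', 'action', where
    'action' is 'opened', 'closed', or None for rows that are not tab events.
    """
    # Original rows keep their position as sort key; inserted rows get +/- 0.5.
    keyed = [(float(i), event) for i, event in enumerate(events)]
    last_tab_event = {}  # user -> (position, event) of that user's latest tab event

    for i, event in enumerate(events):
        if event['action'] is None:
            continue
        previous = last_tab_event.get(event['user'])
        if previous is not None and previous[1]['action'] == event['action']:
            prev_pos, prev_event = previous
            if event['action'] == 'opened':
                # Two opens in a row: close the earlier tab just before this open.
                missing = dict(prev_event, action='closed', time=event['time'])
                keyed.append((i - 0.5, missing))
            else:
                # Two closes in a row: open this tab just after the earlier close.
                missing = dict(event, action='opened', time=prev_event['time'])
                keyed.append((prev_pos + 0.5, missing))
        last_tab_event[event['user']] = (i, event)

    # A user whose final tab event is an open still needs a closing event.
    for pos, event in last_tab_event.values():
        if event['action'] == 'opened':
            keyed.append((pos + 0.5, dict(event, action='closed')))

    keyed.sort(key=lambda pair: pair[0])  # stable sort: ties keep insertion order
    return [event for _, event in keyed]


# Made-up sample for a single user.
events = [
    {'time': '15:58', 'user': 'A', 'tab': 'Alpha', 'action': 'opened'},
    {'time': '15:59', 'user': 'A', 'tab': None, 'action': None},
    {'time': '16:03', 'user': 'A', 'tab': 'Beta', 'action': 'opened'},
    {'time': '16:55', 'user': 'A', 'tab': 'Beta', 'action': 'closed'},
    {'time': '16:56', 'user': 'A', 'tab': 'Delta', 'action': 'closed'},
]
summary = [(e['time'], e['tab'], e['action']) for e in fill_missing_tab_events(events)]
for row in summary:
    print(row)
```

Like the code in the answer, this sketch does not handle a user whose very first tab event is a close with no earlier tab event to pair it with.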

edited May 20, 2022 at 13:42

answered May 11, 2022 at 9:12

user2246849

7 Comments

MonaLisaAnn Over a year ago

Thank you for your solution and sorry for the late reply! It works great on the given example, however when I run the code on the .csv file I'm working on, it doesn't work as planned. Can you please take a look at the .csv file -> link

user2246849 Over a year ago

@MonaLisaAnn I see, I edited the answer. Unfortunately the answer is a bit old, so I need more time to think about it. In the meantime, try this version; maybe it already works. Let me know!

MonaLisaAnn Over a year ago

Thank you for your quick reply! Now there are 860 "opened by" values and 858 "closed by" values. The number of values is more accurate than with the first solution (which gave around 1400 values). However, the numbers don't match :/

user2246849 Over a year ago

@MonaLisaAnn I see, I will look a bit deeper into it as soon as possible. Question: could the missing ones be at the end? Is it possible that the last action in the df for a user is an "open"?

user2246849 Over a year ago

@MonaLisaAnn alright, I edited again by making sure there is always a "close" at the end if the last action was an "open". Have a look now if it is any better. However, this is starting to get ugly with all these changes. I will still have a look again at it haha
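
One way to narrow down a mismatch like 860 vs 858 is to count opened and closed events per user and tab and list only the pairs that differ. This is a minimal sketch that reuses the answer's pattern; the helper names `tab_action_counts` and `unbalanced` are hypothetical, and the sample descriptions are made up.

```python
import pandas as pd

# Same regular expression as in the answer.
pattern = (r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) '
           r'(?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$')


def tab_action_counts(descriptions):
    """Count 'opened' and 'closed' events per (user_id, tab_name)."""
    parsed = descriptions.str.extract(pattern).dropna(subset=['tab_action'])
    return (
        parsed.groupby(['user_id', 'tab_name', 'tab_action'])
        .size()
        .unstack('tab_action', fill_value=0)
        # Make sure both columns exist even if one action never occurs.
        .reindex(columns=['opened', 'closed'], fill_value=0)
    )


def unbalanced(descriptions):
    """Return only the (user_id, tab_name) pairs whose counts differ."""
    counts = tab_action_counts(descriptions)
    return counts[counts['opened'] != counts['closed']]


sample = pd.Series([
    'Tab Alpha of type PARTSTUDIO opened by User A',
    'Start edit of part studio feature',
    'Tab Alpha of type PARTSTUDIO closed by User A',
    'Tab Beta of type PARTSTUDIO opened by User A',
])
mismatches = unbalanced(sample)
print(mismatches)
```

Calling `unbalanced(complete_df['Description'])` on the real data would list the candidates to inspect. Equal counts alone do not prove that the events are in the right order.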
