I have the following dataframe:
Index Time User Description
1 27.10.2021 15:58:00 [email protected] Tab Alpha of type PARTSTUDIO opened by User A
2 27.10.2021 15:59:00 [email protected] Start edit of part studio feature
3 27.10.2021 15:59:00 [email protected] Cancel Operation
4 27.10.2021 15:59:00 [email protected] Tab Alpha of type PARTSTUDIO opened by User B
5 27.10.2021 15:59:00 [email protected] Start edit of part studio feature
6 27.10.2021 16:03:00 [email protected] Cancel Operation
7 27.10.2021 16:03:00 [email protected] Add assembly feature
9 27.10.2021 16:03:00 [email protected] Tab Beta of type PARTSTUDIO opened by User A
10 27.10.2021 16:15:00 [email protected] Start edit of part studio feature
11 27.10.2021 16:15:00 [email protected] Start edit of part studio feature
12 27.10.2021 16:15:00 [email protected] Tab Alpha of type PARTSTUDIO closed by User B
14 27.10.2021 16:54:00 [email protected] Add assembly feature
15 27.10.2021 16:55:00 [email protected] Tab Beta of type PARTSTUDIO closed by User A
16 27.10.2021 16:55:00 [email protected] Start edit of part studio feature
17 27.10.2021 16:55:00 [email protected] Tab Delta of type PARTSTUDIO closed by User B

Expected output:

Index Time User Description
1 27.10.2021 15:58:00 [email protected] Tab Alpha of type PARTSTUDIO opened by User A
2 27.10.2021 15:59:00 [email protected] Start edit of part studio feature
3 27.10.2021 15:59:00 [email protected] Cancel Operation
4 27.10.2021 15:59:00 [email protected] Tab Alpha of type PARTSTUDIO opened by User B
5 27.10.2021 15:59:00 [email protected] Start edit of part studio feature
6 27.10.2021 16:03:00 [email protected] Cancel Operation
7 27.10.2021 16:03:00 [email protected] Add assembly feature
8 27.10.2021 16:03:00 [email protected] Tab Alpha of type PARTSTUDIO closed by User A
9 27.10.2021 16:03:00 [email protected] Tab Beta of type PARTSTUDIO opened by User A
10 27.10.2021 16:15:00 [email protected] Start edit of part studio feature
11 27.10.2021 16:15:00 [email protected] Start edit of part studio feature
12 27.10.2021 16:15:00 [email protected] Tab Alpha of type PARTSTUDIO closed by User B
13 27.10.2021 16:15:00 [email protected] Tab Delta of type PARTSTUDIO opened by User B
14 27.10.2021 16:54:00 [email protected] Add assembly feature
15 27.10.2021 16:55:00 [email protected] Tab Beta of type PARTSTUDIO closed by User A
16 27.10.2021 16:55:00 [email protected] Start edit of part studio feature
17 27.10.2021 16:55:00 [email protected] Tab Delta of type PARTSTUDIO closed by User B

How can I check, for each "Tab x opened by User y" value in the Description column, whether a matching "Tab x closed by User y" follows somewhere later in the dataframe? If it does, nothing needs to change. If instead another "Tab zz opened by User y" comes first, the "Tab x closed by User y" row is missing and should be inserted as a row directly before that "Tab zz opened by User y" value (see index 8 in the expected output). The same applies in reverse for a missing "opened" row (index 13). Is there a way to do this without df.iterrows? Thanks in advance.

  • Does the description always follow this pattern precisely? Tab [tab_name] of type [type] opened/closed by [user_name]? Commented May 9, 2022 at 9:42
  • Yes, that is correct. Commented May 9, 2022 at 13:06
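
Given that the format is fixed, a single regular expression can split a description into its parts. The following is only a minimal sketch using the standard `re` module; the function name `parse_tab_event` and the group names are illustrative choices, not something from the question.

```python
import re

# Matches "Tab <name> of type <type> opened/closed by <user>".
PATTERN = re.compile(
    r"^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) "
    r"(?P<tab_action>opened|closed) by (?P<user_id>.+)$"
)

def parse_tab_event(description):
    """Return the parts of a tab open/close description, or None for other rows."""
    match = PATTERN.match(description)
    return match.groupdict() if match else None

print(parse_tab_event("Tab Alpha of type PARTSTUDIO opened by User A"))
print(parse_tab_event("Cancel Operation"))
```

Rows such as "Cancel Operation" do not match and come back as `None`, which makes it easy to separate tab events from everything else.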

1 Answer

Sorry, I forgot to answer this.

Here is one solution. It is neither concise nor particularly elegant, but it should be faster than using iterrows to both check later rows and modify the dataframe.

Data:

                   Time             User                                    Description
0   27.10.2021 15:58:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
2   27.10.2021 15:59:00  [email protected]                               Cancel Operation
3   27.10.2021 15:59:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
5   27.10.2021 16:03:00  [email protected]                               Cancel Operation
6   27.10.2021 16:03:00  [email protected]                           Add assembly feature
7   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
8   27.10.2021 16:03:00  [email protected]  Tab Gamma of type PARTSTUDIO opened by User A
9   27.10.2021 16:14:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
10  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
11  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
12  27.10.2021 16:15:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
13  27.10.2021 16:54:00  [email protected]                           Add assembly feature
14  27.10.2021 16:55:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
15  27.10.2021 16:55:00  [email protected]              Start edit of part studio feature
16  27.10.2021 16:55:00  [email protected]  Tab Delta of type PARTSTUDIO closed by User B
17  27.10.2021 16:56:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
18  27.10.2021 16:57:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User B

I added a few extra consecutive open/close events to the data for more testing.

Code:

import pandas as pd

# Pattern to extract action info.
pattern = r'^Tab (?P<tab_name>.+) of type (?P<tab_type>.+) (?P<tab_action>\bclosed\b|\bopened\b) by (?P<user_id>.+)$'

# Add utility columns (NaN for rows that are not tab open/close events).
df = pd.concat([df, df['Description'].str.extract(pattern)], axis=1)

# Build the missing rows for one user. Each new row gets a fractional index
# label so that it lands between existing rows once the index is sorted.
def get_new_rows(df):
    all_values = []
    for action in ['opened', 'closed']:
        action_mask = df['tab_action'].eq(action)
        # first_tabs: rows followed by the same action; second_tabs: rows preceded by it.
        first_tabs = df[df['tab_action'].eq(df['tab_action'].shift(-1)) & action_mask]
        second_tabs = df[df['tab_action'].eq(df['tab_action'].shift(1)) & action_mask]

        if len(first_tabs) == 0:
            continue

        if action == 'opened':
            values_tab, index_tab, offset, new_action = first_tabs, second_tabs, -0.5, 'closed'
        elif action == 'closed':
            values_tab, index_tab, offset, new_action = second_tabs, first_tabs, 0.5, 'opened'

        # Copy so the assignments below do not write into a slice of df.
        values_tab = values_tab.copy()
        values_tab.index = index_tab.index + offset
        values_tab['Time'] = index_tab['Time'].to_numpy()
        values_tab['tab_action'] = new_action
        all_values.append(values_tab)

    # If the user's last action is an "opened", add a closing row after it.
    last_action = df.tail(1).copy()
    if last_action['tab_action'].iat[0] == 'opened':
        last_action.index += 0.5
        last_action['tab_action'] = 'closed'
        all_values.append(last_action)

    # pd.concat raises on an empty list, so return an empty frame instead.
    if not all_values:
        return df.iloc[0:0]
    return pd.concat(all_values)


# Add new rows at the correct positions.
new_rows = (df.dropna(subset=['tab_action'])
              .groupby(['user_id'], as_index=False)
              .apply(get_new_rows)
              .droplevel(0))
complete_df = pd.concat([df, new_rows]).sort_index().reset_index(drop=True)

# Fix the description of every tab row so it reflects the (possibly new) action.
fix_m = complete_df['tab_name'].notna()
complete_df.loc[fix_m, 'Description'] = ('Tab ' + complete_df.loc[fix_m, 'tab_name'] +
                                        ' of type ' + complete_df.loc[fix_m, 'tab_type'] +
                                        ' ' + complete_df.loc[fix_m, 'tab_action'] + ' by ' +
                                        complete_df.loc[fix_m, 'user_id'])
# Drop utility columns.
complete_df = complete_df.drop(columns=['tab_name', 'tab_type', 'tab_action', 'user_id'])

Result:

                   Time             User                                    Description
0   27.10.2021 15:58:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User A
1   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
2   27.10.2021 15:59:00  [email protected]                               Cancel Operation
3   27.10.2021 15:59:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
4   27.10.2021 15:59:00  [email protected]              Start edit of part studio feature
5   27.10.2021 16:03:00  [email protected]                               Cancel Operation
6   27.10.2021 16:03:00  [email protected]                           Add assembly feature
7   27.10.2021 16:03:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User A
8   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
9   27.10.2021 16:03:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
10  27.10.2021 16:03:00  [email protected]  Tab Gamma of type PARTSTUDIO opened by User A
11  27.10.2021 16:14:00  [email protected]  Tab Gamma of type PARTSTUDIO closed by User A
12  27.10.2021 16:14:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User A
13  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
14  27.10.2021 16:15:00  [email protected]              Start edit of part studio feature
15  27.10.2021 16:15:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
16  27.10.2021 16:15:00  [email protected]  Tab Delta of type PARTSTUDIO opened by User B
17  27.10.2021 16:54:00  [email protected]                           Add assembly feature
18  27.10.2021 16:55:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User A
19  27.10.2021 16:55:00  [email protected]              Start edit of part studio feature
20  27.10.2021 16:55:00  [email protected]  Tab Delta of type PARTSTUDIO closed by User B
21  27.10.2021 16:55:00  [email protected]  Tab Alpha of type PARTSTUDIO opened by User B
22  27.10.2021 16:56:00  [email protected]  Tab Alpha of type PARTSTUDIO closed by User B
23  27.10.2021 16:56:00  [email protected]   Tab Beta of type PARTSTUDIO opened by User B
24  27.10.2021 16:57:00  [email protected]   Tab Beta of type PARTSTUDIO closed by User B
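
The positioning relies on giving each new row an index label halfway between two existing labels and then sorting. Below is a stripped-down sketch of just that step; the three-row `log` frame and the `missing` row are made-up placeholders, not the real data.

```python
import pandas as pd

# Made-up three-row log with the default 0, 1, 2 index.
log = pd.DataFrame({"Description": ["opened A", "edit", "opened B"]})

# Hypothetical missing row; label 1.5 places it between rows 1 and 2.
missing = pd.DataFrame({"Description": ["closed A"]}, index=[1.5])

# Sorting by index moves the new row into place; reset_index renumbers 0..n-1.
patched = pd.concat([log, missing]).sort_index().reset_index(drop=True)
print(patched["Description"].tolist())
# ['opened A', 'edit', 'closed A', 'opened B']
```

This avoids rebuilding the frame row by row: all missing rows are collected first and merged in with a single concat and sort.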

7 Comments

Thank you for your solution and sorry for the late reply! It works great on the given example, however when I run the code on the .csv file I'm working on, it doesn't work as planned. Can you please take a look at the .csv file -> link
@MonaLisaAnn I see, I edited the answer. Unfortunately the answer is oldish, so I need a bit more time to think about it, but try the new version; maybe it works already. Let me know!
Thank you for your quick reply! Now there are 860 "opened by" values and 858 "closed by" values. That is closer than with the first solution (which gave around 1400 values), but the two counts still don't match :/
@MonaLisaAnn I see, I will look a bit deeper into it as soon as possible. Question: is it possible that the missing ones are at the end, i.e. that the last action in the df for a user is an "open"?
@MonaLisaAnn alright, I edited again by making sure there is always a "close" at the end if the last action was an "open". Have a look now if it is any better. However, this is starting to get ugly with all these changes. I will still have a look again at it haha
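
One way to narrow down a mismatch like the 860 vs 858 above is to count "opened" and "closed" events per user and list the users whose counts differ. This is only a rough diagnostic sketch on a tiny made-up frame; the column names `action` and `user` are assumptions introduced here, and equal counts alone do not prove that every open is paired with the right close.

```python
import pandas as pd

# Tiny made-up log; only the Description column matters here.
df = pd.DataFrame({"Description": [
    "Tab Alpha of type PARTSTUDIO opened by User A",
    "Cancel Operation",
    "Tab Alpha of type PARTSTUDIO closed by User A",
    "Tab Beta of type PARTSTUDIO opened by User B",
]})

# Keep only tab events and pull out the action and the user.
events = df["Description"].str.extract(
    r"^Tab .+ of type .+ (?P<action>opened|closed) by (?P<user>.+)$"
).dropna()

# One row per user, one column per action; absent combinations count as 0.
counts = pd.crosstab(events["user"], events["action"])
counts = counts.reindex(columns=["opened", "closed"], fill_value=0)

# Users whose number of opens and closes differ.
unbalanced = counts[counts["opened"] != counts["closed"]]
print(unbalanced)
```

Running the same check on the real file before and after applying the fix should show which users still end up unbalanced.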