How to split rows in a dataframe based on column value?

Question

I am trying to split my rows into multiple rows based on if a value exists in a column. I am able to filter based on the column value but would like to split the row into two based on the condition and return only specific parts of the row. The dataframe consists of different types of payments and the activity field represents the payment type that occurred. I would like for when there is a specific payment type to split the row into two. In the example below, when payment 2 exists I want it to be a separate transaction.

I have the following dataframe:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5      Activity 
1          10             0              0            0              0         Payment 1  
2          0              10             15           0              0         Payment 2, Payment 3
3          10             20             30           0              0         Payment 1, Payment 2, Payment 3

But I would like for the dataframe to be:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5       Activity 
 1          10             0              0            0              0         Payment 1  
 2          0              10             0            0              0         Payment 2
 2          0              0              15           0              0         Payment 3
 3          10             0              30           0              0         Payment 1, Payment 3
 3          0              20             0            0              0         Payment 2

prob much better way but you could do a str.split(expand=True) on "Payment 2," and ", Payment 2". That way you have a column of payment 2's when they exist — noah
– noah, Commented Nov 4, 2020 at 22:42

Gerd · Accepted Answer · 2020-11-05 11:11:17Z

The problem here is that you want to change both the Payment columns and the Activity column at the same time. So actually, I only see the naive approach of iterating through the DataFrame and writing new rows to a new DataFrame depending on the condition row['Payment 2'] == 0 like this:

df_new = pd.DataFrame(columns=['ID', 'Payment 1', 'Payment 2', 'Payment 3', 'Payment 4', 'Payment 5', 'Activity'])
for index, row in df.iterrows():
  if row['Payment 2'] == 0:
    df_new = df_new.append(row)
  else:
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[row['Payment 1']], 'Payment 2':[0], 'Payment 3':[row['Payment 3']], 'Payment 4':[row['Payment 4']], 'Payment 5':[row['Payment 5']], 'Activity':[', '.join([x for x in ['Payment 1', 'Payment 3', 'Payment 4', 'Payment 5'] if row[x] != 0])]})
    df_new = df_new.append(row_new)
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[0], 'Payment 2':[row['Payment 2']], 'Payment 3':[0], 'Payment 4':[0], 'Payment 5':[0], 'Activity':'Payment 2'})
    df_new = df_new.append(row_new)

This will append two new rows (one with Payment 2 only and one with the rest) if there is a Payment 2 and append the original row from the source DataFrame if there is no Payment 2.

Related questions on splitting rows (based on one column only and leaving the other ones untouched):

Collectives™ on Stack Overflow

How to split rows in a dataframe based on column value?

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related