0

I am trying to split my rows into multiple rows based on if a value exists in a column. I am able to filter based on the column value but would like to split the row into two based on the condition and return only specific parts of the row. The dataframe consists of different types of payments and the activity field represents the payment type that occurred. I would like for when there is a specific payment type to split the row into two. In the example below, when payment 2 exists I want it to be a separate transaction.

I have the following dataframe:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5      Activity 
1          10             0              0            0              0         Payment 1  
2          0              10             15           0              0         Payment 2, Payment 3
3          10             20             30           0              0         Payment 1, Payment 2, Payment 3

But I would like for the dataframe to be:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5       Activity 
 1          10             0              0            0              0         Payment 1  
 2          0              10             0            0              0         Payment 2
 2          0              0              15           0              0         Payment 3
 3          10             0              30           0              0         Payment 1, Payment 3
 3          0              20             0            0              0         Payment 2
1
  • prob much better way but you could do a str.split(expand=True) on "Payment 2," and ", Payment 2". That way you have a column of payment 2's when they exist Commented Nov 4, 2020 at 22:42

1 Answer 1

1

The problem here is that you want to change both the Payment columns and the Activity column at the same time. So actually, I only see the naive approach of iterating through the DataFrame and writing new rows to a new DataFrame depending on the condition row['Payment 2'] == 0 like this:

df_new = pd.DataFrame(columns=['ID', 'Payment 1', 'Payment 2', 'Payment 3', 'Payment 4', 'Payment 5', 'Activity'])
for index, row in df.iterrows():
  if row['Payment 2'] == 0:
    df_new = df_new.append(row)
  else:
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[row['Payment 1']], 'Payment 2':[0], 'Payment 3':[row['Payment 3']], 'Payment 4':[row['Payment 4']], 'Payment 5':[row['Payment 5']], 'Activity':[', '.join([x for x in ['Payment 1', 'Payment 3', 'Payment 4', 'Payment 5'] if row[x] != 0])]})
    df_new = df_new.append(row_new)
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[0], 'Payment 2':[row['Payment 2']], 'Payment 3':[0], 'Payment 4':[0], 'Payment 5':[0], 'Activity':'Payment 2'})
    df_new = df_new.append(row_new)

This will append two new rows (one with Payment 2 only and one with the rest) if there is a Payment 2 and append the original row from the source DataFrame if there is no Payment 2.

Related questions on splitting rows (based on one column only and leaving the other ones untouched):

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.