I'm trying to write a script to scrub information from CSV files. I have a pandas DataFrame created from a CSV like the one below:

CUSTOMER ORDERS
  hashed_customer      firstname    lastname    email   order_id    status          timestamp
0      eater 1_uuid  1_firstname  1_lastname  1_email    12345    OPTED_IN     2020-05-14 20:45:15
1      eater 2_uuid  2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
2      eater 3_uuid  3_firstname  3_lastname  3_email    34567    OPTED_IN     2020-05-14 19:31:55
3      eater 4_uuid  4_firstname  4_lastname  4_email    45678    OPTED_IN     2020-05-14 17:49:27
4      eater 5_uuid  5_firstname  5_lastname  5_email    56789    OPTED_IN     2020-05-14 16:22:33

I have another CSV listing the hashed_customer values that I need to scrub from this file. If a hashed_customer from that list appears in CUSTOMER ORDERS, I need to blank out the firstname, lastname, and email in that row while keeping the rest, so the result looks something like this:

CUSTOMER ORDERS
  hashed_customer      firstname    lastname    email   order_id    status          timestamp
0      eater 1_uuid         NULL        NULL     NULL    12345    OPTED_IN     2020-05-14 20:45:15
1      eater 2_uuid  2_firstname  2_lastname  2_email    23456    OPTED_IN     2020-05-14 20:29:22
2      eater 3_uuid  3_firstname  3_lastname  3_email    34567    OPTED_IN     2020-05-14 19:31:55
3      eater 4_uuid         NULL        NULL     NULL    45678    OPTED_IN     2020-05-14 17:49:27
4      eater 5_uuid  5_firstname  5_lastname  5_email    56789    OPTED_IN     2020-05-14 16:22:33

My current script looks like this:

print('FIND ORDERS FROM OPT-OUT CUSTOMERS')
cust_opt_out_order = []
for index, row in df_in.iterrows():
    if row.hashed_eater_uuid in cust_opt_out_id:
        cust_opt_out_order.append(row.order_id)

print('REMOVE OPT-OUT FROM OPT-IN FILE')
df_cust_out = df_in[~df_in['hashed_eater_uuid'].isin(cust_opt_out_id)]

But this removes the entire row. I need to keep the row and clear only the name and email values in it. How can I drop individual elements from a row using pandas?

1 Answer

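Copy the DataFrame, then use .loc with a boolean mask so that only the three personal-data columns are set to NaN on the matching rows:

import numpy as np

df_cust_out = df_in.copy()
# Blank the name and email columns only where the customer is in the opt-out list
df_cust_out.loc[df_in['hashed_eater_uuid'].isin(cust_opt_out_id), ['firstname', 'lastname', 'email']] = np.nan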
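
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The inline CSV text, the opt-out file layout (a single hashed_eater_uuid column), and the names orders_csv, opt_out_csv, pii_cols, and mask are assumptions made up for illustration; only df_in, cust_opt_out_id, df_cust_out, and the column names come from the question.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two CSV files described in the question.
orders_csv = io.StringIO(
    "hashed_eater_uuid,firstname,lastname,email,order_id,status\n"
    "1_uuid,1_firstname,1_lastname,1_email,12345,OPTED_IN\n"
    "2_uuid,2_firstname,2_lastname,2_email,23456,OPTED_IN\n"
    "3_uuid,3_firstname,3_lastname,3_email,34567,OPTED_IN\n"
    "4_uuid,4_firstname,4_lastname,4_email,45678,OPTED_IN\n"
)
# Assumed layout of the opt-out file: one column of customer IDs.
opt_out_csv = io.StringIO("hashed_eater_uuid\n1_uuid\n4_uuid\n")

df_in = pd.read_csv(orders_csv)
cust_opt_out_id = pd.read_csv(opt_out_csv)["hashed_eater_uuid"].tolist()

# Columns to blank, and a boolean mask selecting the opted-out rows.
pii_cols = ["firstname", "lastname", "email"]
mask = df_in["hashed_eater_uuid"].isin(cust_opt_out_id)

# Work on a copy so df_in keeps the original values.
df_cust_out = df_in.copy()
df_cust_out.loc[mask, pii_cols] = np.nan

print(df_cust_out)
```

The row count is unchanged, because .loc assigns into the selected cells instead of filtering rows. If the result is written back out with to_csv, missing values become empty fields by default; passing na_rep="NULL" should write the literal text NULL instead, if that is what the downstream consumer expects.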

1 Comment

That's exactly it! Thank you so much!