2

I want to iteratively append row values from multiple columns to a new column in a new DataFrame based on a group.

My goal is to have 1 row for each customer, with 1 column for the customer's ID and 1 column for their timeline that lists the date of each event followed by the event description, for all dates and events, in chronological order.

I have solved this with a series of dictionaries. I am searching for a clean, elegant, pandas-style way to accomplish this as this code will be run frequently with small changes to customers, events, etc.

Example:

import pandas as pd

df_have = pd.DataFrame({'Customer_ID':['customer_1','customer_1','customer_1','customer_2','customer_2'], 
                        'Event':['purchased cornflakes','purchased eggs', 'purchased waffles','sold eggs','purchased cows'],
                           'Date':['2011-06-16','2011-06-13','2011-06-09','2011-06-13','2011-06-18']})

df_have['Date'] = pd.to_datetime(df_have['Date'])

df_have.sort_values(['Customer_ID','Date'], inplace =True)
df_have

df I currently have

df_want = pd.DataFrame({'Customer_ID':['customer_1','customer_2'],
                       'Time_Line':[['2011-06-09,purchased waffles,2011-06-13,purchased eggs,2011-06-16,purchased cornflakes'],
                                   ['2011-06-13,sold eggs,2011-06-18,purchased cows']]})
df_want

df I'd like to have

1 Answer 1

2

Steps:

1) Set Customer_ID to be the index axis as it would remain static throughout the operation.

2) stack so that Date and Event fall below one another.

3) Peform groupby w.r.t the index (level=0) and convert the only column into list. Since we've stacked them in this sequence, they would appear alternatingly.


# set maximum width of columns to be displayed
pd.set_option('max_colwidth', 100)

df_have.set_index('Customer_ID').stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

enter image description here


To change the order in which sequence occurs inside the list:

df_have.set_index('Customer_ID').reindex_axis(['Event', 'Date'], axis=1).stack(
    ).groupby(level=0).apply(list).reset_index(name="Time_Line")

enter image description here

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.