I have a [long] pandas data frame with 2 columns. The first column is a prescription number (these are not unique, as multiple rows can share the same prescription number). The second column is one item in that prescription. I want to create a list of items for each prescription number (with duplicates removed) and put each of these lists into a larger, nested list whose length equals the number of UNIQUE prescription numbers.

I have achieved this; however, it takes a while to run, and I would like to know a better (i.e., faster) way of doing it. My code is below:

import datetime
from sys import stdout

import pandas as pd

# get the unique values for prescription (most frequent first)
list_prescription = list(df['prescription'].value_counts().index)

# make a list of product_name for each prescription (this will be time consuming)
time_start = datetime.datetime.now()
counter = 1
list_list_product_name = []
for prescription in list_prescription:
    # subset to just that prescription
    df_subset = df[df['prescription'] == prescription]
    # put product_name into a list
    list_product_name = list(df_subset['product_name'])
    # remove any duplicates, keeping first-appearance order
    list_product_name = list(dict.fromkeys(list_product_name))
    # append list_product_name to list_list_product_name
    list_list_product_name.append(list_product_name)
    # get current time
    time_current = datetime.datetime.now()
    # get minutes elapsed from time_start
    time_elapsed = (time_current - time_start).total_seconds() / 60
    # print a message to the console for status
    stdout.write('\r{0}/{1}; {2:0.4f}% complete; elapsed time: {3:0.2f} min.'.format(counter, len(list_prescription), (counter/len(list_prescription))*100, time_elapsed))
    stdout.flush()
    # increase counter by 1
    counter += 1
  • I can't run code now, but I think new_df = df.groupby('transaction').agg(lambda x: list(x)).reset_index() would give you a new dataframe with one row for each transaction and a list of prescriptions in the second column. Commented Sep 6, 2019 at 15:32
  • @Aryerez thank you! I ended up making a few small changes to your code with suggestions from @FrancescoLS using new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index() and it worked great! Commented Sep 6, 2019 at 15:49

1 Answer

You can replace this part:

# put product_name into a list
list_product_name = list(df_subset['product_name'])    
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)

with:

list_list_product_name.append(df_subset['product_name'].unique().tolist())
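
As a quick sanity check, here is a minimal sketch on made-up toy data (only the column name comes from the question) suggesting that the one-liner gives the same result as the dict.fromkeys version, since Series.unique() returns values in order of first appearance:

```python
import pandas as pd

# Hypothetical toy subset; only the column name is taken from the question.
df_subset = pd.DataFrame({"product_name": ["b", "a", "b", "c", "a"]})

# Original approach: list -> dict.fromkeys (keeps first-appearance order) -> list
old_way = list(dict.fromkeys(list(df_subset["product_name"])))

# Suggested approach: let pandas de-duplicate, then convert to a plain list
new_way = df_subset["product_name"].unique().tolist()

print(old_way)  # ['b', 'a', 'c']
print(new_way)  # ['b', 'a', 'c']
```

Note that this only shortens the body of the loop; the per-prescription filtering with df[df['prescription'] == prescription] is still done once per unique prescription.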

Also, you might want to look into groupby.
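
A rough sketch of what the groupby route could look like, assuming the column names from the question and using invented toy data. It groups once instead of filtering the frame for every prescription, then unpacks the result into the nested list the question asks for:

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "prescription": [101, 102, 101, 103, 102, 101],
    "product_name": ["a", "b", "a", "c", "d", "e"],
})

# One pass over the frame: unique product names per prescription.
# sort=False keeps prescriptions in order of first appearance.
unique_per_rx = df.groupby("prescription", sort=False)["product_name"].unique()

# Unpack into the two structures used in the question.
list_prescription = unique_per_rx.index.tolist()
list_list_product_name = [names.tolist() for names in unique_per_rx]

print(list_prescription)       # [101, 102, 103]
print(list_list_product_name)  # [['a', 'e'], ['b', 'd'], ['c']]
```

One difference to be aware of: the question orders prescriptions by frequency (via value_counts), whereas this sketch orders them by first appearance, so the outer list may come out in a different order.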

1 Comment

Thank you. I used part of your code and the suggested groupby function: new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index()
