Faster nested list generation in Python

Question

I have a long pandas data frame with two columns. The first column is a prescription (transaction) number; these are not unique, as multiple rows can share the same prescription number. The second column is one item in that prescription. I want to create a list of items for each prescription number (with duplicates removed) and put each of these lists into a larger, nested list whose length equals the number of unique prescription numbers.

I have successfully achieved this; however, it takes a while to run, and I would like to know a better (i.e., faster) way of doing it. My code is below:

import datetime
from sys import stdout

import pandas as pd

# get the unique values for prescription (most frequent first)
list_prescription = list(df['prescription'].value_counts().index)

# make a list of product_name for each prescription (this will be time consuming)
time_start = datetime.datetime.now()
counter = 1
list_list_product_name = []
for prescription in list_prescription:
    # subset to just that prescription
    df_subset = df[df['prescription'] == prescription]
    # put product_name into a list
    list_product_name = list(df_subset['product_name'])
    # remove any duplicates
    list_product_name = list(dict.fromkeys(list_product_name))
    # append list_product_name to list_list_product_name
    list_list_product_name.append(list_product_name)
    # get current time
    time_current = datetime.datetime.now()
    # get minutes elapsed from time_start
    time_elapsed = (time_current - time_start).total_seconds()/60
    # print a message to the console for status
    stdout.write('\r{0}/{1}; {2:0.4f}% complete; elapsed time: {3:0.2f} min.'.format(counter, len(list_prescription), (counter/len(list_prescription))*100, time_elapsed))
    stdout.flush()
    # increase counter by 1
    counter += 1

I can't run code now, but I think new_df = df.groupby('transaction').agg(lambda x: list(x)).reset_index() would give you a new dataframe with one row for each transaction and a list of prescriptions in the second column
– Aryerez, Commented Sep 6, 2019 at 15:32
@Aryerez thank you! I ended up making a few small changes to your code with suggestions from @FrancescoLS using new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index() and it worked great!
– Aaron England, Commented Sep 6, 2019 at 15:49

FrancescoLS · Accepted Answer · 2019-09-06 15:36:50Z

You can replace this part

# put product_name into a list
list_product_name = list(df_subset['product_name'])    
# remove any duplicates
list_product_name = list(dict.fromkeys(list_product_name))
# append list_product_name to list_list_product_name
list_list_product_name.append(list_product_name)

with

list_list_product_name.append(df_subset['product_name'].unique().tolist())
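As a quick check that the one-liner is a drop-in replacement, the sketch below compares both versions on a small made-up subset (the product names and prescription number are invented for illustration). It relies on `Series.unique()` returning values in order of first appearance, which matches what `dict.fromkeys` does for the string values used here.

```python
import pandas as pd

# Hypothetical rows for a single prescription; the values are made up.
df_subset = pd.DataFrame({
    "prescription": [101, 101, 101, 101],
    "product_name": ["aspirin", "ibuprofen", "aspirin", "codeine"],
})

# Original approach: build a list, then drop duplicates while keeping order.
original = list(dict.fromkeys(list(df_subset["product_name"])))

# Suggested replacement: let pandas do the de-duplication.
replacement = df_subset["product_name"].unique().tolist()

print(original)
print(replacement)
```

Both lists come out in the same order, so the rest of the loop does not need to change.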

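Also, you might want to look at groupby, which can build all of the lists in one pass instead of filtering the data frame once per prescription.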
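A minimal sketch of the groupby route, assuming the column names from the question (`prescription`, `product_name`) and a tiny made-up data frame. It uses the same `x.unique().tolist()` aggregation reported as working in the comments. Passing `sort=False` is a choice made here for illustration: it keeps groups in order of first appearance, whereas the loop in the question orders prescriptions by frequency.

```python
import pandas as pd

# Made-up example data; the real data frame would be much longer.
df = pd.DataFrame({
    "prescription": [2, 1, 2, 1, 2, 3],
    "product_name": ["a", "b", "a", "c", "d", "a"],
})

# One de-duplicated list of product names per prescription.
unique_by_prescription = (
    df.groupby("prescription", sort=False)["product_name"]
      .agg(lambda s: s.unique().tolist())
)

# Nested list with one inner list per unique prescription.
list_list_product_name = unique_by_prescription.tolist()

print(list_list_product_name)
```

The index of `unique_by_prescription` holds the prescription numbers, so each inner list can still be traced back to its prescription if needed.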

1 Comment

Aaron England Over a year ago

Thank you. I used part of your code and the suggested groupby function: new_df = df.groupby('prescription').agg(lambda x: x.unique().tolist()).reset_index()
