3

I am working on an optimization problem and need to create indexing to build a mixed-integer mathematical model. I am using python dictionaries for the task. Below is a sample of my dataset. Full dataset is expected to have about 400K rows if that matters.

# sample input data
pd.DataFrame.from_dict({'origin': {0: 'perris', 1: 'perris', 2: 'perris', 3: 'perris', 4: 'perris'}, 
'dest': {0: 'alexandria', 1: 'alexandria', 2: 'alexandria', 3: 'alexandria', 4: 'alexandria'}, 
'product': {0: 'bike', 1: 'bike', 2: 'bike', 3: 'bike', 4: 'bike'}, 
'lead_time': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}, 'build_time': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}, 
'ship_date': {0: '02/25/2022', 1: '02/26/2022', 2: '02/27/2022', 3: '02/28/2022', 4: '03/01/2022'}, 
'ship_day': {0: 5, 1: 6, 2: 7, 3: 1, 4: 2}, 
'truck_in': {0: '03/01/2022', 1: '03/02/2022', 2: '03/03/2022', 3: '03/04/2022', 4: '03/07/2022'}, 
'product_in': {0: '03/03/2022', 1: '03/04/2022', 2: '03/05/2022', 3: '03/06/2022', 4: '03/09/2022'}})

The data frame looks like this -

enter image description here

I am looking to generate a dictionary from each row of this dataframe where the keys and values are tuples made of multiple column values. The output would look like this -

(origin, dest, product, ship_date): (origin, dest, product, truck_in)

# for example, first two rows will become a dictionary key-value pair like
{('perris', 'alexandria', 'bike', '2/25/2022'): ('perris', 'alexandria', 'bike', '3/1/2022'),
('perris', 'alexandria', 'bike', '2/26/2022'): ('perris', 'alexandria', 'bike', '3/2/2022')}

I am very new to python and couldn't figure out how to do this. Any help is appreciated. Thanks!

2
  • Why do you use pandas? If your input is a dictionary, it may be easier to convert dictionaries to dictionaries Commented Feb 26, 2022 at 22:18
  • Hey, input is not a dictionary but a dataframe as shown in picture/sample data. Commented Feb 26, 2022 at 22:20

2 Answers 2

2

You can loop through the DataFrame.

Assuming your DataFrame is called "df" this gives you the dict.

result_dict = {}
for idx, row in df.iterrows():
    result_dict[(row.origin, row.dest, row['product'], row.ship_date )] = (
                 row.origin, row.dest, row['product'], row.truck_in )

Since looping through 400k rows will take some time, have a look at tqdm (https://tqdm.github.io/) to get a progress bar with a time estimate that quickly tells you if the approach works for your dataset.

Also, note that 400K dictionary entries may take up a lot of memory so you may try to estimate if the dict fits your memory.

Another, memory waisting but faster way is to do it in Pandas

Create a new column with the value for the dictionary

df['value'] = df.apply(lambda x: (x.origin, x.dest, x['product'], x.truck_in), axis=1)

Then set the index and convert to dict

df.set_index(['origin','dest','product','ship_date'])['value'].to_dict()
Sign up to request clarification or add additional context in comments.

5 Comments

Thanks for the answer. However, I am getting keys that look like this ('perris', 'alexandria', <bound method NDFrame._add_numeric_operations.<locals>.prod of origin perris dest alexandria product bike lead_time 4 build_time 2 ship_date 02/25/2022 ship_day 5 truck_in 03/01/2022 product_in 03/03/2022 Name: 0, dtype: object>, '02/25/2022'). Any idea why?
"product" is also a reserved key word. with a real dataframe, try row['product'] instead. I'm changing the answer for that.
Great! Using row['product'] fixed it.
+1 I'll give this a few more hours to see if anyone else wants to chime in. If not, will mark this as correct. Thanks a lot! :)
While you wait, make sure to check out that tqdm library I linked. When working with larger dataframes it's a great help to immediately know how long something will take (because if it will take way too long, you can cancel it and improve your program instead)
1

The approach below splits the initial dataframe into two dataframes that will be the source of the keys and values in the dictionary. These are then converted to arrays in order to get away from working with dataframes as soon as possible. The arrays are converted to tuples and zipped together to create the key:value pairs.

import pandas as pd
import numpy as np

df = pd.DataFrame.from_dict(
    {'origin': {0: 'perris', 1: 'perris', 2: 'perris', 3: 'perris', 4: 'perris'}, 
'dest': {0: 'alexandria', 1: 'alexandria', 2: 'alexandria', 3: 'alexandria', 4: 'alexandria'}, 
'product': {0: 'bike', 1: 'bike', 2: 'bike', 3: 'bike', 4: 'bike'}, 
'lead_time': {0: 4, 1: 4, 2: 4, 3: 4, 4: 4}, 'build_time': {0: 2, 1: 2, 2: 2, 3: 2, 4: 2}, 
'ship_date': {0: '02/25/2022', 1: '02/26/2022', 2: '02/27/2022', 3: '02/28/2022', 4: '03/01/2022'}, 
'ship_day': {0: 5, 1: 6, 2: 7, 3: 1, 4: 2}, 
'truck_in': {0: '03/01/2022', 1: '03/02/2022', 2: '03/03/2022', 3: '03/04/2022', 4: '03/07/2022'}, 
'product_in': {0: '03/03/2022', 1: '03/04/2022', 2: '03/05/2022', 3: '03/06/2022', 4: '03/09/2022'}}
     )
#display(df)

#desired output: (origin, dest, product, ship_date): (origin, dest, product, truck_in)

#slice df to key/value chunks
#list to array
ship = df[['origin','dest', 'product', 'ship_date']]
ship.set_index('origin', inplace = True)
keys_array=ship.to_records()
truck = df[['origin', 'dest', 'product', 'truck_in']]
truck.set_index('origin', inplace = True)
values_array = truck.to_records()

#array_of_tuples = map(tuple, an_array)
keys_map = map(tuple, keys_array)
values_map = map(tuple, values_array)

#tuple_of_tuples = tuple(array_of_tuples)
keys_tuple = tuple(keys_map)
values_tuple = tuple(values_map)

zipp = zip(keys_tuple, values_tuple)
dict2 = dict(zipp)
print(dict2)

2 Comments

Thanks! Is this intended to be faster than what @576i has provided? or is this just another way of achieving desired result?
Hi Shree, it was intended to be faster, but it is actually quite a bit slower. This is the runtime from %timeit: 2.8 ms ± 43.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) vs 627 µs ± 15.2

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.