I want to convert this dataframe to a dictionary:

import pandas as pd
df = pd.DataFrame({'Date': {0: 44197,1: 44197,2: 44197,3: 44197,4: 44198,5: 44198,6: 44198,7: 44198,8: 44197,9: 44197,10: 44197,11: 44197,12: 44198,13: 44198,14: 44198,15: 44198},
 'Product': {0: 'B',1: 'A',2: 'C',3: 'D',4: 'B',5: 'A',6: 'C',7: 'D',8: 'B',9: 'A',10: 'C',11: 'D',12: 'B',13: 'A',14: 'C',15: 'D'},
 'Seller': {0: 'XXX',1: 'XXX',2: 'XXX',3: 'XXX',4: 'XXX',5: 'XXX',6: 'XXX',7: 'XXX',8: 'YYY',9: 'YYY',10: 'YYY',11: 'YYY',12: 'YYY',13: 'YYY',14: 'YYY',15: 'YYY'},
 'Price': {0: 10,1: 25,2: 36,3: 14,4: 60,5: 31,6: 2,7: 7,8: 5,9: 9,10: 10,11: 26,12: 36,13: 78,14: 95,15: 100}})

I want the top-level keys to be the Seller. Under each seller the keys are the dates, and under each date the products are the keys and the prices are the values, so A is a key and its Price is the value.

The nesting order is Seller => Date (there will be about 1000 dates) => Product; each date has 4 products and their corresponding prices.

Does anyone know how to do this? My real table is very large (100K rows), so I also want this to be as fast as possible.

The result I want looks like this:

{XXX:{"01/01/2021":{ A : Value1 , B : Value2 , C : Value3 , D : Value4 },"02/01/2021":{ .......}}, "YYY" : ....... }

  • What's the example desired output?? Commented Sep 8, 2021 at 12:15
  • {XXX:{"01/01/2021":{ A : Value1 , B : Value2 , C : Value3 , D : Value4 },"02/01/2021": .......}, "YYY" : ....... } @U12-Forward It looks like this. Commented Sep 8, 2021 at 12:18
  • Does this answer your question? Convert a Pandas DataFrame to a dictionary Commented Sep 8, 2021 at 12:19
  • @Craicerjack read my post before sending corrections. Commented Sep 8, 2021 at 12:19
  • Note to mods/close voters: this is not a duplicate of this. This question is a lot more complicated. Commented Sep 8, 2021 at 14:13

3 Answers

groupped_df = df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict()).apply(lambda x: x['Price'])
desired_dict = {level: groupped_df.xs(level).to_dict() for level in groupped_df.index.levels[0]}

output:

Out[100]: 
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14},
  44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}},
 'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26},
  44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}

The first line groups by seller and date and builds a Series whose index is (Seller, Date) and whose values are the innermost dictionaries. The grouping itself is df.groupby(['Seller', 'Date']).apply(lambda x: x.set_index('Product').to_dict()). Because to_dict() on each group returns a dictionary keyed by column name, with extra data besides the prices, I added .apply(lambda x: x['Price']) to keep only the prices.

After the first line, groupped_df looks like this:

Out[101]: 
Seller  Date 
XXX     44197     {'B': 10, 'A': 25, 'C': 36, 'D': 14}
        44198       {'B': 60, 'A': 31, 'C': 2, 'D': 7}
YYY     44197       {'B': 5, 'A': 9, 'C': 10, 'D': 26}
        44198    {'B': 36, 'A': 78, 'C': 95, 'D': 100}

The second line turns the MultiIndex into a nested dictionary, as in this answer (note that to_dict does not need the 'index' argument here).
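
To see what that second line does in isolation, here is a minimal sketch on a hand-built Series with a (Seller, Date) MultiIndex. The sample values are copied from the output above, and the names s and result are only for illustration.

```python
import pandas as pd

# Hand-built stand-in for groupped_df: index is (Seller, Date), values are dicts.
index = pd.MultiIndex.from_tuples(
    [("XXX", 44197), ("XXX", 44198), ("YYY", 44197)],
    names=["Seller", "Date"],
)
s = pd.Series(
    [{"A": 25, "B": 10}, {"A": 31, "B": 60}, {"A": 9, "B": 5}],
    index=index,
)

# xs(seller) selects one seller and drops that index level, leaving a
# Date-indexed Series; to_dict() then maps each date to its product dict.
result = {seller: s.xs(seller).to_dict() for seller in s.index.levels[0]}
print(result)
```

Each pass through the comprehension handles one seller, so the outer dict ends up with one entry per value in the first index level.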

4 Comments

This is perfect. Do you have an idea how to order the date keys from oldest to newest?
They are dictionary keys, so they don't have an order. A dictionary doesn't keep any order between its keys.
@Roim great answer! But FYI, Python dictionaries have been ordered since, I think, 3.7.
@RicksupportsMonica didn't know that, thank you. @heunevy my bad. I assume there should be a .sort_index inside the nested loop, but I'll check it later on and get back to you.

Reshape the dataframe with pivot, assign the row records to a temporary column, then unstack level 0 and call to_dict to build the dictionary:

s = df.pivot(index=['Seller', 'Date'], columns='Product', values='Price')  # keyword arguments are required in pandas 2.0+
s.assign(r=s.to_dict('records'))['r'].unstack(0).to_dict()

{'XXX': {44197: {'A': 25, 'B': 10, 'C': 36, 'D': 14},
         44198: {'A': 31, 'B': 60, 'C': 2, 'D': 7}},
 'YYY': {44197: {'A': 9, 'B': 5, 'C': 10, 'D': 26},
         44198: {'A': 78, 'B': 36, 'C': 95, 'D': 100}}}
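
The keys above are still the raw Date values, while the question asks for strings such as "01/01/2021". Assuming the integers are Excel-style serial day numbers (an assumption: the question does not say so, although 44197 does map to 1 January 2021 under that convention), a small sketch for relabelling the date level of the finished dictionary could look like this. The helper names serial_to_label and relabel_dates are hypothetical.

```python
from datetime import date, timedelta

# Assumption: dates are Excel serial day numbers, counted from 1899-12-30.
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_label(serial: int) -> str:
    """Turn a serial day number into a dd/mm/yyyy string."""
    return (EXCEL_EPOCH + timedelta(days=int(serial))).strftime("%d/%m/%Y")


def relabel_dates(nested: dict) -> dict:
    """Rewrite the second-level keys of {seller: {date: {...}}} as strings."""
    return {
        seller: {serial_to_label(d): prices for d, prices in by_date.items()}
        for seller, by_date in nested.items()
    }


example = {"XXX": {44197: {"A": 25, "B": 10}, 44198: {"A": 31, "B": 60}}}
relabelled = relabel_dates(example)
print(relabelled)
```

Converting after the dictionary is built keeps the integer dates available for sorting, since dd/mm/yyyy strings do not sort chronologically.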

First, move the columns you want to nest by into a MultiIndex using set_index:

>>> df.set_index(["Seller", "Date", "Product"])
                      Price
Seller Date  Product
XXX    44197 B           10
             A           25
             C           36
             D           14
       44198 B           60
             A           31
             C            2
             D            7
YYY    44197 B            5
             A            9
             C           10
             D           26
       44198 B           36
             A           78
             C           95
             D          100

Then convert the frame to a dictionary using the to_dict method. Each column becomes a key in the outer dict, and each MultiIndex entry becomes a tuple key in the inner dict:

>>> df_transformed = df.set_index(["Seller", "Date", "Product"])
>>> df_transformed.to_dict()
{'Price': {('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}}

...then take the Price column by itself (in this case it is the only column):

>>> price_dict = df_transformed.to_dict()["Price"]
>>> price_dict
{('XXX', 44197, 'B'): 10, ('XXX', 44197, 'A'): 25, ('XXX', 44197, 'C'): 36, ('XXX', 44197, 'D'): 14, ('XXX', 44198, 'B'): 60, ('XXX', 44198, 'A'): 31, ('XXX', 44198, 'C'): 2, ('XXX', 44198, 'D'): 7, ('YYY', 44197, 'B'): 5, ('YYY', 44197, 'A'): 9, ('YYY', 44197, 'C'): 10, ('YYY', 44197, 'D'): 26, ('YYY', 44198, 'B'): 36, ('YYY', 44198, 'A'): 78, ('YYY', 44198, 'C'): 95, ('YYY', 44198, 'D'): 100}

See this answer from Martijn Pieters for how to turn the tuple keys into nested dictionaries. I reproduce it below for convenience, but it really is Pieters' solution:

def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

Finally:

>>> nest(price_dict)
{'XXX': {44197: {'B': 10, 'A': 25, 'C': 36, 'D': 14}, 44198: {'B': 60, 'A': 31, 'C': 2, 'D': 7}}, 'YYY': {44197: {'B': 5, 'A': 9, 'C': 10, 'D': 26}, 44198: {'B': 36, 'A': 78, 'C': 95, 'D': 100}}}
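
Since the question mentions about 100K rows, it may be worth timing a variant that skips the tuple-keyed intermediate dict and fills the nested dict straight from the columns. This is only a sketch of the same setdefault idea, not a benchmarked claim; the function name nest_rows and the small sample frame are made up for illustration.

```python
import pandas as pd


def nest_rows(df: pd.DataFrame) -> dict:
    """Build {Seller: {Date: {Product: Price}}} in one pass over the rows."""
    result = {}
    # tolist() yields plain Python values, so the dict holds no NumPy scalars.
    rows = zip(
        df["Seller"].tolist(),
        df["Date"].tolist(),
        df["Product"].tolist(),
        df["Price"].tolist(),
    )
    for seller, date, product, price in rows:
        result.setdefault(seller, {}).setdefault(date, {})[product] = price
    return result


sample = pd.DataFrame({
    "Date": [44197, 44197, 44198, 44197],
    "Product": ["B", "A", "B", "A"],
    "Seller": ["XXX", "XXX", "XXX", "YYY"],
    "Price": [10, 25, 60, 9],
})
nested = nest_rows(sample)
print(nested)
```

If a (Seller, Date, Product) combination appears more than once, the last row wins, so duplicates should be checked beforehand if that matters.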

The OP mentions in a comment that the order of the dates matters. The order of the MultiIndex of df_transformed determines the order of the final nested dictionary, so to change the ordering, call .sort_index() before turning it into the dictionary.
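
As a sketch of that last point, the example below uses a small frame whose rows are deliberately out of date order (made-up rows in the same shape as the question's data) and sorts the index before nesting. The nest function is repeated so the block runs on its own; the names df_small, prices, unsorted_result and sorted_result are only for illustration.

```python
import pandas as pd


def nest(d: dict) -> dict:
    """Turn a dict with tuple keys into nested dicts, one level per tuple item."""
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result


# Rows intentionally listed with the later date first.
df_small = pd.DataFrame({
    "Date": [44198, 44197, 44198, 44197],
    "Product": ["B", "A", "A", "B"],
    "Seller": ["XXX", "XXX", "XXX", "XXX"],
    "Price": [60, 25, 31, 10],
})

prices = df_small.set_index(["Seller", "Date", "Product"])["Price"]

unsorted_result = nest(prices.to_dict())               # keys follow row order
sorted_result = nest(prices.sort_index().to_dict())    # keys follow sorted index

print(list(unsorted_result["XXX"]))
print(list(sorted_result["XXX"]))
```

This relies on Python dicts preserving insertion order (guaranteed since Python 3.7), so the sorted index order carries through to the nested keys.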

3 Comments

The groupby is correct, that's what I'm trying to reach in the dict ....
@heuveny yes although it's actually not really a groupby, it's a change to the index to match the nesting of the dict keys you want.
@heuveny I finally finished my answer. Sorry it took a while!