import pandas as pd
import numpy as np

data = {'a': ['aa', 'aa', 'aa', 'aaa', 'aaa'],
        'b': ['bb', 'bb', 'bb', 'bbb', 'bbb'],
        'c': [10, 20, 30, 100, 200]}

df = pd.DataFrame(data=data)

my_dict = df.groupby(['a', 'b'])['c'].apply(np.hstack).to_dict()

This gives the following dictionary:

>>> my_dict
{('aa', 'bb'): array([10, 20, 30]), ('aaa', 'bbb'): array([100, 200])}

Is there a faster or more efficient way of doing this than using apply?

2 Answers


Use dictionary comprehension:

my_dict = {k: np.hstack(v) for k, v in df.groupby(['a', 'b'])['c']}
print(my_dict)
{('aa', 'bb'): array([10, 20, 30]), ('aaa', 'bbb'): array([100, 200])}
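Each `v` in the comprehension is already a one-dimensional Series, so a possible variant is to call `Series.to_numpy()` on it instead of `np.hstack`. The sketch below rebuilds the question's frame so that it runs on its own. The name `arrays_by_group` is only illustrative, and no timing claim is made here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['aa', 'aa', 'aa', 'aaa', 'aaa'],
    'b': ['bb', 'bb', 'bb', 'bbb', 'bbb'],
    'c': [10, 20, 30, 100, 200],
})

# Iterating over a grouped column yields (group_key, sub_series) pairs.
# With two grouping columns, each key is a tuple such as ('aa', 'bb').
arrays_by_group = {
    key: group.to_numpy()
    for key, group in df.groupby(['a', 'b'])['c']
}

print(arrays_by_group)
```

Whether this is faster than `apply` on real data would need to be measured, for example with `timeit` on a frame of realistic size.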

You could use groupby and itertuples:

my_dict = dict(df.groupby(['a','b']).agg(list).itertuples(name=None))

{('aa', 'bb'): [10, 20, 30], ('aaa', 'bbb'): [100, 200]}
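To see why `dict(...)` accepts this, it can help to look at what `itertuples(name=None)` yields. The sketch below, using the question's data, collects the rows in a list. Each row is a plain tuple whose first element is the index entry (itself a tuple here, because the grouped frame has a two-level index), followed by one element per column. The trick relies on there being exactly one value column, so that each row is a (key, value) pair. The names `rows` and `lists_by_group` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['aa', 'aa', 'aa', 'aaa', 'aaa'],
    'b': ['bb', 'bb', 'bb', 'bbb', 'bbb'],
    'c': [10, 20, 30, 100, 200],
})

# One row per group; the only remaining column, 'c', holds a list per group.
grouped_lists = df.groupby(['a', 'b']).agg(list)

# name=None returns regular tuples instead of namedtuples.
rows = list(grouped_lists.itertuples(name=None))
print(rows)

# Each row has exactly two elements, so dict() treats it as (key, value).
lists_by_group = dict(rows)
print(lists_by_group)
```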

Or more succinctly, as noted by Ch3steR:

df.groupby(['a','b'])['c'].agg(list).to_dict()

{('aa', 'bb'): [10, 20, 30], ('aaa', 'bbb'): [100, 200]}
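Note that the shape of the result depends on whether `to_dict()` is called on a Series or on a DataFrame. With the default `orient`, `DataFrame.to_dict()` nests the mapping under each column label, while `Series.to_dict()` returns the flat mapping keyed by the index tuples. The sketch below shows both on the question's data, plus an optional conversion of the lists to NumPy arrays in case arrays are required, as in the question. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['aa', 'aa', 'aa', 'aaa', 'aaa'],
    'b': ['bb', 'bb', 'bb', 'bbb', 'bbb'],
    'c': [10, 20, 30, 100, 200],
})

grouped = df.groupby(['a', 'b'])

# DataFrame.to_dict(): the outer key is the column label 'c'.
nested = grouped.agg(list).to_dict()

# Series.to_dict(): selecting 'c' first gives the flat mapping.
flat = grouped['c'].agg(list).to_dict()

# Optional: turn each list into a NumPy array.
as_arrays = {key: np.asarray(values) for key, values in flat.items()}

print(nested)
print(flat)
```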

2 Comments

df.groupby(['a','b'])['c'].agg(list).to_dict() -> {('aa', 'bb'): [10, 20, 30], ('aaa', 'bbb'): [100, 200]}
@Ch3steR Much cleaner, thanks; added it to the answer. I guess the index entries get returned as tuples when using the .to_dict methods.
