Pandas DataFrame.groupby() to dictionary with multiple columns for value

Question

type(Table)
pandas.core.frame.DataFrame

Table
======= ======= =======
Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

For anyone familiar with pandas, how would I build a multivalue dictionary with the .groupby() method?

I would like an output to resemble this format:

{
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    # etc...
}

where the Column1 values are the keys, and the corresponding Column2 and Column3 values are packed as tuples into a list for each Column1 key.

My syntax works for pooling only one column into the .groupby():

Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
    0: [23], 
    1: [5, 2, 19], 
    2: [56, 22], 
    3: [2, 14], 
    4: [59], 
    5: [44, 1, 87]
}

However, specifying multiple columns in the selection returns the column names as the value:

Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has the column names as the list value
{
    0: ['Column2', 'Column3'],
    1: ['Column2', 'Column3'],
    2: ['Column2', 'Column3'],
    3: ['Column2', 'Column3'],
    4: ['Column2', 'Column3'],
    5: ['Column2', 'Column3']
 }

How would I return a list of tuples in the value array?

akuiper · Accepted Answer · 2018-02-27 20:23:39Z

30

Customize the function you use in apply so it returns a list of lists for each group:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]], 
#  1: [[5, 2], [2, 3], [19, 5]], 
#  2: [[56, 1], [22, 2]], 
#  3: [[2, 4], [14, 5]], 
#  4: [[59, 1]], 
#  5: [[44, 1], [1, 2], [87, 3]]}

If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)], 
#  1: [(5, 2), (2, 3), (19, 5)], 
#  2: [(56, 1), (22, 2)], 
#  3: [(2, 4), (14, 5)], 
#  4: [(59, 1)], 
#  5: [(44, 1), (1, 2), (87, 3)]}
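For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the question's table by hand (the variable names `df` and `result` are just illustrative) and applies the same list-of-columns selection. Note the double brackets: a list of column names selects a two-column frame per group, which is what the lambda then converts row by row.

```python
import pandas as pd

# Rebuild the table from the question (values copied by hand).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Select both value columns with a list, then turn each group's
# rows into a list of tuples.
result = (
    df.groupby("Column1")[["Column2", "Column3"]]
    .apply(lambda g: list(map(tuple, g.values.tolist())))
    .to_dict()
)

print(result[1])  # [(5, 2), (2, 3), (19, 5)]
```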

answered Feb 27, 2018 at 20:23


2 Comments

Micks Ketches Over a year ago

This is great, so the apply method is basically a map and reduce bundled into one?

akuiper Over a year ago

The apply method is close to map; both simulate for loops. The reduce effect in this example comes more from groupby. Semantically, apply invokes the lambda function once for each group.
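To make the "apply invokes the lambda for each group" point concrete, the same result can be written as an explicit loop over the groups. This is only a rough sketch of the semantics, not how pandas implements apply internally; the table is rebuilt by hand and the names `df` and `result` are illustrative.

```python
import pandas as pd

# Rebuild the table from the question (values copied by hand).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

result = {}
# groupby does the splitting: iterating yields one (key, sub-frame) pair per group.
for key, group in df.groupby("Column1"):
    # This is the body of the lambda: convert the group's rows to tuples.
    rows = group[["Column2", "Column3"]].values.tolist()
    result[key] = [tuple(row) for row in rows]

print(result[3])  # [(2, 4), (14, 5)]
```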

jpp · Accepted Answer · 2018-02-27 20:35:23Z

8

One way is to create a new tup column and then create the dictionary.

df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()

# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}

@Psidom's solution (the apply-based approach, psi below) is more efficient, but if performance isn't an issue, use whichever makes more sense to you:

df = pd.concat([df]*10000)

def jp(df):
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

%timeit jp(df)   # 110ms
%timeit psi(df)  # 80ms
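Since %timeit only works inside IPython or Jupyter, here is a sketch of the same comparison with the standard-library timeit module. The function names, the 1000x repetition factor, and the use of assign() (so the input frame is not mutated) are my own choices for illustration; actual timings will depend on your machine and pandas version, so none are shown.

```python
import timeit

import pandas as pd

# Rebuild the table from the question and enlarge it for timing.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
big = pd.concat([df] * 1000, ignore_index=True)


def zip_then_group(frame):
    # assign() returns a new frame, so the caller's data is left untouched.
    tmp = frame.assign(tup=list(zip(frame["Column2"], frame["Column3"])))
    return tmp.groupby("Column1")["tup"].apply(list).to_dict()


def group_then_tuple(frame):
    return (
        frame.groupby("Column1")[["Column2", "Column3"]]
        .apply(lambda g: list(map(tuple, g.values.tolist())))
        .to_dict()
    )


# Both approaches should produce the same dictionary.
assert zip_then_group(big) == group_then_tuple(big)

for fn in (zip_then_group, group_then_tuple):
    seconds = timeit.timeit(lambda: fn(big), number=5) / 5
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```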

edited Feb 27, 2018 at 20:35

answered Feb 27, 2018 at 20:28


2 Comments

user1298416 Over a year ago

Would it be possible to modify this to produce dicts instead of lists of tuples: {0: {23: 1}, 1: {5: 2, 2: 3, 19: 5}, 2: {56: 1, 22: 2} } ?

jpp Over a year ago

@user1298416, take the output dct and use a comprehension: {k: dict(v) for k, v in dct.items()}. dict takes a list of tuples directly.
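A small sketch of that comprehension, using a hand-written `dct` with the first three groups from the output above (the name `nested` is illustrative). One caveat worth keeping in mind: if a group contains the same Column2 value twice, dict() keeps only the last pair for that key.

```python
# First three groups of the list-of-tuples output, written out by hand.
dct = {
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    2: [(56, 1), (22, 2)],
}

# dict() accepts an iterable of (key, value) pairs, so each list of
# 2-tuples converts directly into an inner dictionary.
nested = {k: dict(v) for k, v in dct.items()}

print(nested)
# {0: {23: 1}, 1: {5: 2, 2: 3, 19: 5}, 2: {56: 1, 22: 2}}
```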

piRSquared · Accepted Answer · 2018-02-27 20:39:32Z

2

I'd rather use defaultdict:

from collections import defaultdict

d = defaultdict(list)

for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))

dict(d)

{0: [(23, 1)],
 1: [(5, 2), (2, 3), (19, 5)],
 2: [(56, 1), (22, 2)],
 3: [(2, 4), (14, 5)],
 4: [(59, 1)],
 5: [(44, 1), (1, 2), (87, 3)]}

answered Feb 27, 2018 at 20:39
