type(Table)
pandas.core.frame.DataFrame

Table
======= ======= =======
Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

For anyone familiar with pandas: how would I build a multi-value dictionary with the .groupby() method?

I would like an output to resemble this format:

{
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    # etc...
}

where the Column1 values are the keys, and the corresponding Column2 and Column3 values are packed as tuples into a list for each Column1 key.

My code works when I collect only one column after the .groupby():

Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
    0: [23], 
    1: [5, 2, 19], 
    2: [56, 22], 
    3: [2, 14], 
    4: [59], 
    5: [44, 1, 87]
}

However, selecting multiple columns returns the column names as each value:

Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has the column names as each value
{
    0: ['Column2', 'Column3'],
    1: ['Column2', 'Column3'],
    2: ['Column2', 'Column3'],
    3: ['Column2', 'Column3'],
    4: ['Column2', 'Column3'],
    5: ['Column2', 'Column3']
 }

How would I return a list of tuples as each value?

3 Answers

30

Customize the function you use in apply so it returns a list of lists for each group:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]], 
#  1: [[5, 2], [2, 3], [19, 5]], 
#  2: [[56, 1], [22, 2]], 
#  3: [[2, 4], [14, 5]], 
#  4: [[59, 1]], 
#  5: [[44, 1], [1, 2], [87, 3]]}

If you need a list of tuples explicitly, use list(map(tuple, ...)) to convert:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)], 
#  1: [(5, 2), (2, 3), (19, 5)], 
#  2: [(56, 1), (22, 2)], 
#  3: [(2, 4), (14, 5)], 
#  4: [(59, 1)], 
#  5: [(44, 1), (1, 2), (87, 3)]}
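
For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is rebuilt from the table in the question; the name `result` is only for illustration. Note that the columns are selected with a list, `[['Column2', 'Column3']]`, not a tuple as in the question.

```python
import pandas as pd

# Rebuild the question's table (values copied from the post).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Select the value columns with a list, then turn each group's rows
# into a list of tuples.
result = (
    df.groupby("Column1")[["Column2", "Column3"]]
      .apply(lambda g: list(map(tuple, g.values.tolist())))
      .to_dict()
)

print(result[1])
```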

2 Comments

This is great, so the apply method is basically a map and reduce bundled into one?
The apply method is close to map; both stand in for an explicit for loop. The reduce-like effect in this example comes mostly from groupby. Semantically, apply invokes the lambda function once for each group.
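
To make the "once per group" point concrete, here is a rough equivalent written as an explicit loop. It is only a sketch of the semantics, not how pandas implements apply internally; the DataFrame is rebuilt from the question and `result` is an illustrative name.

```python
import pandas as pd

# Same data as the table in the question.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

result = {}
# Iterating over a groupby yields (key, sub-DataFrame) pairs;
# the loop body plays the role of the lambda passed to apply.
for key, group in df.groupby("Column1"):
    rows = group[["Column2", "Column3"]].values.tolist()
    result[key] = [tuple(row) for row in rows]

print(result[2])
```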
8

One way is to create a new tup column and then create the dictionary.

df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()

# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
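
One thing to be aware of: the assignment above adds a tup column to df itself. If you would rather leave the original frame untouched, a variant using assign (which returns a new DataFrame) might look like this. It is a sketch with the question's data rebuilt inline; `result` is an illustrative name.

```python
import pandas as pd

# Same data as the table in the question.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# assign returns a copy with the extra column, so df is not modified.
result = (
    df.assign(tup=list(zip(df["Column2"], df["Column3"])))
      .groupby("Column1")["tup"]
      .apply(list)
      .to_dict()
)

print(result[5])
```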

@Psidom's solution is more efficient, but if performance isn't an issue use what makes more sense to you:

df = pd.concat([df]*10000)

def jp(df):
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

%timeit jp(df)   # 110ms
%timeit psi(df)  # 80ms

2 Comments

Would it be possible to modify this to produce dicts instead of lists of tuples: {0: {23: 1}, 1: {5: 2, 2: 3, 19: 5}, 2: {56: 1, 22: 2}}?
@user1298416, take the output dct and use a comprehension: {k: dict(v) for k, v in dct.items()}. dict takes a list of tuples directly.
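
A small sketch of that comprehension, using a hand-written dct that mirrors the first three groups of the output above (the name `nested` is only illustrative). One caveat: if a Column2 value repeats within a group, dict keeps only the last pair for that key.

```python
# Output of the groupby step, truncated to three groups for brevity.
dct = {
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    2: [(56, 1), (22, 2)],
}

# dict() accepts a list of (key, value) tuples directly.
nested = {k: dict(v) for k, v in dct.items()}

print(nested)
```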
2

I'd rather use defaultdict

from collections import defaultdict

d = defaultdict(list)

for row in df.values.tolist():
    d[row[0]].append(tuple(row[1:]))

dict(d)

{0: [(23, 1)],
 1: [(5, 2), (2, 3), (19, 5)],
 2: [(56, 1), (22, 2)],
 3: [(2, 4), (14, 5)],
 4: [(59, 1)],
 5: [(44, 1), (1, 2), (87, 3)]}
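
The loop above relies on Column1 being the first column. If the column order is not guaranteed, a variant that selects the columns by name and iterates with itertuples could look like this. It is a sketch under the assumption that the frame has exactly the three columns from the question; `result` is an illustrative name.

```python
from collections import defaultdict

import pandas as pd

# Same data as the table in the question.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

d = defaultdict(list)
# Select columns by name so the unpacking order is explicit;
# name=None makes itertuples yield plain tuples.
cols = df[["Column1", "Column2", "Column3"]]
for key, c2, c3 in cols.itertuples(index=False, name=None):
    d[key].append((c2, c3))

result = dict(d)
print(result[3])
```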
