Python 3: nested dictionary with multiple keys from csv

Question

The data look like this:

id,outer,inner1,inner2,inner3
123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp

I would like the resulting dictionary to be

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu'],
         'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp']}}

I tried adapting the accepted answer from Python Creating A Nested Dictionary From CSV File, but this method overwrites the dictionary at every row, so only the final row from each id ends up in the dictionary.

Padraic Cunningham · Accepted Answer · 2015-08-05 22:05:49Z

Use a collections.defaultdict: the first element of each row is the outer dict key, the second is the inner dict key, and the remaining values from the row are added to a list stored under the inner key:

import csv
from collections import defaultdict
from pprint import pprint as pp

with open("in.txt", newline="") as f:
    next(f)  # skip header
    # outer key -> inner key -> list of values
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    for row in r:
        d[row[0]][row[1]].extend(row[2:])

pp(dict(d))

Output:

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp'],
         'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu']}}
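
Note that dict(d) converts only the outer level; the inner values are still defaultdict objects. If ordinary dicts are needed at both levels, the following is a minimal sketch. It assumes the question's sample rows inlined through io.StringIO rather than read from in.txt, and to_plain is a hypothetical helper name, not part of the original answer:

```python
import csv
import io
from collections import defaultdict

# A few rows from the question, inlined so the sketch runs without in.txt.
SAMPLE = '''id,outer,inner1,inner2,inner3
123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
456,"Williams,Tim",xx,yy,zz
456,"Miller,Ray",rrr,sss,ttt
'''


def to_plain(nested):
    """Convert a two-level defaultdict into ordinary dicts (hypothetical helper)."""
    return {outer: dict(inner) for outer, inner in nested.items()}


d = defaultdict(lambda: defaultdict(list))
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip header
for row in reader:
    d[row[0]][row[1]].extend(row[2:])

plain = to_plain(d)
```

Unlike the defaultdict, looking up a missing key in plain raises KeyError instead of silently creating an empty entry.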

Since you are using Python 3, we can unpack in the loop using * to make the code a bit nicer:

with open("in.txt") as f:
    next(f)  # skip header
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    for k1, k2, *vals in r:
        d[k1][k2].extend(vals)
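
One caveat with unpacking using * in the loop header: a row with fewer than two fields (csv.reader yields an empty list for a blank line) raises ValueError. A hedged sketch that skips such rows is below; group_rows and the inlined sample are assumptions made for illustration, not part of the original answer:

```python
import csv
import io
from collections import defaultdict


def group_rows(lines):
    """Group CSV rows by their first two fields, skipping rows that are too short."""
    d = defaultdict(lambda: defaultdict(list))
    reader = csv.reader(lines)
    next(reader, None)  # skip header; the default avoids StopIteration on empty input
    for row in reader:
        if len(row) < 2:
            continue  # blank or malformed line: nothing to use as keys
        k1, k2, *vals = row
        d[k1][k2].extend(vals)
    return d


# Sample input with a blank line in the middle.
sample = io.StringIO('id,outer,inner1\n123,"Smith,John",a\n\n123,"Smith,John",d\n')
grouped = group_rows(sample)
```

Whether to skip short rows or fail loudly depends on how much the input is trusted; skipping is only one option.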

R Samuel Klatchko · Answer · 2015-08-05 21:55:54Z

Yes, because in that example, this line overwrites the entry for the UID every time:

new_data_dict[row["UID"]] = item

Instead, you can use setdefault to default the entry to a list and append to it:

new_data_dict.setdefault(row["UID"], []).append(item)
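
To make the difference concrete, here is a small sketch comparing plain assignment with setdefault. The rows list and its "value" field are hypothetical stand-ins for what csv.DictReader would yield, not the linked question's actual data:

```python
# Hypothetical rows; only the "UID" key is taken from the linked example.
rows = [
    {"UID": "123", "value": "a"},
    {"UID": "123", "value": "d"},
    {"UID": "456", "value": "xx"},
]

overwritten = {}
accumulated = {}
for row in rows:
    item = row["value"]
    # Plain assignment replaces whatever was stored for this UID.
    overwritten[row["UID"]] = item
    # setdefault returns the existing list, or stores and returns a new one.
    accumulated.setdefault(row["UID"], []).append(item)

print(overwritten)
print(accumulated)
```

With plain assignment only the last item per UID survives, while the setdefault version keeps every item in a list.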

Robᵩ · Answer · 2015-08-05 21:56:15Z

dict.setdefault is a good way to fetch data structures, creating them as required.

import csv
import pprint

data = '''123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp
'''
data = data.splitlines()
data = csv.reader(data)

result = {}
for datum in data:
    outer = result.setdefault(datum[0], {})
    inner = outer.setdefault(datum[1], [])
    inner.extend(datum[2:])

pprint.pprint(result)