The data look like this:

id,outer,inner1,inner2,inner3
123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp

I would like the resulting dictionary to be

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu'],
         'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp']}}

I tried adapting the accepted answer from "Python Creating A Nested Dictionary From CSV File", but that method overwrites the dictionary entry at every row, so only the final row for each id ends up in the dictionary.

3 Answers

Use a collections.defaultdict: the first element of each row becomes the outer dict key, the second becomes the inner dict key, and the remaining values in the row are added to a list stored under the inner key:

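import csv
from collections import defaultdict
from pprint import pprint as pp

with open("in.txt", newline="") as f:
    next(f)  # skip header
    # outer key -> inner key -> list of values, each created on first access
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    for row in r:
        d[row[0]][row[1]].extend(row[2:])

# convert both levels to plain dicts so the output prints cleanly
pp({k: dict(v) for k, v in d.items()})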

Output:

{'123': {'Smith,John': ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']},
 '456': {'Miller,Ray': ['rrr', 'sss', 'ttt', 'qqq', 'www', 'ppp'],
         'Williams,Tim': ['xx', 'yy', 'zz', 'vv', 'ww', 'uu']}}

Since you are using Python 3, we can unpack each row in the loop with * to make the code a bit nicer:

with open("in.txt", newline="") as f:
    next(f)  # skip header
    d = defaultdict(lambda: defaultdict(list))
    r = csv.reader(f)
    # k1 and k2 take the first two fields; vals collects the rest
    for k1, k2, *vals in r:
        d[k1][k2].extend(vals)
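
The snippets above read from in.txt, so they cannot be run without that file. As a minimal, self-contained sketch of the same unpacking approach, the version below feeds the sample rows from the question through io.StringIO, which stands in for the real file. The function name build_nested and the variable names are illustrative only, not part of the original answer, and the sketch assumes every row has at least two fields (a blank line would make the unpacking fail).

```python
import csv
import io
from collections import defaultdict

# Sample rows from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    'id,outer,inner1,inner2,inner3\n'
    '123,"Smith,John",a,b,c\n'
    '123,"Smith,John",d,e,f\n'
    '123,"Smith,John",g,h,i\n'
    '456,"Williams,Tim",xx,yy,zz\n'
    '456,"Williams,Tim",vv,ww,uu\n'
    '456,"Miller,Ray",rrr,sss,ttt\n'
    '456,"Miller,Ray",qqq,www,ppp\n'
)


def build_nested(lines):
    """Group CSV rows as {first field: {second field: [remaining fields]}}."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    nested = defaultdict(lambda: defaultdict(list))
    for outer_key, inner_key, *values in reader:
        nested[outer_key][inner_key].extend(values)
    # Convert both levels to plain dicts so no defaultdict leaks out.
    return {key: dict(inner) for key, inner in nested.items()}


result = build_nested(sample)
print(result)
```

Returning plain dicts means a later lookup of a missing key raises KeyError instead of silently creating an empty entry, which is usually what you want once the data has been loaded.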

Yes, because in that example, this line overwrites the entry for the UID every time:

new_data_dict[row["UID"]] = item

Instead, you can use setdefault to default the entry to a list and append to it:

new_data_dict.setdefault(row["UID"], []).append(item)
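
To see the difference on the data from this question, here is a small sketch that runs both variants side by side. It assumes csv.DictReader and the column names from the sample header (id, inner1, inner2, inner3) rather than the UID column of the linked question; the names overwritten and grouped are illustrative.

```python
import csv
import io

# A few sample rows from the question; io.StringIO stands in for a file.
sample = io.StringIO(
    'id,outer,inner1,inner2,inner3\n'
    '123,"Smith,John",a,b,c\n'
    '123,"Smith,John",d,e,f\n'
    '456,"Williams,Tim",xx,yy,zz\n'
)

overwritten = {}
grouped = {}
for row in csv.DictReader(sample):
    item = [row["inner1"], row["inner2"], row["inner3"]]
    # Plain assignment replaces whatever was stored for this id.
    overwritten[row["id"]] = item
    # setdefault creates the list once; later rows append to it.
    grouped.setdefault(row["id"], []).append(item)

print(overwritten)
print(grouped)
```

With plain assignment, id 123 keeps only its last row, while the setdefault version keeps one entry per row.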

dict.setdefault is a good way to fetch data structures, creating them as required.

import csv
import pprint

data = '''123,"Smith,John",a,b,c
123,"Smith,John",d,e,f
123,"Smith,John",g,h,i
456,"Williams,Tim",xx,yy,zz
456,"Williams,Tim",vv,ww,uu
456,"Miller,Ray",rrr,sss,ttt
456,"Miller,Ray",qqq,www,ppp
'''
data = data.splitlines()
data = csv.reader(data)

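result = {}
for datum in data:
    # fetch the dict for this id, creating it the first time the id appears
    outer = result.setdefault(datum[0], {})
    # fetch the list for this name, creating it the first time the name appears
    inner = outer.setdefault(datum[1], [])
    inner.extend(datum[2:])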

pprint.pprint(result)
