I am currently trying to create a nested dictionary from a CSV file.

The CSV file shows how many people there are in each demographic for each region. In the nested dictionary, each key is a region and each value is another dictionary. The inner dictionary uses the demographic as the key and the number of people as the value.

Region,American,Asian,Black
midwest,2500,2300,2150
north,1200,2300,2300
south,1211,211,2100

Currently I have:

import csv

def load_csv(filename):
    data = {}
    with open(filename) as csvfile:
        fh = csv.DictReader(csvfile)
        for row in fh:
            key = row.pop('Region')
            data[key] = row
        return data

Expected Output (must convert the numbers from strings to integers):

{'midwest': {'American': 2500, 'Asian': 2300, ...}, 'north': {'American': 1200, ..}...}

I'm stuck because running my code raises "KeyError: 'Region'".

  • This would be much shorter with pandas .read_csv and .to_dict: just def load_csv(filename): return pd.read_csv(filename, index_col=0).to_dict('index') [This takes the first column as the index, whatever it is; you could also use index_col='Region', which would raise an error if the Region column were missing] Commented Feb 8, 2023 at 22:32
  • We are not able to use pandas Commented Feb 11, 2023 at 23:35

2 Answers

Use a comprehension to convert string values to integers:

import csv

def load_csv(filename):
    data = {}
    with open(filename) as csvfile:
        # Your file has 3 invisible characters at the beginning, skip them
        csvfile.seek(3)
        fh = csv.DictReader(csvfile)
        for row in fh:
            key = row.pop('Region')
            data[key] = {k: int(v) for k, v in row.items()}  # <- HERE
        return data

data = load_csv('data.csv')

Output:

>>> data
{'midwest': {'American': 2500, 'Asian': 2300, 'Black': 2150},
 'north': {'American': 1200, 'Asian': 2300, 'Black': 2300},
 'south': {'American': 1211, 'Asian': 211, 'Black': 2100}}
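
If the three invisible characters are a UTF-8 byte order mark (an assumption, since the file itself was not shared), an alternative to seek(3) is to open the file with encoding='utf-8-sig'. That codec drops the mark when it is present and leaves a file without one untouched, so the header is not truncated either way. A minimal sketch, where load_csv_bom_safe is a hypothetical name:

```python
import csv

def load_csv_bom_safe(filename):
    """Build {region: {demographic: count}} from a CSV with a 'Region' column."""
    data = {}
    # 'utf-8-sig' strips a leading UTF-8 BOM if there is one
    # (assumed here to be the cause of the KeyError).
    with open(filename, encoding='utf-8-sig', newline='') as csvfile:
        for row in csv.DictReader(csvfile):
            key = row.pop('Region')
            # Convert every remaining value from string to integer
            data[key] = {k: int(v) for k, v in row.items()}
    return data
```

Unlike a fixed seek, this should not cut off the start of the header when the file has no such prefix.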

Bonus: The same operation with Pandas:

import pandas as pd

data = pd.read_csv('data.csv', index_col='Region').T.to_dict()
print(data)

# Output
{'midwest': {'American': 2500, 'Asian': 2300, 'Black': 2150},
 'north': {'American': 1200, 'Asian': 2300, 'Black': 2300},
 'south': {'American': 1211, 'Asian': 211, 'Black': 2100}}

15 Comments

Hi, using the first method I am still getting the KeyError. Do you know why this could be? It is spelled Region in the file.
Since I don't get this error with your sample, I guess something is wrong with your file. Can you share it?
Yes, how do I share it?
I have attached a picture of the CSV file in VSCode.
Can you use WeTransfer or Google Drive?
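
One way to inspect the file without sharing it is to look at the header names exactly as csv.DictReader sees them. A sketch, assuming the file is UTF-8 encoded; header_names is a hypothetical helper, not part of the csv module:

```python
import csv

def header_names(filename, encoding='utf-8'):
    """Return the raw column names as csv.DictReader reads them."""
    with open(filename, encoding=encoding, newline='') as csvfile:
        # fieldnames reads the first row, so access it while the file is open
        return csv.DictReader(csvfile).fieldnames
```

Printing the result of header_names('data.csv') shows the names in their escaped form. A first entry such as '\ufeffRegion' rather than 'Region' would point to a byte order mark at the start of the file.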

This solution requires no imports at all, but will only work if there are no escaped separators [i.e., none of the values contain a , or newline]:

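def load_csv(filename, sep=','):
    with open(filename, 'r') as csvfile:
        csvlines = csvfile.read().strip().splitlines()
    # Split each line on the separator and trim whitespace around each value
    csvRows = [[v.strip() for v in line.split(sep)] for line in csvlines]
    if not csvRows:
        return {}
    # The header row minus its first column gives the inner-dictionary keys
    keys = csvRows[0][1:]
    return {r[0]: dict(zip(keys, r[1:])) for r in csvRows[1:] if r}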

At the very least, it works for the csv in your snippet, but if your csv contains , or newlines at any unexpected positions, this function will no longer be reliable - it would definitely be better to use a module built for reading and parsing csv.
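
Note that the function above returns the counts as strings. One way to get integers, as the question asks, is to convert while building the inner dictionaries. This is a sketch under the same no-escaped-separators restriction: load_csv_ints is a hypothetical name, and the utf-8-sig encoding is an assumption made so that a possible byte order mark does not end up in the header.

```python
def load_csv_ints(filename, sep=','):
    """Parse a simple CSV into {first column: {header: int(value)}}."""
    with open(filename, 'r', encoding='utf-8-sig') as csvfile:
        # Keep only lines that contain something other than whitespace
        lines = [line for line in csvfile.read().splitlines() if line.strip()]
    rows = [[v.strip() for v in line.split(sep)] for line in lines]
    if not rows:
        return {}
    keys = rows[0][1:]
    # int() raises ValueError on a non-numeric cell, which surfaces bad data early
    return {r[0]: {k: int(v) for k, v in zip(keys, r[1:])} for r in rows[1:]}
```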
