Using headers to create values in a dictionary?

Question

How can I use the same keys, but have different values in a dictionary?

I have a table of Jeopardy questions:

Show Number  Air Date    Round      Category         Value   Question    Answer
4680        12/31/2004  Jeopardy!   HISTORY          $200   Question 1  Copernicus
4680        12/31/2004  Jeopardy!   ESPN             $200   Question 2  Jim Thorpe
4680        12/31/2004  Jeopardy!   EVERYBODY TALKS  $200   Question 3  Arizona
4680        12/31/2004  Jeopardy!   THE COMPANY LINE $200   Question 4  McDonald's
4680        12/31/2004  Jeopardy!   EPITAPHS         $200   Question 5  John Adams

(Note: this is stored in a csv file. I just tried to show its layout above. It's available here, fyi)

And basically, I'm trying to get a list/dictionary/return that has a header matched with a question, something like a variable that holds:

['Show Number':4680, 'Air Date': '12/31/2004', ...'Answer':'Copernicus']
['Show Number':4680, 'Air Date': '12/31/2004', ...'Answer':'Jim Thorpe']
['Show Number':4680, 'Air Date': '12/31/2004', ...'Answer':'Arizona']

So later on, I can parse through that dictionary(?) and do things like get the unique values based on Category, Value, etc. Would it be ..a list of a dictionary??

I tried making it a dictionary - and it doesn't work. It only returns the last row's data. I understand why, because each time the row changes, it just starts back and updates the same keys with new info.

import csv

file_name = 'JEOPARDY_CSV.csv'

def get_data(csv_file):
    data = []
    with open(csv_file, 'r',  encoding="utf8") as read:
        reader = csv.reader(read)
        all_data = list(reader)
        data = all_data[1:]
        headers = all_data[0]
    return data, headers

def create_dict(data, headers):
    i = 0
    data_dict = {}
    for row in data:
        for col in row:
            data_dict[headers[i]] = col
            i+=1
        i = 0
    print(data_dict)

def main():
    file_data, headers = get_data(file_name)
    data_dictionary = create_dict(file_data[0:5], headers)

if __name__ == "__main__":
    main()

Again, the idea is to later on, have a function I can run to do things based on column header, like "return all questions where show number is 4680", or "for all categories, return the unique ones".

If you need more than one query type, you need more than one dictionary, each with different keys.
– Jean-François Fabre ♦, Commented Oct 27, 2017 at 7:22
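
To illustrate the comment above, here is a minimal sketch of "one dictionary per query type". It assumes the file really is comma-separated and uses a small inline sample in place of JEOPARDY_CSV.csv; `build_index` is a hypothetical helper name, not something from the question.

```python
import csv
import io
from collections import defaultdict

# Inline sample standing in for JEOPARDY_CSV.csv (assumed comma-separated).
SAMPLE = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""


def build_index(rows, column):
    """Group row dictionaries by the value found in `column` (hypothetical helper)."""
    index = defaultdict(list)
    for row in rows:
        index[row[column]].append(row)
    return index


# csv.DictReader uses the header line as keys and yields one dict per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# One index per kind of lookup you want to be fast.
by_show = build_index(rows, "Show Number")
by_category = build_index(rows, "Category")

# Note: csv returns every value as a string, so the key is "4680", not 4680.
print(len(by_show["4680"]))
print(by_category["ESPN"][0]["Answer"])
```

Each index costs extra memory, so it is probably only worth building one for columns you actually query often.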

René Pijl · Accepted Answer · 2017-10-27 07:25:20Z

1

If some combination of columns uniquely identifies rows in this dataset (the primary key in relational database theory), you should include all of those columns in the dictionary's key. Searching on a key will be fast.

Alternatively, you can store non-unique data in a list of rows (list of dictionaries). Searching for a value will require looping through all rows in the list.
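
A rough sketch of both options, since mock code was asked for in the comments. It assumes a comma-separated file (an inline sample is used here) and, purely for illustration, that Show Number, Round, Category and Value together identify a row; check that assumption against the real data before relying on it.

```python
import csv
import io

# Inline sample standing in for JEOPARDY_CSV.csv (assumed comma-separated).
SAMPLE = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""

# ASSUMPTION: these columns together are unique per row (a composite key).
KEY_COLUMNS = ("Show Number", "Round", "Category", "Value")

# Option 2 from the answer: a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Option 1 from the answer: a dictionary keyed by the composite key.
by_key = {tuple(row[col] for col in KEY_COLUMNS): row for row in rows}

# Fast lookup by key (values are strings because they come from csv).
hit = by_key[("4680", "Jeopardy!", "HISTORY", "$200")]
print(hit["Answer"])

# Queries on non-unique columns loop over the whole list.
unique_categories = sorted({row["Category"] for row in rows})
questions_4680 = [row["Question"] for row in rows if row["Show Number"] == "4680"]
print(unique_categories)
print(questions_4680)
```

If two rows ever share the same key tuple, the later one silently overwrites the earlier one in `by_key`, which is the same effect described in the question.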

answered Oct 27, 2017 at 7:25

René Pijl


3 Comments

BruceWayne Over a year ago

Aha, I think I see what you mean. Would that make this question a duplicate of this question, perhaps?

René Pijl Over a year ago

In that question he says "I would like the first row of the CSV file to be used as the 'key' field for the dictionary". That appears to be a different data structure.

BruceWayne Over a year ago

Hm then maybe I misunderstood your first point. Can you link me to an example, or maybe some mock code to show what you mean? It sounds promising!

RomanPerekhrest · Accepted Answer · 2017-10-27 08:10:31Z

Your current approach won't split the columns as you expect.
Note also that csv.reader expects a comma (,) as the default delimiter, while the columns in the sample you posted are separated by an arbitrary number of spaces. A different approach is needed to parse that layout.

I would recommend a pandas solution for this case:

import pandas as pd

file_name = 'JEOPARDY_CSV.csv'

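def get_data(csv_file):
    # Treat runs of two or more whitespace characters as the column separator
    df = pd.read_csv(csv_file, sep=r'\s{2,}', engine='python', header=0)
    # Build one dictionary per row, keyed by column header
    data = df.to_dict('records')
    return data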

print(get_data(file_name))

The output is the needed list of dictionaries:

[{'Question': 'Question 1', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Copernicus', 'Category': 'HISTORY', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 2', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Jim Thorpe', 'Category': 'ESPN', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 3', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Arizona', 'Category': 'EVERYBODY TALKS', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': "McDonald's", 'Value': 'Question 4', 'Air Date': '12/31/2004', 'Answer': None, 'Category': 'THE COMPANY LINE $200', 'Show Number': 4680, 'Round': 'Jeopardy!'}, {'Question': 'Question 5', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'John Adams', 'Category': 'EPITAPHS', 'Show Number': 4680, 'Round': 'Jeopardy!'}]

Going further, pandas lets you group column values, get unique records, perform aggregations, and much more.
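
As a rough illustration of those operations, the sketch below runs the two queries mentioned in the question. It assumes the data is comma-separated and reads a small inline sample instead of the real file; the variable names are only illustrative.

```python
import io

import pandas as pd

# Inline sample standing in for JEOPARDY_CSV.csv (assumed comma-separated).
SAMPLE = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
4680,12/31/2004,Jeopardy!,THE COMPANY LINE,$200,Question 4,McDonald's
4680,12/31/2004,Jeopardy!,EPITAPHS,$200,Question 5,John Adams
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# "For all categories, return the unique ones" (in order of first appearance).
unique_categories = df["Category"].unique().tolist()

# "Return all questions where show number is 4680".
# pandas parses this column as integers, so compare against 4680, not "4680".
show_4680 = df[df["Show Number"] == 4680]

# Count rows per category as a simple aggregation.
per_category = df.groupby("Category").size()

print(unique_categories)
print(show_4680["Question"].tolist())
print(per_category)
```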

Ah sorry, the original data is in a CSV; I edited the question to add that note. I've heard of pandas before, I'll look into that, thanks!
