
How can I use the same keys, but have different values in a dictionary?

I have a table of Jeopardy questions:

Show Number  Air Date    Round      Category         Value   Question    Answer
4680        12/31/2004  Jeopardy!   HISTORY          $200   Question 1  Copernicus
4680        12/31/2004  Jeopardy!   ESPN             $200   Question 2  Jim Thorpe
4680        12/31/2004  Jeopardy!   EVERYBODY TALKS  $200   Question 3  Arizona
4680        12/31/2004  Jeopardy!   THE COMPANY LINE $200   Question 4  McDonald's
4680        12/31/2004  Jeopardy!   EPITAPHS         $200   Question 5  John Adams

(Note: this is stored in a csv file. I just tried to show its layout above. It's available here, fyi)

And basically, I'm trying to get a list/dictionary/return that has a header matched with a question, something like a variable that holds:

{'Show Number': 4680, 'Air Date': '12/31/2004', ..., 'Answer': 'Copernicus'}
{'Show Number': 4680, 'Air Date': '12/31/2004', ..., 'Answer': 'Jim Thorpe'}
{'Show Number': 4680, 'Air Date': '12/31/2004', ..., 'Answer': 'Arizona'}

That way, later on I can go through that structure and do things like get the unique values for Category, Value, etc. Would it be a list of dictionaries?

I tried making it a dictionary, but it doesn't work: it only holds the last row's data. I understand why: each time the row changes, the loop starts over and overwrites the same keys with the new row's values.

import csv

file_name = 'JEOPARDY_CSV.csv'

def get_data(csv_file):
    data = []
    with open(csv_file, 'r',  encoding="utf8") as read:
        reader = csv.reader(read)
        all_data = list(reader)
        data = all_data[1:]
        headers = all_data[0]
    return data, headers

def create_dict(data, headers):
    i = 0
    data_dict = {}
    for row in data:
        for col in row:
            data_dict[headers[i]] = col
            i+=1
        i = 0
    print(data_dict)

def main():
    file_data, headers = get_data(file_name)
    data_dictionary = create_dict(file_data[0:5], headers)

if __name__ == "__main__":
    main()

Again, the idea is to later have a function I can run to do things based on a column header, like "return all questions where show number is 4680", or "for all categories, return the unique ones".

  • If you need more than one type of query, you need more than one dictionary, each with different keys. Commented Oct 27, 2017 at 7:22
  • Possible duplicate of Creating a dictionary from a CSV file Commented Oct 27, 2017 at 7:53

2 Answers


If some combination of columns uniquely identifies rows in this dataset (the primary key in relational database theory), you should include all of those columns in the dictionary's key. Searching on a key will be fast.

Alternatively, you can store non-unique data in a list of rows (list of dictionaries). Searching for a value will require looping through all rows in the list.
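As a rough illustration of both options, here is a minimal sketch using only the standard library. It assumes the file is comma-separated with a header row; the inline `SAMPLE` text and the helper names `load_rows`, `index_by`, and `group_by` are hypothetical stand-ins, and the choice of `("Show Number", "Category")` as a unique key is an assumption that may not hold for the full dataset.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for JEOPARDY_CSV.csv (assumed comma-separated).
SAMPLE = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""


def load_rows(handle):
    """Return one dict per CSV row, keyed by the header names."""
    return list(csv.DictReader(handle))


def index_by(rows, *columns):
    """Build {key tuple: row}; assumes the chosen columns are unique per row."""
    return {tuple(row[c] for c in columns): row for row in rows}


def group_by(rows, column):
    """Build {value: [rows]} for a column whose values may repeat."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return groups


# List of dictionaries: every row keeps the same keys with its own values.
rows = load_rows(io.StringIO(SAMPLE))

# Fast lookup on a (hypothetically) unique combination of columns.
by_key = index_by(rows, "Show Number", "Category")

# Non-unique column: each key maps to a list of matching rows.
by_show = group_by(rows, "Show Number")

# Unique values of one column, found by looping over all rows.
unique_categories = sorted({row["Category"] for row in rows})
```

Note that `csv.DictReader` yields every value as a string, so the show number here is `"4680"`, not `4680`. With a real file, `load_rows` could be called on `open(file_name, newline='', encoding='utf8')` instead of the `io.StringIO` wrapper.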


3 Comments

Aha, I think I see what you mean. Would that make this question a duplicate of this question, perhaps?
In that question he says "I would like the first row of the CSV file to be used as the 'key' field for the dictionary". That appears to be a different data structure.
Hm then maybe I misunderstood your first point. Can you link me to an example, or maybe some mock code to show what you mean? It sounds promising!

Your current approach won't split the columns as you expected.
Also, csv.reader expects a comma as the default delimiter, while the columns in the sample you posted are separated by an arbitrary number of spaces, so a different approach is needed.

I would recommend a pandas solution for this case:

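import pandas as pd

file_name = 'JEOPARDY_CSV.csv'

def get_data(csv_file):
    # Split columns on runs of two or more whitespace characters;
    # a regex separator requires the python parsing engine.
    df = pd.read_csv(csv_file, sep=r'\s{2,}', engine='python', header=0)
    # Transpose so each row becomes one {header: value} dictionary.
    data = list(df.T.to_dict().values())
    return data

print(get_data(file_name))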

The output is the needed list of dictionaries:

[{'Question': 'Question 1', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Copernicus', 'Category': 'HISTORY', 'Show Number': 4680, 'Round': 'Jeopardy!'},
 {'Question': 'Question 2', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Jim Thorpe', 'Category': 'ESPN', 'Show Number': 4680, 'Round': 'Jeopardy!'},
 {'Question': 'Question 3', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'Arizona', 'Category': 'EVERYBODY TALKS', 'Show Number': 4680, 'Round': 'Jeopardy!'},
 {'Question': "McDonald's", 'Value': 'Question 4', 'Air Date': '12/31/2004', 'Answer': None, 'Category': 'THE COMPANY LINE $200', 'Show Number': 4680, 'Round': 'Jeopardy!'},
 {'Question': 'Question 5', 'Value': '$200', 'Air Date': '12/31/2004', 'Answer': 'John Adams', 'Category': 'EPITAPHS', 'Show Number': 4680, 'Round': 'Jeopardy!'}]

Going further, pandas lets you group column values, get unique records, perform aggregations, and much more.
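For instance, the two queries mentioned in the question might look roughly like this. This is only a sketch: it assumes a comma-separated file, and the inline `SAMPLE` text is a hypothetical stand-in for the real CSV so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample standing in for JEOPARDY_CSV.csv (assumed comma-separated).
SAMPLE = """Show Number,Air Date,Round,Category,Value,Question,Answer
4680,12/31/2004,Jeopardy!,HISTORY,$200,Question 1,Copernicus
4680,12/31/2004,Jeopardy!,ESPN,$200,Question 2,Jim Thorpe
4680,12/31/2004,Jeopardy!,EVERYBODY TALKS,$200,Question 3,Arizona
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# "Return all questions where show number is 4680": boolean-mask filtering.
show_4680 = df[df["Show Number"] == 4680]

# "For all categories, return the unique ones", in order of first appearance.
unique_categories = df["Category"].unique().tolist()

# A simple aggregation: number of rows per round.
per_round = df.groupby("Round").size()
```

With the real file, `pd.read_csv(file_name)` would replace the `io.StringIO` call.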

1 Comment

Ah sorry, the original data is in a csv, I edited to add that note. I've heard of pandas before, I'll look in to that thanks!
