I have a large CSV file with many columns, something like this:
id, col1, col2, col3, col4, col5
1, a, b, 2, d, e
2, b, c, 4, e, f
3, c, d, 6, f, g
I want to build a dictionary that uses only certain columns, for example the id, col2, and col3. It should also keep only the rows with the 10 highest values in col2. This is the code I have so far:
import csv
reader = csv.DictReader(open('SNPs.csv', newline=''), delimiter=',', quotechar='"')
But I don't know how to tell it to ignore certain columns, and I don't think I can use max() to return multiple values, since it only ever gives back a single item:
max(2, 4) returns 4.
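For illustration, here is a minimal sketch of the result being asked for. It is built on assumptions rather than on the real file: the sample rows above are read from a string instead of SNPs.csv, the ranking column is taken to be col3 (the only numeric column in the sample), the top 2 rows are kept instead of the top 10, and `top_rows` is a hypothetical helper name. It uses `heapq.nlargest` from the standard library, which returns several items where max() returns one.

```python
import csv
import heapq
import io

# Sample rows from the question; a real run would pass open('SNPs.csv', newline='').
SAMPLE = """id, col1, col2, col3, col4, col5
1, a, b, 2, d, e
2, b, c, 4, e, f
3, c, d, 6, f, g
"""

def top_rows(f, key_col, keep_cols, rank_col, n):
    """Map key_col -> {kept columns} for the n rows with the largest rank_col."""
    # skipinitialspace=True drops the blank that follows each comma in the sample.
    reader = csv.DictReader(f, skipinitialspace=True)
    # The csv module yields strings, so convert before comparing numerically.
    best = heapq.nlargest(n, reader, key=lambda row: float(row[rank_col]))
    # Keep only the requested columns; all others are simply not copied.
    return {row[key_col]: {c: row[c] for c in keep_cols} for row in best}

result = top_rows(io.StringIO(SAMPLE), "id", ["col2", "col3"], "col3", 2)
print(result)
# {'3': {'col2': 'd', 'col3': '6'}, '2': {'col2': 'c', 'col3': '4'}}
```

The stored values stay as strings here. If numbers are wanted in the dictionary, the conversion would have to be applied when building it as well.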
EDIT: I tried using Daniel's code, but for some reason the sort isn't working correctly (I also need a descending sort rather than an ascending one). It outputs only four different keys, and they aren't in descending numerical order. The header row also shows up as one of the values.
import csv
f = open('SNPs.csv', newline='')
reader = csv.reader(f)
output = [row for row in reader]
output.sort(key=lambda x: x[32], reverse=True)
print(dict((row[10], (row[11], row[8], row[32])) for row in output[:10]))
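A small sketch that reproduces the three symptoms described in the edit may help narrow things down. The data here is a hypothetical two-column stand-in for SNPs.csv, not the real file, and the explanation is an assumption about what is happening there: `csv.reader` returns every field as a string and does not treat the header row specially, and a dict keeps only one entry per key.

```python
import csv
import io

# Hypothetical stand-in for SNPs.csv: a name column and a numeric score column.
SAMPLE = "name,score\nx,9\ny,10\nx,100\nz,25\n"

rows = list(csv.reader(io.StringIO(SAMPLE)))

# Every field is a string, so this sort is lexicographic, and the header
# row is sorted along with the data.
naive = sorted(rows, key=lambda r: r[1], reverse=True)
print([r[1] for r in naive])    # ['score', '9', '25', '100', '10']

# Skipping the header and converting to a number gives a numeric order.
header, data = rows[0], rows[1:]
numeric = sorted(data, key=lambda r: float(r[1]), reverse=True)
print([r[1] for r in numeric])  # ['100', '25', '10', '9']

# A dict holds one value per key, so rows sharing a name collapse into one.
by_name = dict((r[0], r[1]) for r in numeric)
print(len(numeric), len(by_name))  # 4 3
```

If the values in column 10 of the real file repeat, that would explain getting fewer than ten keys, although only inspecting the file can confirm it.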