Parsing through CSV using DictReader in Python

Question

I have a large CSV file with many columns, something like this:

id, col1, col2, col3, col4, col5
1, a, b, 2, d, e
2, b, c, 4, e, f
3, c, d, 6, f, g

I want to be able to create a dictionary in which only certain columns are used. For example, the dictionary would have the id number, col2, and col3. Additionally, it should only store the rows that have the highest 10 numbers in col2. This is the code I have:

import csv 
reader = csv.DictReader(open('SNPs.csv', newline=''), delimiter=',', quotechar='"')

But I do not know how to tell it to ignore certain columns, and I don't think that I can use max() to return multiple values.

max(2, 4) returns 4.

EDIT: I tried using Daniel's code, but the sort isn't working correctly (I also need a reverse sort rather than an ascending one). The output contains only four different keys, they aren't in descending numerical order, and the header row shows up as one of the values.

import csv
f = open('SNPs.csv', "rU")
reader = csv.reader(f)
output = [row for row in reader]
output.sort(key=lambda x: x[32], reverse=True)
print dict((row[10], (row[11], row[8], row[32])) for row in output[:10])

Please give an example of what you want the output dictionary to look like.
– unutbu, Commented Nov 4, 2012 at 21:34
@unutbu I want it to look something like this: 1: 2, e 2: 4, d
– user1647556, Commented Nov 4, 2012 at 22:03

Daniel Roseman · Accepted Answer · 2012-11-04 22:35:49Z

2

col2 doesn't have any numbers. I'll assume you meant col3.

You can't tell which are the ten highest numbers in col3 until you've read them all. So since you're going to be doing that anyway, you might as well read everything, then extract the top ten afterwards. So you can do something like this:

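output = []
for row in reader:
    # row is a dict, so iterate over its items and keep only the wanted columns
    output.append(dict((k, v) for k, v in row.items() if k in ('id', 'col2', 'col3')))
# compare col3 as a number and put the highest values first
output.sort(key=lambda x: float(x['col3']), reverse=True)
top_ten = output[:10]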
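For reference, here is a self-contained sketch of the read-everything-then-sort approach, run against the sample data from the question. The helper name `top_rows` and its parameters are made up for illustration. `skipinitialspace=True` assumes the real file has a space after each comma, as the sample does; drop it if yours does not.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = """id, col1, col2, col3, col4, col5
1, a, b, 2, d, e
2, b, c, 4, e, f
3, c, d, 6, f, g
"""

def top_rows(csv_file, keep=("id", "col2", "col3"), sort_col="col3", n=10):
    """Return the n rows with the largest sort_col, keeping only the `keep` columns."""
    # skipinitialspace drops the blank that follows each comma in the sample
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    # every row has to be read before the largest values are known
    rows = [{k: row[k] for k in keep} for row in reader]
    # CSV values are strings, so convert before comparing
    rows.sort(key=lambda r: float(r[sort_col]), reverse=True)
    return rows[:n]

result = top_rows(io.StringIO(SAMPLE), n=2)
print(result)
# [{'id': '3', 'col2': 'd', 'col3': '6'}, {'id': '2', 'col2': 'c', 'col3': '4'}]
```

With a real file you would pass an open file object instead of the `io.StringIO` wrapper.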

Edit: Now that I see your desired output, you want something quite different from what I imagined. DictReader isn't needed here, so I'll rewrite it with the plain csv.reader.

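import csv
f = open('SNPs.csv')
reader = csv.reader(f, delimiter=',', quotechar='"')
next(reader)  # skip the header row so it is not sorted with the data
output = [row for row in reader]
# compare column 3 as a number and put the highest values first
output.sort(key=lambda x: float(x[3]), reverse=True)
result = dict((row[0], (row[3], row[4])) for row in output[:10])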
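One likely reason a sort on a CSV column does not come out in descending numerical order is that the csv module returns every field as a string, and strings compare character by character. The small sketch below uses made-up values to show the difference. It assumes the column holds values that `float()` can parse, which is also why a header row has to be skipped or removed first.

```python
# Made-up values, as they would come out of csv.reader (always strings).
values = ["9", "10", "2", "100"]

# Strings compare character by character, so "9" ranks above "100".
as_text = sorted(values, reverse=True)

# Converting inside the key function compares the numbers themselves.
as_numbers = sorted(values, key=float, reverse=True)

print(as_text)     # ['9', '2', '100', '10']
print(as_numbers)  # ['100', '10', '9', '2']
```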

edited Nov 4, 2012 at 22:35

answered Nov 4, 2012 at 21:39

Daniel Roseman


3 Comments

user1647556 Over a year ago

Thanks Daniel. For some reason, it is telling me that there is a syntax error at output = []. I have no idea why.

Daniel Roseman Over a year ago

You have a syntax error in the line above, presumably since open doesn't take a newline argument in Python 2 (although that should give a TypeError). However, I've realized that my code doesn't give you the output you want, so I've added a new version.

user1647556 Over a year ago

Hi Daniel, I tried using the new code. I edited my original post to show the problem I'm still having.

Alex L · Accepted Answer · 2013-09-10 03:38:09Z

0

Maybe this works:

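import csv
f = open("SNPs.csv", newline="")  # Python 3; on Python 2 use open("SNPs.csv", "rU")
reader = csv.reader(f)
next(reader)  # skip the header row
data = [row for row in reader]  # This only works if you have enough memory to do so
# collect the ten highest values in column 32, compared as numbers
set_highest_ten = set(row[32] for row in sorted(
    data, key=lambda x: float(x[32]), reverse=True)[0:10])
d = dict((row[10], (row[11], row[8], row[32])) for row in data
         if row[32] in set_highest_ten)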

I've tested with a small amount of data and it seems fine, but I'm not sure if this is exactly what you are looking for.
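If memory is a concern, one alternative is `heapq.nlargest`, which walks the reader once and keeps only about `n` rows at a time. The sketch below is illustrative only: the function name `top_n_dict`, its parameters, and the sample data are assumptions, with column indexes chosen to match the small sample from the question rather than the real file.

```python
import csv
import heapq
import io

# Small stand-in for the real file; the last two rows tie on col3.
SAMPLE = """id,col1,col2,col3,col4,col5
1,a,b,2,d,e
2,b,c,4,e,f
3,c,d,6,f,g
4,d,e,6,g,h
"""

def top_n_dict(csv_file, n, key_idx=0, value_idxs=(3, 4), sort_idx=3):
    """Map the key column to the chosen value columns for the n largest rows."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    # nlargest consumes the reader lazily instead of building a full list
    best = heapq.nlargest(n, reader, key=lambda row: float(row[sort_idx]))
    return {row[key_idx]: tuple(row[i] for i in value_idxs) for row in best}

top = top_n_dict(io.StringIO(SAMPLE), n=2)
print(top)  # {'3': ('6', 'f'), '4': ('6', 'g')}
```

This always returns at most `n` rows. Selecting rows by membership in a set of the highest values can return more than `n` rows when several rows tie at the cutoff value. Which behavior is preferable depends on what you need.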

edited Sep 10, 2013 at 3:38

Alex L


answered Nov 5, 2012 at 14:18

Willian Fuks

