I have a piece of Python code (below) that does not return what I want, and a file like this example:
AAAS,ENST00000552161,1.70232E-30
AAAS,ENST00000548258,1.09222E-84
AAAS,ENST00000549450,1.3171E-108
AAAS,ENST00000209873,22.3297
AAAS,ENST00000546562,0.170807
AAAS,ENST00000394384,5.53609
AAAS,ENST00000547238,0.829774
AACS,ENST00000316543,0.49901
AACS,ENST00000261686,2.41428
The 1st column has a lot of repeated items. For each item I want to keep only one row, the one with the highest value in the 3rd column, like the following rows:
AAAS,ENST00000209873,22.3297
AACS,ENST00000261686,2.41428
This is the code:
import csv
from collections import defaultdict

with open('data.csv', newline='') as f, open('out.csv', 'w', newline='') as out:
    f_reader = csv.reader(f)
    out_writer = csv.writer(out)
    d = defaultdict(list)
    for line in f_reader:
        d[line[1]].append(line)
    for _, v in d.items():
        new_line = sorted(v, key=lambda i: float(i[2]), reverse=True)[0]
        out_writer.writerow(new_line)
Do you know what the problem is?
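new_line is a list of strings when you pass it to writerow, which is what csv.writer expects, so the writing part is fine. The problem is the grouping: you key the dictionary on line[1], which is the second element (the transcript ID). In your sample every transcript ID is different, so each group holds a single row and the output is just a copy of the input. You want to group them by line[0].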