I have a CSV file and I want to extract columns from it, but only from some of the rows. It looks like this:
gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186
Essentially I want the 2nd and 6th column, but only from the rows which have "gene_name" in the 5th column. So I want to extract:
ENSDARG00000104632, RERG
(It goes on from there with many thousands of rows)
This is what I wrote:
import csv
with open('filename.csv', 'rb') as infh:
    reader = csv.reader(infh)
    for row in reader:
        if row[4] == 'gene_name':
            print row[1, 5]
However, it gives me this error:
  File "./gene_name_grabber.sh", line 10, in <module>
    if row[4] == 'gene_name':
IndexError: list index out of range
I understand that this error means I've asked for an index beyond the end of the row... but each row clearly has more than four fields. Help, please?
Thanks!
Can you print each row before the if condition, so that we can see the line that gives this error?
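A possible explanation, offered as a guess because every row in the sample has six fields: somewhere in the file there may be a blank or truncated line. csv.reader returns an empty list for a blank line, so row[4] on that row raises IndexError. Two other things are worth checking. If the file really has a space after each comma, the fifth field is ' gene_name' (with a leading space) and the comparison never matches unless the reader is told to skip that space. Also, row[1, 5] is not valid list indexing; each field has to be picked separately. Below is a minimal sketch under those assumptions, written for Python 3 (the original uses Python 2 syntax). The function name extract_gene_names and the in-memory sample are made up for illustration.

```python
import csv
import io


def extract_gene_names(lines):
    """Yield (gene_id, gene_name) pairs from rows whose 5th field is 'gene_name'."""
    # skipinitialspace=True drops the blank after each comma, so the 5th
    # field reads 'gene_name' rather than ' gene_name'.
    reader = csv.reader(lines, skipinitialspace=True)
    for row in reader:
        # Blank lines come back as [] and truncated lines as short lists;
        # indexing row[4] on either one raises IndexError, so skip them.
        if len(row) < 6:
            continue
        if row[4] == 'gene_name':
            # A list takes one integer index at a time, so pick each field separately.
            yield row[1], row[5]


# Hypothetical sample: two rows from the question plus an assumed blank line.
sample = io.StringIO(
    "gene_id, ENSDARG00000104632, gene_version, 2, gene_name, RERG\n"
    "\n"
    "gene_id, ENSDARG00000104632, gene_version, 2, transcript_id, ENSDART00000166186\n"
)

pairs = list(extract_gene_names(sample))
for gene_id, gene_name in pairs:
    print(gene_id + ", " + gene_name)
```

To run it on the real file in Python 3, pass an open file handle instead of the sample, for example open('filename.csv', newline=''). If no rows are being skipped and the error persists, printing each row just before the if test should show which line is short.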