Get CSV column name and number in Python

Question

I have hundreds of tab-separated text files, and each one may have the same or a different set of headers. I want to do the following:

1. Read the file.
2. Get the header names.
3. For a given header (passed as input), count the occurrences of one particular value (also passed as input) in that column.

Because there are so many files, I can't know in advance the column number of the column I'm interested in. Currently I'm reading the TSV files like this:

file_name = os.path.join(tsv_name + ".txt")
input_file = open (file_name)
input_file_data = csv.reader(input_file, delimiter = "\t")

Then I get the count using a hard-coded column number:

countt = [rec[1] for rec in input_file_data]
print tsv_name + ".txt", countt.count(barcode)

where tsv_name is the filename without its extension (I had to strip the extensions for various reasons). What I'd like is to pass the column name, say 'codeID', as an argument when running the script. If 'codeID' is found in a file's header, the script should look up its column number and use it in the countt statement. If it's not found, the script should skip that file and go on to the next one. I'm stuck on the part where I take a column name as input and get its column number.

My data looks like this

barcodeID   codeID  conceptID   studyID Event   Time    Addi_data
UTGN-02-01-0001 653 1256213 UTGN    Adverse events  48h No
UTGN-02-01-0002 158 1256213 UTGN    Adverse events  48h No
UTGN-02-01-0003 630 1256213 UTGN    Adverse events  1d  No

So when I run python program_name.py codeID 630, it should print filename.txt 1 (since 630 occurs once in column number 2, whose header is codeID).

PS: I don't want to use pandas or numpy, since they would need to be installed separately on the other devices this script will run on.

Benoît Latinier · Accepted Answer · 2014-12-12 20:09:10Z

1

The DictReader class of the csv module lets you reference your CSV data by header name. It may lead you to a simpler solution to your problem :)

https://docs.python.org/2/library/csv.html#csv.DictReader
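As a rough illustration of that suggestion, here is a minimal sketch of counting a value by header name with DictReader. It is written for Python 3 (the question's code is Python 2), and the function name count_value and its None-for-missing-header convention are assumptions made for this example, not part of the original code.

```python
import csv


def count_value(path, column, value):
    """Count rows whose `column` equals `value`.

    Returns None when the file has no such header (hypothetical
    convention, so the caller can skip that file).
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        # fieldnames is None for an empty file, hence the "or []"
        if column not in (reader.fieldnames or []):
            return None
        # Values are read as strings, so compare against a string
        return sum(1 for row in reader if row[column] == value)
```

With this approach the column number is never needed: each row is a dict keyed by header name, so a file that lacks the header can simply be skipped.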

abn · Accepted Answer · 2014-12-12 20:32:45Z

1

Thank you Benoit Latinier! With your suggestion I came up with a piece of code that works for one file; I have yet to integrate it into my big script. Here's some sample code:

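import csv

# filename is the path to one tab-separated file
f = csv.DictReader(open(filename), delimiter = "\t")
fi = f.fieldnames  # list of header names, in file order
index_num = fi.index('codeID')  # zero-based column number; raises ValueError if the header is absent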

This index_num can be used in countt statement (hopefully!)
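For what it's worth, here is one way the index lookup could be wired into the counting step across many files. This is only a sketch under some assumptions: it targets Python 3, the helper name column_count is made up for the example, and the file paths are taken as extra command-line arguments after the column name and value, which the original script does not do.

```python
import csv
import sys


def column_count(path, column, value):
    """Return how often `value` appears under `column`, or None if
    the file is empty or does not have that header."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, None)  # first row holds the header names
        if header is None:
            return None
        try:
            index_num = header.index(column)
        except ValueError:
            return None  # header not in this file, caller skips it
        # Ignore short rows that do not reach the wanted column
        return sum(
            1 for rec in reader
            if len(rec) > index_num and rec[index_num] == value
        )


def main(argv):
    # Assumed usage: program_name.py COLUMN VALUE FILE [FILE ...]
    column, value = argv[1], argv[2]
    for path in argv[3:]:
        count = column_count(path, column, value)
        if count is not None:
            print(path, count)


if __name__ == "__main__" and len(sys.argv) > 3:
    main(sys.argv)
```

Catching the ValueError from index() is what makes the "skip and go to the next file" behaviour possible; without it, the first file lacking the header would stop the script.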
