I have hundreds of tab-separated text files, and each of them may have the same or a different set of headers. I want to do the following:
1. Read the file
2. Get the header names
3. For a given (input) header, count how many times a particular element (also given as input) occurs in that column
Since I have many files, I can't know the column number of the column of interest in each one. Currently I'm reading the TSV files like this:
file_name = os.path.join(tsv_name + ".txt")
input_file = open (file_name)
input_file_data = csv.reader(input_file, delimiter = "\t")
Then I'm getting the count with a hard-coded column number:
countt = [rec[1] for rec in input_file_data]
print tsv_name + ".txt", countt.count(barcode)
where tsv_name is the file name without its extension (I had to strip the extensions for various reasons). My question is: I'd like to pass a column name, let's say 'codeID', as input when running the script. If 'codeID' is found in a file's header, the script should get its column number and use it in the countt statement. If it's not found, it should skip that file and go on to the next one. I'm stuck on the part where I give the column name as input and get its column number.
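For illustration, the lookup described above could be sketched roughly as follows, using only the standard csv module. The function name count_in_column and its parameters are hypothetical (they are not part of the original script), and the sketch assumes the first row of each file is the header.

```python
import csv


def count_in_column(lines, column_name, value):
    """Count rows whose `column_name` field equals `value`.

    `lines` is any iterable of tab-separated text lines, e.g. an open file.
    Returns None when there is no header row or it lacks `column_name`.
    (Hypothetical helper; the name and signature are assumptions.)
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader, None)  # first row is assumed to hold the column names
    if header is None or column_name not in header:
        return None  # caller can skip this file
    col = header.index(column_name)  # zero-based column number
    # Ignore short rows so a ragged line cannot raise IndexError.
    return sum(1 for rec in reader if len(rec) > col and rec[col] == value)


# Small in-memory example built from the sample rows in the question.
sample = [
    "barcodeID\tcodeID\tconceptID\tstudyID\tEvent\tTime\tAddi_data",
    "UTGN-02-01-0001\t653\t1256213\tUTGN\tAdverse events\t48h\tNo",
    "UTGN-02-01-0002\t158\t1256213\tUTGN\tAdverse events\t48h\tNo",
    "UTGN-02-01-0003\t630\t1256213\tUTGN\tAdverse events\t1d\tNo",
]
print(count_in_column(sample, "codeID", "630"))  # prints 1
```

A return value of None signals that the header was not found, so the calling loop can move on to the next file. Note that header.index gives a zero-based position, so codeID is index 1 here even though it is the second column.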
My data looks like this
barcodeID        codeID  conceptID  studyID  Event           Time  Addi_data
UTGN-02-01-0001  653     1256213    UTGN     Adverse events  48h   No
UTGN-02-01-0002  158     1256213    UTGN     Adverse events  48h   No
UTGN-02-01-0003  630     1256213    UTGN     Adverse events  1d    No
So when I run python program_name.py codeID 630, it should print filename.txt 1 (since 630 occurs once in column number 2, whose header is codeID).
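One possible way to wire up that command line is sketched below, this time with csv.DictReader, which looks fields up by header name so the column number never has to be computed. The function name report is hypothetical, and passing the file paths as extra command-line arguments is an assumption made for the sketch; the original script builds its file names from tsv_name instead.

```python
import csv
import os
import sys


def report(column_name, value, paths):
    """Return 'filename count' strings for files whose header has column_name.

    Hypothetical helper: the name and the idea of passing explicit paths
    are assumptions, not part of the original script.
    """
    lines = []
    for path in paths:
        with open(path) as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            # fieldnames is None for an empty file, hence the `or []`.
            if column_name not in (reader.fieldnames or []):
                continue  # header not present: skip to the next file
            count = sum(1 for row in reader if row[column_name] == value)
        lines.append("%s %d" % (os.path.basename(path), count))
    return lines


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Assumed invocation: python program_name.py codeID 630 file1.txt file2.txt
    for line in report(sys.argv[1], sys.argv[2], sys.argv[3:]):
        print(line)
```

Command-line arguments arrive as strings, and the csv module also yields strings, so 630 is compared as text and no conversion is needed.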
PS: I don't want to use pandas or numpy, since they would need additional installation on the other devices this script will run on.