
I have hundreds of tab-separated text files, and each one could have the same or a different set of headers. For each file I want to:

1. Read the file.
2. Get the header names.
3. For a given (input) header, count how many times a particular value (also given as input) occurs in that column.

Since I have many files, I can't know in advance the column number of each column I'm interested in. Currently I'm reading the TSV files like this:

import csv
import os

file_name = os.path.join(tsv_name + ".txt")
input_file = open(file_name)
input_file_data = csv.reader(input_file, delimiter="\t")

Then I get the count using a hard-coded column number:

# rec[1] is hard-coded: the second column (codeID in the sample data)
countt = [rec[1] for rec in input_file_data]
print tsv_name + ".txt", countt.count(barcode)

where tsv_name is the file name without its extension (I had to strip the extensions for various reasons). What I'd like is to pass the column name, say 'codeID', as input when running the script. If 'codeID' is found in a file's header, the script should get its column number and use it in the countt statement; if it's not found, it should skip that file and move on to the next one. I'm stuck on the part where I take the input column name and get its column number.
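A minimal sketch of just the lookup step, assuming the header row has already been read as a list of names (for example with `next(input_file_data)` on the csv.reader above). The helper name `find_column` is hypothetical. It returns None instead of raising, so the caller can skip files that lack the column. Note that the index is 0-based, so codeID, the second column, comes back as 1.

```python
def find_column(header, column_name):
    """Return the 0-based index of column_name in header, or None if absent."""
    # list.index would raise ValueError for a missing name, so check membership first
    if column_name not in header:
        return None
    return header.index(column_name)


# Header names taken from the sample data in this question
header = ["barcodeID", "codeID", "conceptID", "studyID", "Event", "Time", "Addi_data"]
print(find_column(header, "codeID"))   # prints 1
print(find_column(header, "siteID"))   # prints None
```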

My data looks like this:

barcodeID   codeID  conceptID   studyID Event   Time    Addi_data
UTGN-02-01-0001 653 1256213 UTGN    Adverse events  48h No
UTGN-02-01-0002 158 1256213 UTGN    Adverse events  48h No
UTGN-02-01-0003 630 1256213 UTGN    Adverse events  1d  No

So when I run python program_name.py codeID 630, it should print filename.txt 1, since 630 occurs once in column 2, whose header is codeID.
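One way to wire this up end to end is sketched below, using only the standard library. It assumes the files to scan are the `*.txt` files in the current directory (the question doesn't say how the file list is built, so using `glob` here is an assumption), that the column name and value arrive as `sys.argv[1]` and `sys.argv[2]`, and that values are compared as strings, since csv returns every field as text. `count_value` and `main` are hypothetical names. The `__future__` import is there so the same `print` calls should behave alike on Python 2.7 and Python 3.

```python
from __future__ import print_function

import csv
import glob
import sys


def count_value(path, column_name, value):
    """Count rows whose column_name field equals value.

    Returns None when the file has no header or no such column,
    so the caller can skip it.
    """
    with open(path) as input_file:
        reader = csv.reader(input_file, delimiter="\t")
        header = next(reader, None)  # first row holds the column names
        if header is None or column_name not in header:
            return None
        index_num = header.index(column_name)
        # Ignore short/ragged rows instead of raising IndexError
        return sum(1 for rec in reader
                   if len(rec) > index_num and rec[index_num] == value)


def main(argv):
    """Print '<file> <count>' for every file whose header has the column."""
    if len(argv) != 3:
        print("usage: python program_name.py COLUMN VALUE", file=sys.stderr)
        return
    column_name, value = argv[1], argv[2]
    for path in sorted(glob.glob("*.txt")):  # assumption: files are in the cwd
        count = count_value(path, column_name, value)
        if count is None:
            continue  # column missing from this file's header: skip the file
        print(path, count)


if __name__ == "__main__":
    main(sys.argv)
```

Run as python program_name.py codeID 630 in a directory holding a file like the sample above, this should print one line such as filename.txt 1 per file whose header contains codeID, and silently skip the rest.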

PS: I don't want to use pandas or numpy, since they need an additional installation on the other machines this script will run on.

2 Answers


The DictReader class of the csv module lets you reference your CSV data by header name. It may lead you to a simpler solution to your problem :)

https://docs.python.org/2/library/csv.html#csv.DictReader
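A minimal sketch of the DictReader approach, assuming tab-delimited files with the header on the first line. `count_with_dictreader` is a hypothetical helper name; `fieldnames` is the DictReader attribute that holds the header names, and it is None for an empty file. Because each row comes back as a dict keyed by header name, no column number is needed at all.

```python
import csv


def count_with_dictreader(path, column_name, value):
    """Count rows where row[column_name] == value; None if the column is absent."""
    with open(path) as input_file:
        reader = csv.DictReader(input_file, delimiter="\t")
        # Accessing fieldnames reads the header row (None if the file is empty)
        if not reader.fieldnames or column_name not in reader.fieldnames:
            return None
        # Each row is a dict keyed by header name, so look the field up by name
        return sum(1 for row in reader if row[column_name] == value)
```

Returning None rather than 0 for a missing column keeps "column not in this file" (skip it) distinct from "column present, value never occurs".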


Thank you, Benoit Latinier! With your suggestion I came up with a piece of code that works for one file; I have yet to integrate it into my big script. Here's the sample code:

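import csv

# DictReader treats the first row as the header and exposes it as fieldnames
f = csv.DictReader(open(filename), delimiter="\t")
fi = f.fieldnames
# list.index raises ValueError if 'codeID' is not one of the headers
index_num = fi.index('codeID')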

This index_num can then be used in the countt statement (hopefully!).
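Since fi.index('codeID') raises ValueError when the header lacks that name, catching it is a natural way to skip a file. Below is a sketch of plugging the looked-up index into the original countt list comprehension, assuming plain csv.reader rows: the header row is consumed with `next` first, so it is not itself counted. `count_by_header` is a hypothetical helper name.

```python
import csv


def count_by_header(path, column_name, value):
    """Return how many times value appears under column_name, or None to skip."""
    with open(path) as input_file:
        input_file_data = csv.reader(input_file, delimiter="\t")
        fi = next(input_file_data, [])  # header row as a list of names
        try:
            index_num = fi.index(column_name)
        except ValueError:
            return None  # column not in this file: the caller skips it
        # Same idea as the original countt, but with the looked-up index
        countt = [rec[index_num] for rec in input_file_data if len(rec) > index_num]
        return countt.count(value)
```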
