3

I have a CSV file like this:

Header1,Header2,Header3,Header4
AA,12,ABCS,A1
BDDV,34,ABCS,BB2
ABCS,5666,gf,KK0

where a column can have only letters/words, or just numbers or both. I have multiple files like this and the columns are not necessarily the same in each. I'd like to get the counts of each element in a column that has only letters and no numbers in it.

My desired output is

Header1- [('AA', 1),('BDDV',1),('ABCS',1)] Header3- [('ABCS', 2),('gf', 1)]

Here, though both the columns have 'ABCS', I'd like to count them separately for each column.

I can get the count by hardcoding the column number like below:

import csv
import collections

count_number = collections.Counter()
with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        count_number[row[1]] += 1

print count_number.most_common()

but I'm confused on how to do it with respect to columns.

1
  • 1
    Just create a list of counters, one for each column you wish to count. Commented Jan 14, 2015 at 20:18

2 Answers 2

1

This can work using a Counter for each header:

#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

header_counter = defaultdict(Counter)

with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    # read headers
    headers = next(r)
    for row in r:
        # count values for each row to add in header context
        row_val = sum([w.isdigit() for w in row])
        # zip each row with headers to know where to count
        for header, val in zip(headers, row):
            # count only non-digits
            if not any(map(str.isdigit, val)):
                header_counter[header].update({val: row_val})

for k, v in header_counter.iteritems():
    print k, v

Output:

Header3 Counter({'ABCS': 2, 'gf': 1})
Header1 Counter({'AA': 1, 'BDDV': 1, 'ABCS': 1})
Sign up to request clarification or add additional context in comments.

1 Comment

Added simpler code to swap all + map for any, and you're welcome :)
1

Partial solution only (you still need to filter columns with digits on the second iteration of your CSV reader).

import csv
import collections

with open('filename.csv') as input_file:
  r = csv.reader(input_file, delimiter=',')
  headers = next(r)
  count_number = [collections.Counter() for I in Len(headers)]

  for row in r:
    for i, val in enumerate(row):
      count_number[i][val] += 1

print [cr.most_common() for cr in count_number]

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.