Get counts of strings in each CSV column using Python

Question

I have a CSV file like this:

Header1,Header2,Header3,Header4
AA,12,ABCS,A1
BDDV,34,ABCS,BB2
ABCS,5666,gf,KK0

where a column can contain only letters/words, only numbers, or a mix of both. I have multiple files like this, and the columns are not necessarily the same in each. I'd like to get the count of each element in every column that contains only letters and no numbers.

My desired output is

Header1- [('AA', 1), ('BDDV', 1), ('ABCS', 1)]
Header3- [('ABCS', 2), ('gf', 1)]

Here, although both columns contain 'ABCS', I'd like it counted separately for each column.

I can get the count by hardcoding the column number like below:

import csv
import collections

count_number = collections.Counter()
with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        count_number[row[1]] += 1

print(count_number.most_common())

but I'm confused about how to do this for every column.

Just create a list of counters, one for each column you wish to count.
– fnl, Commented Jan 14, 2015 at 20:18

Reut Sharabani · Accepted Answer · 2015-01-14 20:30:18Z

1

This can work using a Counter for each header:

#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

# one Counter per header, created on first use
header_counter = defaultdict(Counter)

with open('filename.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    # read headers
    headers = next(r)
    for row in r:
        # zip each row with headers to know where to count
        for header, val in zip(headers, row):
            # count only values that contain no digits
            if not any(map(str.isdigit, val)):
                header_counter[header][val] += 1

for k, v in header_counter.items():
    print(k, v)

Output:

Header1 Counter({'AA': 1, 'BDDV': 1, 'ABCS': 1})
Header3 Counter({'ABCS': 2, 'gf': 1})

edited Jan 14, 2015 at 20:30

answered Jan 14, 2015 at 20:18

Reut Sharabani

1 Comment

Reut Sharabani Over a year ago

Added simpler code to swap all + map for any, and you're welcome :)
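
The swap mentioned in this comment can be checked against the sample values. The earlier revision is not shown here, so the `all`-based form below is an assumption about what it looked like, and the helper names `no_digits_any` and `no_digits_all` are hypothetical, introduced only for illustration:

```python
def no_digits_any(val):
    # form used in the answer: reject the value if any character is a digit
    return not any(map(str.isdigit, val))


def no_digits_all(val):
    # assumed equivalent form: every character must be a non-digit
    return all(not ch.isdigit() for ch in val)


# cell values taken from the sample CSV in the question
samples = ['AA', '12', 'ABCS', 'A1', 'BB2', 'gf', 'KK0', '5666']
kept = [s for s in samples if no_digits_any(s)]

# both forms should agree on every sample value
assert kept == [s for s in samples if no_digits_all(s)]
print(kept)
```

Both forms also accept an empty cell, since `any` over an empty string is false and `all` over an empty string is true.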

fnl · Accepted Answer · 2015-01-14 20:26:52Z

1

Partial solution only (you still need to filter out the columns that contain digits, e.g. in a second pass over your CSV reader).

import csv
import collections

with open('filename.csv') as input_file:
  r = csv.reader(input_file, delimiter=',')
  headers = next(r)
  # one Counter per column, in header order
  count_number = [collections.Counter() for _ in range(len(headers))]

  for row in r:
    for i, val in enumerate(row):
      count_number[i][val] += 1

print([cr.most_common() for cr in count_number])
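
One way to finish the filtering step this answer leaves open is to drop a column's counter whenever any of its values contains a digit. This is a minimal sketch, assuming the sample data from the question is read from an in-memory string rather than `filename.csv`; the function name `count_letter_columns` is hypothetical:

```python
import collections
import csv
import io

# sample data from the question, inlined so the sketch runs without a file
SAMPLE = """Header1,Header2,Header3,Header4
AA,12,ABCS,A1
BDDV,34,ABCS,BB2
ABCS,5666,gf,KK0
"""


def count_letter_columns(lines):
    """Count values per column, keeping only columns with no digits at all."""
    r = csv.reader(lines, delimiter=',')
    headers = next(r)
    counters = [collections.Counter() for _ in headers]
    for row in r:
        for i, val in enumerate(row):
            counters[i][val] += 1
    # keep a column only if none of its values contains a digit
    return {
        header: counter.most_common()
        for header, counter in zip(headers, counters)
        if not any(ch.isdigit() for val in counter for ch in val)
    }


result = count_letter_columns(io.StringIO(SAMPLE))
for header, counts in result.items():
    print(header, counts)
```

For the sample data this keeps only Header1 and Header3. Unlike a per-value check, a column with even one digit-bearing cell is dropped entirely, which matches the question's wording. Since `csv.reader` accepts any iterable of lines, an open file object can be passed in place of `io.StringIO`.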

answered Jan 14, 2015 at 20:26

fnl
