Python csv count items in one column based on the name of another column

Question

I am new to programming in Python. I have a large CSV file (~5k items). There are 2 columns that I need to get data counted. The best way to explain what I need is show you a few rows of the csv:

Name column               OPTIONALDATA5 column 
Collaborative Desk  Broward
Collaborative Desk  Broward
Academic Desk           Broward
Academic Desk           Broward
Academic Desk           Broward
Academic Desk           Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Collaborative Desk  Broward
Broward             Broward
Alachua             Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua
Collaborative Desk  Alachua

In the above example I simply want the result as follows:

Broward:
collaborative Desk - 6
Academic Desk - 4
Broward - 1

Alachua:
collaborative Desk - 5
Alachua - 1

Maybe a total as well and then on to the next library in the spreadsheet.

I started writing code but I am wondering if there is a better way to do this.

Is your question about how to read the CSV file, or about how to do the two-level counting? — Blckknght
– Blckknght, Commented Apr 9, 2014 at 18:13

shaktimaan · Accepted Answer · 2014-04-09 18:32:18Z

Assuming the data is tab delimited, this is one way of getting what you want:

import csv
from collections import defaultdict, Counter

input_file = open('data')
csv_reader = csv.reader(input_file, delimiter='\t')

data = defaultdict(list)
for row in csv_reader:
    data[row[1]].append(row[0])

The data will now contain:

{'Alachua': ['Alachua', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk'], 
 'Broward': ['Collaborative Desk', 'Collaborative Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Academic Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Collaborative Desk', 'Broward']}

You can iterate over the value list for each key and get the total count, or use the Counter method in python as:

for k, v in data.items():
    print k
    print Counter(v)

This prints:

Alachua
Counter({'Collaborative Desk': 5, 'Alachua': 1})
Broward
Counter({'Collaborative Desk': 6, 'Academic Desk': 4, 'Broward': 1})

Savir · Accepted Answer · 2014-04-09 18:43:14Z

1

This works too (assuming your file is \t delimited):

import itertools
import operator
import csv 
import collections

results = collections.defaultdict(lambda: collections.defaultdict(int))

with open('sample.csv', 'r') as f_in: 
    f_in.seek(0)
    rdr = csv.reader(f_in, delimiter='\t')
    next(rdr)
    for row in rdr:
        results[row[1]][row[0]] += 1

for k, v in results.iteritems():
    print "%s" % k
    for k2, v2 in v.iteritems():
        print "    %s - %s" % (k2, v2)

Outputs:

Alachua
    Alachua - 1
    Collaborative Desk - 5
Broward
    Collaborative Desk - 6
    Academic Desk - 4
    Broward - 1

answered Apr 9, 2014 at 18:43

Savir

18.6k18 gold badges88 silver badges146 bronze badges

Collectives™ on Stack Overflow

Python csv count items in one column based on the name of another column

2 Answers 2

Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related