Extract information from multiple CSV files, write to new CSV with third column

Question

I have a folder with four CSV files. Each CSV lists animals and the number of occurrences of each animal. I'm trying to create a single CSV that combines the information from all the CSVs in the folder, merging duplicate animals into one row (with their counts summed) and adding a third column that lists the original file(s) each animal was found in. For example: lion,4,'file2, file4'

I would really like my new CSV to have a third column that lists which files contain each animal, but I can't figure it out. I tried doing it with a second dictionary (see the lines that use locationCount). The script I am currently using is below.
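The expected row above puts a comma inside the third field. Python's csv.writer quotes such a field automatically (with double quotes by default, not the single quotes shown in the example), so the file still reads back as three columns. A minimal Python 3 check using an in-memory buffer; the filenames are illustrative:

```python
import csv
import io

# Write one merged row to an in-memory buffer to see how csv.writer
# handles a field that itself contains a comma.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["lion", 4, "file2.csv, file4.csv"])
print(buffer.getvalue(), end="")
# lion,4,"file2.csv, file4.csv"

# Reading it back recovers the three original fields.
buffer.seek(0)
row = next(csv.reader(buffer))
print(row)
# ['lion', '4', 'file2.csv, file4.csv']
```

With the default quoting mode (csv.QUOTE_MINIMAL), only fields containing the delimiter, the quote character, or a line break get quoted, so the animal and count columns stay unquoted.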

The files I have:

file1.csv:
cat,1
dog,2
bird,1
rat,3

file2.csv:
bear,1
lion,1
goat,1
pig,1

file3.csv:
rat,1
bear,1
mouse,1
cat,1

file4.csv:
elephant,1
tiger,2
dog,1
lion,3
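To reproduce the example without creating the files by hand, the four sample files can be generated with a short Python 3 script. The helper write_samples and the SAMPLE_FILES mapping are hypothetical names; the data is copied from the listings above, and writing into a fresh temporary folder is an assumption so that nothing in the current directory is touched:

```python
import csv
import os
import tempfile

# The four sample files from the question, as (animal, count) rows.
SAMPLE_FILES = {
    "file1.csv": [("cat", 1), ("dog", 2), ("bird", 1), ("rat", 3)],
    "file2.csv": [("bear", 1), ("lion", 1), ("goat", 1), ("pig", 1)],
    "file3.csv": [("rat", 1), ("bear", 1), ("mouse", 1), ("cat", 1)],
    "file4.csv": [("elephant", 1), ("tiger", 2), ("dog", 1), ("lion", 3)],
}

def write_samples(folder):
    """Write each sample file into `folder` (hypothetical helper)."""
    for name, rows in SAMPLE_FILES.items():
        # newline="" is what the csv docs recommend for files used with csv.writer
        with open(os.path.join(folder, name), "w", newline="") as f:
            csv.writer(f).writerows(rows)

folder = tempfile.mkdtemp()
write_samples(folder)
print(sorted(os.listdir(folder)))
```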

Current script:

import glob
import os
import csv, pdb

listCSV = glob.glob('*.csv')
masterCount = {}
locationCount = {}
for i in listCSV: # iterate over each csv
    filename = os.path.split(i)[1] # filename for each csv
    with open(i, 'rb') as f:
        reader = csv.reader(f)
        location = []
        for row in reader:
            key = row[0]
            location.append(filename)
            masterCount[key] = masterCount.get(key, 0) + int(row[1]) 
            locationCount[key] = locationCount.get(key, location)
writer = csv.writer(open('MasterAnimalCount.csv', 'wb'))
for key, value in masterCount.items():
    writer.writerow([key, value])
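Two things go wrong with locationCount in this script. Every animal from a given file is mapped to the same `location` list object, which gains one filename per row rather than per animal, and `locationCount.get(key, location)` only stores a list the first time an animal appears, so later files are never recorded for it. A small Python 3 replay of the same dictionary logic makes this visible; replacing the files with an in-memory dict of rows from file1.csv and file3.csv is an assumption for illustration:

```python
# Replays the question's dictionary logic on in-memory rows instead of files.
files = {
    "file1.csv": [["cat", "1"], ["dog", "2"], ["bird", "1"], ["rat", "3"]],
    "file3.csv": [["rat", "1"], ["bear", "1"], ["mouse", "1"], ["cat", "1"]],
}

masterCount = {}
locationCount = {}
for filename, rows in files.items():
    location = []                      # one list per file, shared by every row
    for row in rows:
        key = row[0]
        location.append(filename)      # grows by one entry per row
        masterCount[key] = masterCount.get(key, 0) + int(row[1])
        # .get() only returns `location` when key is new, so a repeat animal
        # keeps the list from the first file it was seen in.
        locationCount[key] = locationCount.get(key, location)

print(locationCount["cat"])   # ['file1.csv', 'file1.csv', 'file1.csv', 'file1.csv']
print(locationCount["bear"])  # ['file3.csv', 'file3.csv', 'file3.csv', 'file3.csv']
print(locationCount["cat"] is locationCount["dog"])  # True: the same list object
```

The counts come out right (cat totals 2), but cat's locations never mention file3.csv, and every animal first seen in file1.csv shares one four-entry list.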

TessellatingHeckler · Accepted Answer · 2014-11-18 20:57:47Z

1

You're almost there: handle the locations the same way you handle the counts.

I've renamed and shuffled things around, but it's basically the same code structure: masterCount adds each count to the running total for that animal, and masterLocations adds the filename to the list of filenames already recorded for that animal.
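The parallel between the two dictionaries can be seen in a minimal Python 3 sketch. It uses collections.defaultdict in place of the dict.get calls (a substitution for illustration, not what this answer uses) and feeds in the two lion rows from file2.csv and file4.csv directly instead of reading files:

```python
from collections import defaultdict

# Missing keys start at 0 (int) and [] (list), replacing .get(key, 0) / .get(key, []).
masterCount = defaultdict(int)
masterLocations = defaultdict(list)

rows = [("file2.csv", "lion", "1"), ("file4.csv", "lion", "3")]
for filename, animal, count in rows:
    masterCount[animal] += int(count)           # running total per animal
    masterLocations[animal].append(filename)    # running list of files per animal

print(masterCount["lion"], masterLocations["lion"])
# 4 ['file2.csv', 'file4.csv']
```

Each new file only ever adds to what is already stored for that animal, which is exactly what the question's locationCount was missing.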

from glob import glob
import os, csv, pdb

masterCount = {}
masterLocations = {}

for i in glob('*.csv'):
    filename = os.path.split(i)[1]

    for animal, count in csv.reader(open(i)):
        masterCount[animal] = masterCount.get(animal, 0) + int(count) 
        masterLocations[animal] = masterLocations.get(animal, []) + [filename]

writer = csv.writer(open('MasterAnimalCount.csv', 'wb'))

for animal in masterCount.keys():
    writer.writerow([animal, masterCount[animal], ', '.join(masterLocations[animal])])
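The code above targets Python 2 (csv.writer on a file opened in 'wb' mode raises a TypeError in Python 3). A minimal Python 3 adaptation, offered as a sketch rather than the answer author's code, keeps the same two-dictionary logic with a few changes: files are opened with `with` and newline="" as the csv module documentation recommends, the glob results are sorted because glob returns them in arbitrary order, blank lines are skipped, and MasterAnimalCount.csv is excluded so that a second run does not read its own three-column output (which would break the two-value unpacking). The function name merge_counts and its folder/output parameters are hypothetical:

```python
import csv
import glob
import os

OUTPUT = "MasterAnimalCount.csv"

def merge_counts(folder=".", output=OUTPUT):
    """Sum counts per animal across the CSVs in `folder` and record source files."""
    masterCount = {}
    masterLocations = {}

    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        filename = os.path.basename(path)
        if filename == output:
            continue  # skip the output file left by a previous run
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # csv.reader yields [] for blank lines
                animal, count = row
                masterCount[animal] = masterCount.get(animal, 0) + int(count)
                masterLocations[animal] = masterLocations.get(animal, []) + [filename]

    with open(os.path.join(folder, output), "w", newline="") as f:
        writer = csv.writer(f)
        for animal in masterCount:
            writer.writerow([animal, masterCount[animal], ", ".join(masterLocations[animal])])
    return masterCount, masterLocations
```

Calling merge_counts() from inside the folder that holds file1.csv to file4.csv writes MasterAnimalCount.csv there; lion, for example, comes out as lion,4,"file2.csv, file4.csv".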

edited Nov 18, 2014 at 20:57

answered Nov 18, 2014 at 7:35

TessellatingHeckler

29.3k4 gold badges55 silver badges96 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

mdandr Over a year ago

Thanks so much! I was working on that for hours without much success.

Collectives™ on Stack Overflow

Extract information from multiple CSV files, write to new CSV with third column

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related