I have a folder with four CSV files. Each CSV lists animals and the number of occurrences of each animal. I'm trying to create a single CSV that gathers the information from all the CSVs in the folder, merges duplicate animals by summing their counts, and adds a third column listing the original file(s) each animal was found in. For example: lion,4,'file2, file4'
Summing the counts works, but I can't figure out how to build the third column. I tried doing it with a second dictionary (see the lines that use locationCount).
My current script is shown after the sample files.
The files I have:
file1.csv:
cat,1
dog,2
bird,1
rat,3
file2.csv:
bear,1
lion,1
goat,1
pig,1
file3.csv:
rat,1
bear,1
mouse,1
cat,1
file4.csv:
elephant,1
tiger,2
dog,1
lion,3
Current script:
import glob
import os
import csv, pdb

listCSV = glob.glob('*.csv')
masterCount = {}
locationCount = {}
for i in listCSV:  # iterate over each csv
    filename = os.path.split(i)[1]  # filename for each csv
    with open(i, 'rb') as f:
        reader = csv.reader(f)
        location = []
        for row in reader:
            key = row[0]
            location.append(filename)
            masterCount[key] = masterCount.get(key, 0) + int(row[1])
            locationCount[key] = locationCount.get(key, location)
writer = csv.writer(open('MasterAnimalCount.csv', 'wb'))
for key, value in masterCount.items():
    writer.writerow([key, value])
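
For reference, here is a minimal sketch of the behavior I'm after, not a definitive implementation. It assumes Python 3 (text-mode files opened with newline=''), whereas the script above uses the Python 2 'rb'/'wb' modes. The function name merge_counts and its folder and output_name parameters are hypothetical names made up for this illustration. The main difference from my attempt is that each animal gets its own list of files: in the script above, a single location list is shared by every row of a file, and locationCount.get(key, location) never adds a later file for an animal that has already been seen.

```python
import csv
import glob
import os
from collections import defaultdict

OUTPUT_NAME = 'MasterAnimalCount.csv'  # same output name as the script above


def merge_counts(folder='.', output_name=OUTPUT_NAME):
    """Sum counts per animal and record which files mention each animal."""
    totals = defaultdict(int)    # animal -> summed occurrences
    sources = defaultdict(list)  # animal -> file names without the extension

    for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
        filename = os.path.basename(path)
        if filename == output_name:
            continue  # skip the result of an earlier run so it is not re-counted
        label = os.path.splitext(filename)[0]  # 'file2.csv' -> 'file2'
        with open(path, newline='') as f:
            for row in csv.reader(f):
                if not row:
                    continue  # ignore blank lines
                animal = row[0]
                totals[animal] += int(row[1])
                if label not in sources[animal]:
                    sources[animal].append(label)

    # Write one row per animal: name, total count, comma-separated file list.
    with open(os.path.join(folder, output_name), 'w', newline='') as f:
        writer = csv.writer(f)
        for animal in sorted(totals):
            writer.writerow([animal, totals[animal], ', '.join(sources[animal])])

    return totals, sources
```

Calling merge_counts() in the folder with the four sample files should write a row for lion with a total of 4 and the files file2 and file4. Because the third field contains a comma, csv.writer wraps it in double quotes, so the line reads lion,4,"file2, file4".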