I have a folder with four CSV files. Each CSV lists animals and the number of occurrences of each animal. I'm trying to create a single CSV that gathers the information from all the CSVs in the folder, merges duplicate animals by summing their counts, and adds a third column listing the original file(s) each animal was found in. For example: lion,4,'file2, file4'

I can't figure out how to build that third column. I tried doing it with a second dictionary (see the lines that use locationCount). The script I'm currently using is below.

The files I have:

file1.csv:
cat,1
dog,2
bird,1
rat,3

file2.csv:
bear,1
lion,1
goat,1
pig,1

file3.csv:
rat,1
bear,1
mouse,1
cat,1

file4.csv:
elephant,1
tiger,2
dog,1
lion,3

Current script:

import glob
import os
import csv, pdb

listCSV = glob.glob('*.csv')
masterCount = {}
locationCount = {}
for i in listCSV: # iterate over each csv
    filename = os.path.split(i)[1] # filename for each csv
    with open(i, 'rb') as f:
        reader = csv.reader(f)
        location = []
        for row in reader:
            key = row[0]
            location.append(filename)
            masterCount[key] = masterCount.get(key, 0) + int(row[1]) 
            locationCount[key] = locationCount.get(key, location)
writer = csv.writer(open('MasterAnimalCount.csv', 'wb'))
for key, value in masterCount.items():
    writer.writerow([key, value])

1 Answer

You're almost there: handle the locations the same way you handle the counts.
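For context, here is a minimal sketch of why the locationCount line in the question does not accumulate filenames. It replays that logic on in-memory data instead of real files; the function name replay_question_logic and the sample input are made up for illustration, and the syntax runs on Python 3.

```python
def replay_question_logic(files):
    """Replay the question's locationCount handling on (filename, animals) pairs."""
    locationCount = {}
    for filename, animals in files:
        location = []  # one list object per file, shared by every new animal in it
        for key in animals:
            location.append(filename)  # grows once per row, not once per animal
            # .get() returns the existing list when the key is already present,
            # so an animal seen in an earlier file never records the later file.
            locationCount[key] = locationCount.get(key, location)
    return locationCount


# Hypothetical sample: "cat" appears in both files.
result = replay_question_logic([
    ("file1.csv", ["cat", "dog"]),
    ("file3.csv", ["cat", "mouse"]),
])
print(result["cat"])                   # ['file1.csv', 'file1.csv']
print(result["cat"] is result["dog"])  # True
```

Two things go wrong: every new animal in a file points at the same list, and an animal that already has an entry keeps its old list untouched. Building a new list per animal, as with the counts, avoids both.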

In the script below I've renamed and rearranged a few things, but the structure is basically the same as yours. masterCount adds each number to the running total for that animal, and masterLocations appends each filename to the list of filenames already recorded for that animal.

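from glob import glob
import os, csv

OUTPUT = 'MasterAnimalCount.csv'
masterCount = {}      # animal -> total count across all files
masterLocations = {}  # animal -> list of filenames it appears in

for i in glob('*.csv'):
    filename = os.path.split(i)[1]
    if filename == OUTPUT:
        continue  # skip the output of a previous run (it has three columns)

    with open(i, 'rb') as f:
        for animal, count in csv.reader(f):
            masterCount[animal] = masterCount.get(animal, 0) + int(count)
            masterLocations[animal] = masterLocations.get(animal, []) + [filename]

# 'rb'/'wb' are the Python 2 csv idiom; on Python 3 open in text mode with newline=''
with open(OUTPUT, 'wb') as out:
    writer = csv.writer(out)
    for animal in masterCount:
        writer.writerow([animal, masterCount[animal], ', '.join(masterLocations[animal])])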
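If you are on Python 3, the same idea could look roughly like the sketch below. It is not a drop-in from the answer: the function names merge_animal_counts and write_master are hypothetical, collections.defaultdict is swapped in for the .get() calls, and it assumes every non-blank row has exactly an animal and an integer count, with each animal appearing at most once per file.

```python
import csv
import glob
import os
from collections import defaultdict

OUTPUT_NAME = "MasterAnimalCount.csv"  # same output filename as in the answer


def merge_animal_counts(folder="."):
    """Sum counts per animal and collect the files each animal appears in."""
    counts = defaultdict(int)      # animal -> total count
    locations = defaultdict(list)  # animal -> filenames, in sorted file order

    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        filename = os.path.basename(path)
        if filename == OUTPUT_NAME:
            continue  # don't re-read the output of a previous run
        # The csv module docs recommend newline="" for files passed to reader/writer.
        with open(path, newline="") as f:
            for row in csv.reader(f):
                if not row:
                    continue  # tolerate blank lines
                animal, count = row[0], row[1]
                counts[animal] += int(count)
                locations[animal].append(filename)
    return counts, locations


def write_master(counts, locations, folder="."):
    """Write one row per animal: name, total count, comma-joined filenames."""
    with open(os.path.join(folder, OUTPUT_NAME), "w", newline="") as f:
        writer = csv.writer(f)
        for animal in sorted(counts):
            writer.writerow([animal, counts[animal], ", ".join(locations[animal])])
```

Calling write_master(*merge_animal_counts()) in the folder from the question should produce a row like lion,4,"file2.csv, file4.csv". The csv writer quotes the third field because it contains a comma.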

1 Comment

Thanks so much! I was working on that for hours without much success.
