Summating CSV rows in Python

Question

I have a csv file with data like this:

Name Value Value2 Value3 Rating
ddf  34      45    46     ok
ddf  67      23    11     ok
ghd  23      11    78     bad
ghd  56      33    78     bad
.....

What I want to do is loop through my CSV and add together the rows that have the same name. The string at the end of each row will always remain the same for that name, so there is no fear of it changing. How would I go about changing it to this in Python?

Name Value Value2 Value3 Rating
ddf  101     68    57     ok
ghd  79      44    156    bad

EDIT:

In my code, the first thing I did was sort the list so that rows with the same name would be next to each other. Then I tried to use a for loop to add the numeric columns together whenever the name in the first column matched the previous row. It's a very ugly way of doing it and I am at my wits' end.

sortedList = csv.reader(open("keywordReport.csv"))

editedFile = open("output.csv", 'w')
wr = csv.writer(editedFile, delimiter=',')

name = ""

sortedList = sorted(sortedList, key=operator.itemgetter(0), reverse=True)

newKeyword = ["", "", "", "", "", ""]

for row in sortedList:
    if row[0] != name:
        wr.writerow(newKeyword)
        name = row[0]
    else:
        newKeyword[0] = row[0]  # Name
        newKeyword[1] = str(float(newKeyword[1]) + float(row[1]))
        newKeyword[2] = str(float(newKeyword[2]) + float(row[2]))
        newKeyword[3] = str(float(newKeyword[3]) + float(row[3]))

If you haven't tried anything yet, you could start here: docs.python.org/3/library/csv.html
– turbulencetoo, Commented Oct 5, 2015 at 15:01
Import it into SQLite and run a SELECT query on it; you will be done in 2 minutes with almost zero effort.
– e4c5, Commented Oct 5, 2015 at 15:20
I do not have a database set up in SQLite. Is it possible to load the CSV file into memory and run an SQL command on it there?
– GreenGodot, Commented Oct 5, 2015 at 15:40
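
On the last comment's question: Python's standard sqlite3 module can create a database that lives only in memory, so no database setup is needed. A minimal sketch, assuming the file is comma-separated, has the header shown in the question, and has exactly five fields per row. The table name `report` and the function name `sum_by_name` are made up for illustration:

```python
import csv
import io
import sqlite3

def sum_by_name(csv_file):
    """Load CSV rows into an in-memory SQLite table and sum them per Name."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row: Name,Value,Value2,Value3,Rating
    conn = sqlite3.connect(":memory:")  # the database exists only for this connection
    conn.execute(
        "CREATE TABLE report "
        "(Name TEXT, Value REAL, Value2 REAL, Value3 REAL, Rating TEXT)"
    )
    # Each CSV row is a list of five strings; SQLite converts numeric text
    # to numbers for the REAL columns.
    conn.executemany("INSERT INTO report VALUES (?, ?, ?, ?, ?)", reader)
    rows = conn.execute(
        "SELECT Name, SUM(Value), SUM(Value2), SUM(Value3), Rating "
        "FROM report GROUP BY Name, Rating ORDER BY Name"
    ).fetchall()
    conn.close()
    return rows

# Sample data from the question, held in a string so the sketch runs without a file.
sample = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,23,11,ok\n"
    "ghd,23,11,78,bad\n"
    "ghd,56,33,78,bad\n"
)
result = sum_by_name(sample)
for row in result:
    print(row)
```

To run it against a real file, pass an open file object instead of the sample string, for example one opened with `open("keywordReport.csv", newline="")`.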

rll · Accepted Answer · 2015-10-05 16:07:07Z

1

The pandas way is very simple:

import pandas as pd

aframe = pd.read_csv('thefile.csv')

Out[19]:
  Name  Value  Value2  Value3 Rating
0  ddf     34      45      46     ok
1  ddf     67      23      11     ok
2  ghd     23      11      78    bad
3  ghd     56      33      78    bad

r = aframe.groupby(['Name','Rating'],as_index=False).sum()

Out[40]:
  Name Rating  Value  Value2  Value3
0  ddf     ok    101      68      57
1  ghd    bad     79      44     156

If you need to do further analysis and statistics, pandas will take you a long way with little effort. For this use case it is like using a hammer to kill a fly, but I wanted to provide the alternative.


4 Comments

GreenGodot Over a year ago

This almost works for me, but I am getting errors in which some cells are being 'fused' together, i.e. names as well as some values. I thought it was a formatting problem, but I've been playing with it for the past hour with no luck. It's happening to both strings and integers.

GreenGodot Over a year ago

The sample up above is part of a 10,000 line file. Would the amount of data be an issue?

GreenGodot Over a year ago

Sorry for the repetition, but I think the problem is that some number cells are somehow being seen as strings.

rll Over a year ago

Try the convert_objects function. This post has an example. The parameter convert_numeric is False by default.
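
One way to check the suspicion in these comments is to scan the file for cells in the numeric columns that do not parse as numbers. A single stray value is enough for pandas to read a whole column as strings, and summing strings concatenates them, which would likely explain the 'fused' cells. A rough sketch using only the standard library; the function name `find_non_numeric` and the default column list are assumptions based on the sample in the question, and the two bad cells in the sample data are invented for the demonstration. (In current pandas, `convert_objects` has been removed; `pd.to_numeric(column, errors='coerce')` is the usual replacement.)

```python
import csv
import io

def find_non_numeric(csv_file, numeric_columns=("Value", "Value2", "Value3")):
    """Return (line_number, column, cell) for every cell that float() rejects."""
    problems = []
    reader = csv.DictReader(csv_file)
    for row in reader:
        for column in numeric_columns:
            cell = row[column]
            try:
                float(cell)
            except (TypeError, ValueError):
                # reader.line_num is the line of the row just read (the header is line 1)
                problems.append((reader.line_num, column, cell))
    return problems

# Invented sample: 'n/a' and an empty cell stand in for whatever is wrong in the real file.
sample = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,n/a,11,ok\n"
    "ghd,23,11,,bad\n"
)
problems = find_non_numeric(sample)
for line_number, column, cell in problems:
    print(line_number, column, repr(cell))
```

Any rows it reports are the ones to clean up or coerce before grouping.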

Cody Bouche · Accepted Answer · 2015-10-05 15:23:02Z

file.csv

Name,Value,Value2,Value3,Rating
ddf,34,45,46,ok
ddf,67,23,11,ok
ghd,23,11,78,bad
ghd,56,33,78,bad

code

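import csv

def map_csv_rows(f):
    # Read every row; the first row is the header.
    rows = list(csv.reader(f))
    header = rows[0]
    # Convert all-digit cells to int and leave every other cell as a string.
    # Note: decimals and negative numbers are not matched by isdigit().
    return [dict(zip(header, (int(p) if p.isdigit() else p for p in row))) for row in rows[1:]]

# Python 3: open in text mode with newline='' (Python 2 used 'rb' here).
with open('file.csv', newline='') as f:
    my_csv = map_csv_rows(f)

output = {}
for row in my_csv:
    # One accumulator per Name; Rating comes from the first row seen for that Name.
    totals = output.setdefault(row['Name'], {'Name': row['Name'], 'Value': 0, 'Value2': 0, 'Value3': 0, 'Rating': row['Rating']})
    for val in ['Value', 'Value2', 'Value3']:
        totals[val] += row[val]

# Python 3: write in text mode with newline='' (Python 2 used 'wb' here).
with open('out.csv', 'w', newline='') as f:
    fieldnames = ['Name', 'Value', 'Value2', 'Value3', 'Rating']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for out in output.values():
        writer.writerow(out)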
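
The sort-then-loop idea from the question can also be made to work with itertools.groupby, which hands back each run of adjacent rows that share a key. A minimal sketch, assuming comma-separated input with the header shown above and a Rating that never changes for a given Name; the function name `sum_sorted_rows` is hypothetical:

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

def sum_sorted_rows(csv_file):
    """Sort rows by Name, then sum the three numeric columns within each run."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # groupby only merges adjacent rows, so the rows must be sorted by the key first
    rows = sorted(reader, key=itemgetter(0))
    summed = []
    for name, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        totals = [sum(float(r[i]) for r in group) for i in (1, 2, 3)]
        # Rating is assumed constant per Name, so take it from the first row
        summed.append([name, *totals, group[0][4]])
    return header, summed

# Sample data from the question, held in a string so the sketch runs without a file.
sample = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,23,11,ok\n"
    "ghd,23,11,78,bad\n"
    "ghd,56,33,78,bad\n"
)
header, summed = sum_sorted_rows(sample)
for row in summed:
    print(row)
```

The rows in `summed` can be passed to `csv.writer(...).writerows(...)` after writing `header`.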

karakfa · Accepted Answer · 2015-10-05 15:34:54Z

0

For comparison purposes, here is an equivalent awk program:

$ awk -v OFS="\t" '
     NR==1{$1=$1;print;next} 
          {k=$1;a[k]+=$2;b[k]+=$3;c[k]+=$4;d[k]=$5} 
       END{for(i in a) print i,a[i],b[i],c[i],d[i]}' input

This will print:

Name    Value   Value2  Value3  Rating
ddf     101     68      57      ok
ghd     79      44      156     bad

If the input is a CSV and you want CSV output, add the -F, argument and change the output separator to OFS=,

