I have a csv file with data like this:

Name Value Value2 Value3 Rating
ddf  34      45    46     ok
ddf  67      23    11     ok
ghd  23      11    78     bad
ghd  56      33    78     bad
.....

What I want to do is loop through my CSV and add together the rows that have the same name. The string at the end of each row will always remain the same for a given name, so there is no risk of it changing. How would I go about transforming it into this in Python?

Name Value Value2 Value3 Rating
ddf  101     68    57     ok
ghd  79      44    156    bad

EDIT:

In my code, the first thing I did was sort the list so that rows with the same name would be next to each other. Then I tried to use a for loop to add the numeric columns together by checking whether the name in the first column is the same. It's a very ugly way of doing it and I am at my wits' end.

import csv
import operator

sortedList = csv.reader(open("keywordReport.csv"))

editedFile = open("output.csv", 'w')
wr = csv.writer(editedFile, delimiter=',')

name = ""

sortedList = sorted(sortedList, key=operator.itemgetter(0), reverse=True)

newKeyword = ["", "", "", "", "", ""]

for row in sortedList:
    if row[0] != name:
        wr.writerow(newKeyword)
        name = row[0]
    else:
        newKeyword[0] = row[0]  # Name
        newKeyword[1] = str(float(newKeyword[1]) + float(row[1]))
        newKeyword[2] = str(float(newKeyword[2]) + float(row[2]))
        newKeyword[3] = str(float(newKeyword[3]) + float(row[3]))
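
The sort-then-merge idea described above can be sketched with itertools.groupby, which yields runs of consecutive rows that share a key once the rows are sorted. This is only a sketch under assumptions: the helper name sum_rows_by_name is made up for illustration, the input is assumed to be comma-separated with a header row and no blank lines, and the three middle columns are assumed to hold integers.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def sum_rows_by_name(lines):
    """Sum the numeric columns of rows that share the same first column.

    `lines` is any iterable of CSV lines (an open file works).
    Returns the header followed by one merged row per name.
    """
    reader = csv.reader(lines)
    header = next(reader)                     # keep the header out of the sort
    rows = sorted(reader, key=itemgetter(0))  # groupby only merges adjacent rows
    merged = [header]
    for name, group in groupby(rows, key=itemgetter(0)):
        group = list(group)
        # Assumption: columns 1..3 are integers and the last column is the rating.
        totals = [sum(int(r[i]) for r in group) for i in range(1, 4)]
        merged.append([name, *map(str, totals), group[0][-1]])
    return merged


# Sample data from the question, written as comma-separated text.
sample = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,23,11,ok\n"
    "ghd,23,11,78,bad\n"
    "ghd,56,33,78,bad\n"
)
result = sum_rows_by_name(sample)
```

To save the result, the list can be passed to csv.writer(...).writerows on a file opened with newline=''.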
  • If you haven't tried anything yet, you could start here: docs.python.org/3/library/csv.html Commented Oct 5, 2015 at 15:01
  • Import it into SQLite and run a SELECT query on it; you will be done in 2 minutes with almost zero effort. Commented Oct 5, 2015 at 15:20
  • I do not have a database set up in SQLite. Is it possible to load the CSV file into memory and run an SQL command on it there? Commented Oct 5, 2015 at 15:40
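
On the in-memory question raised in the last comment: Python's standard sqlite3 module accepts the special path ":memory:", so no database setup is needed. A minimal sketch, assuming comma-separated input with the header shown in the question and exactly five fields per row; the table name report and the function name sum_by_name_sqlite are made up for illustration.

```python
import csv
import io
import sqlite3


def sum_by_name_sqlite(lines):
    """Load CSV lines into an in-memory SQLite table and aggregate by name."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    conn = sqlite3.connect(":memory:")  # lives only in RAM, no file is created
    conn.execute(
        "CREATE TABLE report "
        "(name TEXT, value INTEGER, value2 INTEGER, value3 INTEGER, rating TEXT)"
    )
    # Each CSV row is a list of five strings; the INTEGER columns store them as numbers.
    conn.executemany("INSERT INTO report VALUES (?, ?, ?, ?, ?)", reader)
    rows = conn.execute(
        "SELECT name, SUM(value), SUM(value2), SUM(value3), MIN(rating) "
        "FROM report GROUP BY name ORDER BY name"
    ).fetchall()
    conn.close()
    return rows


# Sample data from the question, written as comma-separated text.
sample = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,23,11,ok\n"
    "ghd,23,11,78,bad\n"
    "ghd,56,33,78,bad\n"
)
rows = sum_by_name_sqlite(sample)
```

MIN(rating) is used only because every row of a given name carries the same rating, so any aggregate that picks one value would do.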

3 Answers

The pandas way is very simple:

import pandas as pd

aframe = pd.read_csv('thefile.csv')

Out[19]:
  Name  Value  Value2  Value3 Rating
0  ddf     34      45      46     ok
1  ddf     67      23      11     ok
2  ghd     23      11      78    bad
3  ghd     56      33      78    bad

r = aframe.groupby(['Name', 'Rating'], as_index=False).sum()

Out[40]:
  Name Rating  Value  Value2  Value3
0  ddf     ok    101      68      57
1  ghd    bad     79      44     156

If you need to do further analysis and statistics, pandas will take you a long way with little effort. For this use case it is like using a hammer to kill a fly, but I wanted to provide this alternative.

4 Comments

This almost works for me, but I am getting errors in which some cells are being 'fused' together, i.e. names as well as some values. I thought it was a formatting problem, but I've been playing with it for the past hour with no luck. It's happening to both strings and integers.
The sample up above is part of a 10,000 line file. Would the amount of data be an issue?
Sorry for the repetition, but I think the problem is that some number cells are somehow being seen as strings.
Try the convert_objects function. This post has an example. The parameter convert_numeric is False by default.
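
A note on the suggestion above: convert_objects is no longer available in current pandas releases, and pd.to_numeric with errors="coerce" is one way to get a similar conversion. The sketch below is an assumption about what may be happening, not a diagnosis of the actual file: if a numeric column contains a stray non-numeric entry (the made-up value "2x3" here), pandas reads the whole column as strings, and summing strings concatenates them, which could look like 'fused' cells.

```python
import io

import pandas as pd

# Sample data from the question with one deliberately broken cell ("2x3").
raw = io.StringIO(
    "Name,Value,Value2,Value3,Rating\n"
    "ddf,34,45,46,ok\n"
    "ddf,67,2x3,11,ok\n"
    "ghd,23,11,78,bad\n"
    "ghd,56,33,78,bad\n"
)

num_cols = ["Value", "Value2", "Value3"]
frame = pd.read_csv(raw)

# Force the value columns to numbers; entries that cannot be parsed become NaN.
frame[num_cols] = frame[num_cols].apply(pd.to_numeric, errors="coerce")

# NaN entries are skipped by sum(), so the broken cell simply does not contribute.
result = frame.groupby(["Name", "Rating"], as_index=False).sum()
```

Rows where the coercion produced NaN can be inspected with frame[frame[num_cols].isna().any(axis=1)] to find the offending cells in the source file.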

file.csv

Name,Value,Value2,Value3,Rating
ddf,34,45,46,ok
ddf,67,23,11,ok
ghd,23,11,78,bad
ghd,56,33,78,bad

code

import csv

def map_csv_rows(f):
    # Pair every data row with the header names, converting digit-only strings to int.
    c = list(csv.reader(f))
    return [dict(zip(c[0], (int(p) if p.isdigit() else p for p in x))) for x in c[1:]]

# In Python 3 the csv module expects text-mode files opened with newline=''.
with open('file.csv', newline='') as f:
    my_csv = map_csv_rows(f)

output = {}
for row in my_csv:
    # The first time a name is seen, start its totals at zero and remember its rating.
    totals = output.setdefault(row['Name'], {'Name': row['Name'], 'Value': 0, 'Value2': 0, 'Value3': 0, 'Rating': row['Rating']})
    for val in ['Value', 'Value2', 'Value3']:
        totals[val] += row[val]

with open('out.csv', 'w', newline='') as f:
    fieldnames = ['Name', 'Value', 'Value2', 'Value3', 'Rating']
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for out in output.values():
        writer.writerow(out)

For comparison purposes, here is an equivalent awk program:

$ awk -v OFS="\t" '
     NR==1{$1=$1;print;next} 
          {k=$1;a[k]+=$2;b[k]+=$3;c[k]+=$4;d[k]=$5} 
       END{for(i in a) print i,a[i],b[i],c[i],d[i]}' input

This will print:

Name    Value   Value2  Value3  Rating
ddf     101     68      57      ok
ghd     79      44      156     bad

If the input is CSV and you want CSV output, add the -F, argument and change OFS to a comma (OFS=,).
