Sorting a .csv File by Specific Column Data in Python

Question

I'm trying to sort a .csv file with just over 300 rows and write it all back out, ordered by the numerical values in one specific column (the one under a particular header). Here's the code I've written so far, but it just seems to output the data in the same order it went in:

import csv
import itertools
from itertools import groupby as gb

reader = csv.DictReader(open('Full_List.csv', 'r'))

groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]

writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

There are only 50 rows in the whole file that contain a value under the "red label" header; all the others are blank. It's column Z in the .csv (but not the last column), so I'd assume its index is 25 (0 being the first). Any help would be greatly appreciated.
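If you want to work with the column index 25 mentioned above instead of the header name, `csv.reader` gives each row as a plain list, so the index can be used directly. The sketch below assumes the file has one header row, that every row has at least 26 cells, and that blank cells should go after all numeric ones; the function name `sort_by_column_index` is hypothetical.

```python
import csv
import io

RED_LABEL_INDEX = 25  # column Z, counting from 0 (A = 0, B = 1, ..., Z = 25)


def sort_by_column_index(lines, index=RED_LABEL_INDEX):
    """Return (header, rows) with rows sorted numerically on column `index`.

    Assumes the first line is a header. Blank cells are placed after
    every row that contains a number.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(reader)

    def key(row):
        cell = row[index].strip()
        # (False, number) sorts before (True, 0.0), so blank cells end up last
        return (cell == "", float(cell) if cell else 0.0)

    return header, sorted(rows, key=key)


# Usage with a small in-memory file: 28 columns, the sort column is Z (index 25)
header_line = ",".join("c%d" % i for i in range(28))


def make_line(value):
    return ",".join(["x"] * 25 + [value, "y", "z"])


data = "\n".join([header_line, make_line("10"), make_line(""), make_line("9"), make_line("2.5")]) + "\n"
header, rows = sort_by_column_index(io.StringIO(data))
print([row[25] for row in rows])
```

With a real file, pass an open file object (opened with `newline=''`) instead of the `StringIO`, then write `header` and `rows` back out with `csv.writer`.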

groupby isn't for sorting, it's for chunking an iterable. From the docs for itertools.groupby: "Generally, the iterable needs to already be sorted on the same key function." – Steven Rumbalski, commented Mar 21, 2013 at 23:15
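A small illustration of the comment's point: `itertools.groupby` only merges *consecutive* items with the same key, so on unsorted input the same key can appear in several separate groups, and it never reorders anything. The sample values are made up for the demonstration.

```python
from itertools import groupby

values = ["b", "a", "a", "b", "c", "b"]

# On unsorted input, groupby starts a new group every time the key changes,
# so "b" shows up as three separate groups
unsorted_keys = [key for key, _ in groupby(values)]

# Sorting first brings equal keys together, giving one group per key
sorted_groups = [(key, list(group)) for key, group in groupby(sorted(values))]

print(unsorted_keys)   # ['b', 'a', 'b', 'c', 'b']
print(sorted_groups)   # [('a', ['a', 'a']), ('b', ['b', 'b', 'b']), ('c', ['c'])]
```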

Adam Obeng · Accepted Answer · 2013-03-22 01:54:26Z

11

How about using pandas?

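import pandas as pd

df = pd.read_csv('Full_List.csv')
# DataFrame.sort() was removed in later pandas releases; sort_values() replaces it.
# Blank cells are read as NaN and placed last by default (na_position='last').
df = df.sort_values('red label')
df.to_csv('Full_List_sorted.csv', index=False)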

You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.

answered Mar 22, 2013 at 1:54

Adam Obeng

10 Comments

AzKai Over a year ago

I've tried the pandas method you suggested, but whenever I run the script I get the error "No module pandas exists", even though I've installed it from my python directory using sudo apt-get install python-pandas.

Adam Obeng Over a year ago

Which version of python and what operating system are you using?

AzKai Over a year ago

I'm using python 3.2 on Ubuntu 12.10

AzKai Over a year ago

Edit: I've figured out the problem with running pandas. It was installed into my python2.7 folder, but my script runs from the python3.2 folder, which sits next to the 2.7 one in /usr/local/lib, and I've no idea how to change my script to run from that directory.
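A mismatch like this can be confirmed by asking the running script which interpreter it is and whether it can import pandas. This is a minimal diagnostic sketch; the function name `describe_interpreter` is hypothetical, and it only uses `sys` and `importlib.import_module` from the standard library.

```python
import importlib
import sys


def describe_interpreter(module_name="pandas"):
    """Report which Python is running and whether module_name can be imported from it."""
    try:
        importlib.import_module(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        "executable": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        "importable": importable,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(key, value)
```

Run it with the same command used to launch the sorting script. If `importable` is `False`, pandas is not installed for that particular interpreter, whatever other Python versions on the machine may have.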

AzKai Over a year ago

Finally got around the pandas error, but the output is still the same as with the method Steven gave me above.

Steven Rumbalski · Accepted Answer · 2013-03-21 23:21:03Z

9

groupby isn't for sorting, it's for chunking an iterable. For sorting use sorted.

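import csv

# newline='' is what the csv module expects for files it reads and writes in Python 3
with open('Full_List.csv', newline='') as infile:
    reader = csv.DictReader(infile)
    result = sorted(reader, key=lambda d: float(d['red label']))
    fieldnames = reader.fieldnames

with open('output.csv', 'w', newline='') as outfile:
    writer = csv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    writer.writerows(result)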

Note: I changed your lambda to cast your character data to float for correct numerical sorting.
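To see why the cast matters: `csv` returns every cell as a string, and strings compare character by character, so "10" sorts before "9". The rows below are made-up sample data with the same "red label" key.

```python
rows = [{"red label": "10"}, {"red label": "9"}, {"red label": "2.5"}]

# Sorting the raw strings compares them character by character: "1" < "2" < "9"
as_text = [d["red label"] for d in sorted(rows, key=lambda d: d["red label"])]

# Casting to float compares the numeric values instead
as_float = [d["red label"] for d in sorted(rows, key=lambda d: float(d["red label"]))]

print(as_text)   # ['10', '2.5', '9']
print(as_float)  # ['2.5', '9', '10']
```

Note that `sorted` returns a new list of the same row dictionaries; it does not change any cell values.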

answered Mar 21, 2013 at 23:21

Steven Rumbalski

7 Comments

AzKai Over a year ago

I've tried that and got the following error: "ValueError: could not convert string to float:". I then changed the cast from float to str. It ran, but it completely eliminated all the values in the column it's sorting.

Steven Rumbalski Over a year ago

From the ValueError it appears that d['red label'] does not always return numeric data. Do you have any empty fields? As for "it completely eliminated all values in the column", I don't think that is the case: this code does not overwrite any values. It would be helpful to see your actual data.

AzKai Over a year ago

Yes. All but 50 of the entries within that column are just blank fields.

Steven Rumbalski Over a year ago

If those blank fields can be sorted as if they had a value of 0.0, change float(d['red label']) to float(d['red label']) if d['red label'] else 0.0.
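A short sketch of the blank-field problem and the suggested fix, on made-up rows: `float('')` raises the ValueError reported above, while the conditional expression maps blanks to 0.0. The `blanks_last` key is an alternative that is not from this thread; it keeps blank rows after every numeric row instead of mixing them in at 0.0.

```python
rows = [{"red label": "3"}, {"red label": ""}, {"red label": "1.5"}, {"red label": ""}]

# float('') raises ValueError, which is the error reported above
error = None
try:
    sorted(rows, key=lambda d: float(d["red label"]))
except ValueError as exc:
    error = exc

# The suggested fix: treat blank cells as 0.0
blank_as_zero = sorted(rows, key=lambda d: float(d["red label"]) if d["red label"] else 0.0)


# Alternative (not from the thread): sort blanks after all numeric rows
def blanks_last(d):
    value = d["red label"]
    return (value == "", float(value) if value else 0.0)


blank_last = sorted(rows, key=blanks_last)

print([d["red label"] for d in blank_as_zero])  # ['', '', '1.5', '3']
print([d["red label"] for d in blank_last])     # ['1.5', '3', '', '']
```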

Steven Rumbalski Over a year ago

@AzKai: Post the first ten lines of your file. Something is not quite right here.

sabbahillel · Accepted Answer · 2014-01-16 18:00:49Z

Through testing I found that the following works on my own csv files. Note that in those files every row of the sort column has a valid entry.

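import csv
from optparse import OptionParser

# Create options.statistic using -s (the other flag names are hypothetical;
# the original snippet did not show how `options` was built)
parser = OptionParser()
parser.add_option('-i', dest='filein', help='input CSV file')
parser.add_option('-o', dest='fileout', help='output CSV file')
parser.add_option('-s', dest='statistic', help='column to sort on')
parser.add_option('--high', dest='high', action='store_true', default=False,
                  help='sort from high to low')
options, args = parser.parse_args()

# Read the whole input file into memory, so a failed float sort can be retried
# without re-reading it. Closing it before writing allows the input file to be
# the same as the output file.
with open(options.filein, newline='') as ifile:
    reader = csv.DictReader(ifile)
    outfields = reader.fieldnames
    rows = list(reader)

# Create the sorted list: numeric if every value parses as a float, else text
try:
    print('Try the float version')
    sortedlist = sorted(rows, key=lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
    print('Need to use the text version')
    sortedlist = sorted(rows, key=lambda d: d[options.statistic], reverse=options.high)

# Open the output file, then write the header and the sorted list
with open(options.fileout, 'w', newline='') as ofile:
    writer = csv.DictWriter(ofile, fieldnames=outfields, extrasaction='ignore', restval='')
    writer.writeheader()
    writer.writerows(sortedlist)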
