9

I'm trying to order a .csv file with just over 300 entries and output it all back out ordered by the numerical values in one specific column under a dialect. Here's the code I've written so far but it just seems to output the data as it went in

import csv
import itertools
from itertools import groupby as gb

reader = csv.DictReader(open('Full_List.csv', 'r'))

groups = gb(reader, lambda d: d['red label'])
result = [max(g, key=lambda d: d['red label']) for k, g in groups]



writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

There's only 50 rows in the whole file that contain a value under the dialect "red label" and all the others are left blank. It's in the Z column on the .csv(but not that last one) so I'd assume the index of the column is 25(0 being the first). Any help would be greatly appreciated.

1
  • 1
    groupby isn't for sorting, it's for chunking an iterable. From the docs for itertools.groupby: "Generally, the iterable needs to already be sorted on the same key function." Commented Mar 21, 2013 at 23:15

3 Answers 3

11

How about using pandas?

import pandas as pd
df = pd.read_csv('Full_List.csv')
df = df.sort('red label')
df.to_csv('Full_List_sorted.csv', index=False)

You may need to adjust the options to read_csv and to_csv to match the format of your CSV file.

Sign up to request clarification or add additional context in comments.

10 Comments

I've tried using the pandas method you have told me about but whenever I run the script I get the error "No module pandas exists" even though I've installed it from my python directory using sudo apt-get install python-pandas
Which version of python and what operating system are you using?
I'm using python 3.2 on Ubuntu 12.10
Edit: I've figured out what the problem is in trying to run pandas. When I installed it it installed into my python2.7 folder but when I run my script it's running from the python3.2 folder which is in the same directory as the 2.7 version which is /usr/local/lib and I've no idea how to change my script to run from that directory
Finally got around the pandas error but the output is still the same as the above method that Steven gave me
|
9

groupby isn't for sorting, it's for chunking an iterable. For sorting use sorted.

import csv

reader = csv.DictReader(open('Full_List.csv', 'r'))
result = sorted(reader, key=lambda d: float(d['red label']))

writer = csv.DictWriter(open('output.csv', 'w'), reader.fieldnames)
writer.writeheader()
writer.writerows(result)

Note: I changed your lambda to cast your character data to float for correct numerical sorting.

7 Comments

I've tried that and gotten the following error: ValueError: could not convert string to float: I changed the casting from float to str. It compiled but it completely eliminated all values in the column it's sorting
From the ValueError it appears that d['red label'] does not always return numeric data. Do you have any empty fields? As regards to "it completely eliminated all values in the column", I think that is not the case. This code does not overwrite any values. It would be helpful to see your actual data.
Yes. All but 50 of the entries within that column are just blank fields.
If those blank fields can be sorted as if they have a value of 0.0 change float(d['red label']) to float(d['red label']) if d['red label']) else 0.0.
@AzKai: Post the first ten lines of your file. Something is not quite right here.
|
2

I found with testing that the following works on csv files that I have. Note that all rows of the column have valid entries.

from optparse import OptionParser
# Create options.statistic using -s
# Open and set up input file
ifile = open(options.filein, 'rb')
reader = cvs.DictReader(ifile)
# Create the sorted list
try:
  print 'Try the float version'
  sortedlist = sorted(reader, key = lambda d: float(d[options.statistic]), reverse=options.high)
except ValueError:
  print 'Need to use the text version'
  ifile.seek(0)
  ifile.next()
  sortedlist = sorted(reader, key=lambda d: d[options.statistic], reverse=options.high)
# Close the input file. This allows the input file to be the same as the output file
ifile.close()
# Open the output file
ofile = open(options.fileout, 'wb')
writer = csv.DictWriter(ofile, fieldnames=outfields, extrasactions='ignore', restval = '')
# Output the header
writer.writerow(dict((fn, fn) for fn in outfields))
# Output the sorted list
writer.writerows(sortedlist)
ofile.close()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.