
What I'm trying to do is read a CSV file, find all rows where the value in the SN column is greater than 20, and write only those rows to a new file.

I know that I need to:

  1. Read the original file
  2. Open a new file
  3. Iterate over rows of the original file

So far, I have been able to find the rows that have a value of SN > 20:

import csv
import os

os.chdir("C:\Users\Robert\Documents\qwe")

with open("gdweights_feh_robert_cmr.csv",'rb') as f:
    reader = csv.reader(f, delimiter= ',')
    zerovar = 0
    for row in reader:
        if zerovar==0:
            zerovar = zerovar + 1
        else:
            sn = row [11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x

So my question is: how do I take the rows with SN > 20 and write them to a new file?

  • Instead of 'print x', write the output to a file handle. Commented Apr 9, 2013 at 1:05
  • Skip the header with next(reader) before the loop, to remove the if-then statement from the body. Commented Apr 9, 2013 at 1:37
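
Taken together, the two comments above suggest something like the following minimal sketch. It assumes Python 3 (files opened in text mode with newline=""), and the function name filter_rows, its parameters, and the output path are illustrative names that do not appear in the original post. The SN column is assumed to sit at index 11, as in the question.

```python
import csv


def filter_rows(src_path, dst_path, column=11, threshold=20.0):
    """Copy the header and every row whose `column` value exceeds `threshold`.

    Hypothetical helper (Python 3). Returns the number of data rows written.
    """
    written = 0
    # newline="" lets the csv module manage line endings itself.
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)  # skip the header instead of counting rows
        if header is None:
            return 0  # empty input file
        writer.writerow(header)
        for row in reader:
            if float(row[column]) > threshold:
                writer.writerow(row)  # written immediately, nothing is accumulated
                written += 1
    return written
```

Calling filter_rows("gdweights_feh_robert_cmr.csv", "output.csv") would then produce a file containing the header plus the rows with SN > 20, without holding the whole input in memory.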

2 Answers


Save the data in a list, then write the list to a file.

import csv
import os

os.chdir(r"C:\Users\Robert\Documents\qwe")

output_ary = []
with open("gdweights_feh_robert_cmr.csv",'rb') as f:
    reader = csv.reader(f, delimiter= ',')
    zerovar = 0
    for row in reader:
        if zerovar==0:
            zerovar = zerovar + 1
        else:
            sn = row [11]
            zerovar = zerovar + 1
            x = float(sn)
            if x > 20:
                print x
                output_ary.append(row)

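# Let csv.writer handle commas, quoting and line endings, so that
# each kept row ends up on its own line ('wb' is the Python 2 csv convention).
with open("output.csv",'wb') as f2:
    writer = csv.writer(f2)
    writer.writerows(output_ary)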

5 Comments

It sounds like it should be output_ary.append(row)
This uses quite a bit more memory than is necessary; just print the appropriate lines to the output file as they are read, instead of storing them all in a list.
When I try doing this, I get a TypeError: expected a character buffer object. What does this mean?
My mistake, I fixed it. Please try again.
Sorry for the late response. It works, but I have a problem: it doesn't load everything, only the first row.
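
The symptom in the last comment (only one row showing up) is consistent with rows being written without a line terminator, so that a spreadsheet sees a single very long row. A small sketch, assuming Python 3 and made-up sample rows, that compares writing items by hand with using csv.writer:

```python
import csv
import io

rows = [["a", "21"], ["b", "35"]]  # made-up sample data

# Writing items by hand with no line terminator: both rows land on one line.
manual = io.StringIO()
for row in rows:
    for item in row:
        manual.write(item + ",")

# csv.writer terminates every row (with "\r\n" by default).
proper = io.StringIO()
csv.writer(proper).writerows(rows)

manual_lines = manual.getvalue().splitlines()
proper_lines = proper.getvalue().splitlines()
print(len(manual_lines), len(proper_lines))  # prints: 1 2
```

The hand-written version also leaves a trailing comma after the last field, which a CSV parser reads as an extra empty column.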

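In the code, the reading and looping through the rows is more complex than it needs to be. It can be condensed with a list comprehension over the csv reader (this assumes import csv, as in the question):

with open('gdweights_feh_robert_cmr.csv', 'rb') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    output_ary = [row for row in reader if float(row[11]) > 20]

A list comprehension ([row for row in reader if ...]) is usually somewhat faster than an explicit loop with append, and it removes the zerovar bookkeeping. Note that csv.reader is still needed: iterating over f directly yields raw lines, so row[11] would be a single character rather than the SN field. The comprehension still holds every matching row in memory, so for a very large file it is better to write each row out as it is read.

You can then proceed to write out output_ary as suggested in the other answer.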

Hope this helps!

2 Comments

The problem with this is that you cannot convert a string to a float.
That's interesting. It is perfectly acceptable to convert a string to a float, e.g. s = '3', y = float(s). You might see an error if the string contains non-numeric characters, e.g. float('3a') raises a ValueError. The same happens if the string itself contains quote characters.
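
To illustrate the cases mentioned in this comment, here is a minimal sketch assuming Python 3. The helper name parse_float is hypothetical; it strips surrounding whitespace and quote characters before converting, and returns None instead of raising when the text is not numeric.

```python
def parse_float(text):
    """Return the float value of `text`, or None if it is not numeric.

    Hypothetical helper: surrounding whitespace and quote characters
    are stripped before conversion.
    """
    cleaned = text.strip().strip("'\"")
    try:
        return float(cleaned)
    except ValueError:
        return None


print(parse_float("3"))       # 3.0
print(parse_float("'21.5'"))  # 21.5
print(parse_float("3a"))      # None
```

With default settings, csv.reader already removes standard double-quote quoting from fields, so stray quotes are more likely when lines are split by hand.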
